mEpiWorks is the International Working Group for Molecular Epidemiology -
an informal community to support the use of molecular tools in (veterinary) epidemiology

Rarefaction analysis

Rarefaction analysis is a simple tool to compare genetic diversity. Sample-based rarefaction (also known as the species accumulation curve) is applicable when a number of samples are available and, for example, species richness is to be estimated as a function of the number of samples.

This method is very useful when comparing genotypes from, for example, different sources or regions when the sampling effort differed. Suppose you are trying to compare the diversity of Campylobacter genotypes in human and poultry sources. Ideally you would have 100 typed isolates from each; the comparison would then be straightforward: the source with more different genotypes is more diverse. In reality, however, different numbers of isolates have almost always been typed, and the data could look as follows:

Source     Number of isolates    Number of different genotypes
Human      103                   26
Poultry    78                    20

The question then is: how do you compare the diversity in the two samples?

Basically, rarefaction uses the data from the larger sample to answer the question "How many species (or genotypes) would have been found in a smaller sample?"

The technique originated in ecology in the 1960s and can be used to compare the number of species found in different regions or from different sources when the sampling effort differed. In principle, a greater sampling effort can be expected to yield a larger sample and more species, so you can't just compare the number of species found in each region or source.

If you found n organisms in the less-sampled region, rarefaction takes hypothetical sub-samples of n organisms from the more-sampled region, and calculates the average number of species in such sub-samples.

This average can be compared to the number of species actually found in the less-sampled region. (The method computes a variance and standard deviation to help you judge how significant any difference is.)
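The sub-sampling step can be sketched in Python as follows. This is only an illustration under stated assumptions: the counts per genotype are invented (the table above gives only totals, not how many isolates belong to each genotype), and the function names expected_richness and subsample_sd are hypothetical. The average is computed with the standard hypergeometric formula, E[S_n] = sum over genotypes i of 1 - C(N - N_i, n) / C(N, n), where N is the total number of isolates and N_i the number of isolates of genotype i; the standard deviation is estimated here by simulated sub-sampling rather than by an analytical variance formula.

```python
import random
from math import comb


def expected_richness(counts, n):
    """Average number of distinct genotypes in a random sub-sample of n
    isolates drawn without replacement from a sample with these counts."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of isolates")
    # For each genotype: 1 - P(genotype is absent from the sub-sample)
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)


def subsample_sd(counts, n, draws=2000, seed=1):
    """Monte Carlo estimate of the standard deviation of the number of
    genotypes found in sub-samples of size n (simulation, not a formula)."""
    rng = random.Random(seed)
    # One entry per isolate, labelled by the index of its genotype
    pool = [g for g, c in enumerate(counts) for _ in range(c)]
    values = [len(set(rng.sample(pool, n))) for _ in range(draws)]
    mean = sum(values) / draws
    return (sum((v - mean) ** 2 for v in values) / (draws - 1)) ** 0.5


# Invented counts per genotype: 26 genotypes, 103 isolates in total
human_counts = [20, 15, 12, 10, 8, 6, 5, 4, 3, 2, 2, 2] + [1] * 14

# Rarefy the human sample down to the poultry sample size (78 isolates)
rarefied = expected_richness(human_counts, 78)
sd = subsample_sd(human_counts, 78)
print(f"Expected genotypes in 78 human isolates: {rarefied:.1f} (SD about {sd:.1f})")
```

The printed average would then be compared with the 20 genotypes actually observed in the 78 poultry isolates. With real data, the invented counts would be replaced by the observed number of isolates per genotype.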

 

Rarefaction curves can be plotted (with or without corresponding confidence intervals) and look like this:

[Figure: example rarefaction curve]

Another interesting application of this technique is to evaluate whether continued sampling will retrieve more new genotypes or species. You can see in the above example that the curve becomes flatter with increasing sampling effort, so it becomes less likely that additional samples will yield new types.
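The flattening can also be checked numerically. The following Python sketch computes a whole rarefaction curve and the expected gain in genotypes from each additional isolate. The counts are invented for illustration and the function name rarefaction_curve is hypothetical; the curve uses the standard hypergeometric expectation for sampling without replacement.

```python
from math import comb


def rarefaction_curve(counts):
    """Expected number of distinct genotypes for every sub-sample size
    from 0 up to the full sample (sampling without replacement)."""
    total = sum(counts)
    return [
        sum(1 - comb(total - c, n) / comb(total, n) for c in counts)
        for n in range(total + 1)
    ]


# Invented counts per genotype: 6 genotypes, 50 isolates in total
counts = [30, 10, 5, 3, 1, 1]
curve = rarefaction_curve(counts)

# gains[k] is the expected number of new genotypes added by isolate k + 1
gains = [b - a for a, b in zip(curve, curve[1:])]

for n in (1, 10, 25, 50):
    print(f"n={n:2d}  expected genotypes={curve[n]:.2f}  gain from last isolate={gains[n - 1]:.3f}")
```

If the gain per additional isolate is still large at the full sample size, the curve has not levelled off and further sampling is likely to reveal new types.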

Freeware for this method is available on the Internet (e.g. “Rarefaction calculator”). The R package “vegan” also contains a rarefaction function (rarefy).

 

For example, the method is used in the following papers:

Gormley, F. J., M. MacRae, K. J. Forbes, I. D. Ogden, J. F. Dallas, and N. J. C. Strachan. 2008. Has retail chicken played a role in the decline of human campylobacteriosis? Applied and Environmental Microbiology 74:383-390.

Perron, G. G., S. Quessy, A. Letellier, and G. Bell. 2007. Genotypic diversity and antimicrobial resistance in asymptomatic Salmonella enterica serotype Typhimurium DT104. Infection, Genetics and Evolution 7:223-228.

 

Sources and Resources:

http://www.biology.ualberta.ca/jbrzusto/rarefact.php

http://cc.oulu.fi/~jarioksa/softhelp/vegan.html

http://en.wikipedia.org/wiki/Rarefaction_(ecology)