mEpiWorks is the International Working Group for Molecular Epidemiology, an informal community that supports the use of molecular tools in (veterinary) epidemiology.

Did you know?

What constitutes a good approach to molecular-based surveillance?

Molecular-based surveillance may be considered to consist of two main components: the molecular typing tools, and the analytical methods that model the data to identify significant disease patterns or trends over space, time and risk factors.
Strain typing is the basis of all molecular investigations, and the variety of available methods is increasing (1, 2); for example, the rapid advancement of sequencing technologies (3) means that typing tools are becoming both more logistically accessible and more universally meaningful. There has been a shift away from typing methods whose results are comparable only within a defined context (“comparative typing”, e.g. polymerase chain reaction (PCR) or restriction fragment length polymorphism (RFLP) based tools) towards methods whose results are not limited to their context (“library typing”, e.g. multilocus sequence typing (MLST)) (2, 4). Molecular typing tools used within a surveillance programme need to be fit-for-purpose (5), and in the context of surveillance five essential characteristics can be identified: high-quality typeability; an optimal level of discrimination; epidemiological value; timeliness and cost-effectiveness; and comparability.

Another aspect to consider when designing a molecular surveillance programme is the analytical methods that will be applied to the data to extract information of value for disease control, for the allocation of surveillance resources or for further refinement of surveillance protocols. These analytical methods also need to be fit-for-purpose: the accuracy, precision and reliability of inferences depend heavily on using an analytical tool appropriate to the typing data generated. For example, analytical approaches can vary significantly in their spatial and temporal scales, molecular resolution and analytical complexity (6, 7).

Approach

  1. Foley SL, Lynne AM, Nayak R. Molecular typing methodologies for microbial source tracking and epidemiological investigations of Gram-negative bacterial foodborne pathogens. Infection, Genetics and Evolution. 2009;9:430-40.
  2. Zadoks RN, Schukken YH. Use of molecular epidemiology in veterinary practice. Veterinary Clinics of North America-Food Animal Practice. 2006 Mar;22(1):229-61.
  3. Metzker ML. Sequencing technologies - the next generation. Nature Reviews Genetics. 2010;11(1):31-46.
  4. Struelens MJ, Members of the European Study Group on Epidemiological Markers (ESGEM) of the European Society for Clinical Microbiology and Infectious Diseases (ESCMID). Consensus guidelines for appropriate use and evaluation of microbial epidemiologic typing systems. Clinical Microbiology and Infection. 1996;2(1):2-11.
  5. Achtman M. A surfeit of YATMs? Journal of Clinical Microbiology. 1996 Jul;34(7):1870-.
  6. Muellner P, Zadoks R, Perez A, Spencer SEF, Schukken YH, French NP. The integration of molecular tools into veterinary and spatial epidemiology. Spatial and Spatio-temporal Epidemiology. 2011;2(3):159-71.
  7. Struelens MJ, De Gheldre Y, Deplano A. Comparative and library epidemiological typing systems: outbreak investigations versus surveillance systems. Infection Control and Hospital Epidemiology. 1998;19:565-9.

Vive la Genomic Revolution

- a contribution by Kim Halpin (halpink@live.com) -

The genomic revolution has been driven by the exponential increase in DNA sequencing capacity together with its increasing affordability. The Human Genome Project was born in the molecular revolution, and through it the entire human genome was sequenced. It took 13 years, thousands of sequencers and $2 billion, and it generated 21 gigabase pairs of sequence. Take one step forward to the genomic revolution and another human genome is sequenced, but this time it takes only 7 days on one sequencer, costs $3000 and generates 5-10 times more data.

The key technology underpinning the genomic revolution is next generation sequencing (NGS). Traditional sequencing is based on sequencing amplified, reasonably large sections of a genome. NGS instruments such as Life Technologies' SOLiD™ instead sequence genomic material in millions of very small pieces, enabling high-throughput, deep sequencing. “Deep” refers to the coverage of a given segment of the genome: 60X coverage means that, on average, each nucleotide is read 60 times. Sample preparation breaks the DNA or RNA into short segments that are attached to 500 million to 1 billion beads. The sequence from each bead is reported in a data file, and these files can exceed 100 GB.
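The link between the number of reads, read length and depth can be sketched with the usual mean-coverage calculation (total bases sequenced divided by genome size). The genome size, read length and read count below are hypothetical values chosen only for illustration, and the fraction of bases left uncovered uses an idealised Poisson assumption that real, unevenly covered data will not meet exactly.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth: total bases sequenced divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Reads required to reach a target mean depth (rounded up)."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_not_covered(coverage: float) -> float:
    """Expected fraction of bases read zero times if reads land uniformly at
    random (Poisson approximation); real data are less even than this."""
    return math.exp(-coverage)


# Hypothetical example: a 5 Mb bacterial genome and 50 bp reads.
print(mean_coverage(6_000_000, 50, 5_000_000))   # 60.0
print(reads_needed(60, 50, 5_000_000))           # 6000000
```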

NGS (sometimes referred to as second generation sequencing) has been on the market for only about 5 years, yet it already has successors, including single-molecule approaches. One new approach comes from Ion Torrent, which has developed the first product to use semiconductor sequencing technology. When a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a by-product. The charge from that ion changes the pH of the solution, which can be detected by an ion sensor. The Personal Genome Machine (PGM™), essentially the world's smallest solid-state pH meter, calls the base, going directly from chemical information to digital information (1).
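The principle of going from pH signals to base calls can be illustrated with an idealised, noise-free flow model: nucleotides are offered one at a time in a repeating order, and each flow's signal is taken to equal the number of bases incorporated (so a homopolymer run gives a proportionally larger signal). The flow order "TACG" is an assumption for illustration; real instruments use their own flow orders and must deal with noisy signals, particularly for long homopolymers.

```python
from itertools import cycle


def simulate_flowgram(sequence: str, flow_order: str = "TACG") -> list:
    """Idealised, noise-free model of semiconductor sequencing.

    `sequence` is the strand being synthesised. Nucleotides are flowed one at
    a time in a repeating order; each flow incorporates every base in the
    next homopolymer run that matches it, and the signal (number of hydrogen
    ions released) is taken to equal the number of bases incorporated.
    """
    sequence = sequence.upper()
    if set(sequence) - set(flow_order):
        raise ValueError("sequence contains bases that are never flowed")
    flows = []
    pos = 0
    for base in cycle(flow_order):
        if pos >= len(sequence):
            break
        count = 0
        while pos < len(sequence) and sequence[pos] == base:
            count += 1
            pos += 1
        flows.append((base, count))
    return flows


def call_bases(flows: list) -> str:
    """Convert flow signals back into bases: chemical signal -> digital read."""
    return "".join(base * count for base, count in flows)


flows = simulate_flowgram("TTAGC")
print(flows)
print(call_bases(flows))  # TTAGC
```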

The Ion Torrent PGM™

The PGM™ offers the scalability of semiconductor technology and currently costs less than US$50,000. Its footprint is no bigger than that of a desktop printer. In terms of applications, it is well suited to viral and bacterial genomes, and with this platform we can expect a rapid increase in the number of viral and bacterial sequences that are publicly available. Preliminary data from DNA sequencing performed on the PGM™ strongly suggested that the bacterium at the root of the deadly foodborne outbreak in Germany was a new hybrid of pathogenic E. coli strains (2).

Following the initial sequencing of the Escherichia coli O104 outbreak strain, nine isolates have now been sequenced by four different teams on four different sequencing platforms (the PGM™, Roche's 454 GS Junior, the Illumina HiSeq and the Illumina MiSeq) (3). A crowdsourcing project has been set up to annotate the genomes (4). Crowdsourcing is when an open call goes out to a large, undefined group of people and data are sourced from this crowd, with participation being voluntary. The different groups have been making their data publicly available, and researchers started this project to analyse and annotate the different assemblies. The data will be used to generate a meta-assembly. Having data from multiple platforms could help to produce the most accurate assembly, because it should compensate for errors that might arise from any single platform (3).
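Why data from several platforms can compensate for single-platform errors can be shown with a deliberately simple column-wise vote over sequences of the same region that are assumed to be already aligned. This is only a toy illustration of the idea, far simpler than an actual meta-assembly; the function name and the example reads are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned: list, gap: str = "-") -> str:
    """Column-wise vote across sequences that are already aligned.

    The most common character wins; ties are written as 'N' so that a
    disagreement between platforms is flagged rather than silently resolved.
    Columns whose winner is a gap are dropped from the output.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        ranked = Counter(column).most_common()
        base, top = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top:
            consensus.append("N")   # tie: platforms disagree
        elif base != gap:
            consensus.append(base)
    return "".join(consensus)


# Hypothetical reads of the same region from three platforms.
reads = ["ACGTTGCA", "ACGTAGCA", "ACGTTGCA"]
print(majority_consensus(reads))  # ACGTTGCA
```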

Because several isolates have been sequenced, their genomes can be compared to see whether there are genuine differences between them and whether mutations have occurred during the outbreak (3). This may be the way forward, particularly for projects where a prompt answer is required. It should also help ensure that molecular epidemiological conclusions are not compromised by sequencing and data errors.


Crowdsourcing: is it the way of the future?

References

  1. http://www.iontorrent.com/about/overview/
  2. http://www.lifetechnologies.com/news-gallery/press-releases/2011/dna-sequencing-data-reveals-new-hybrid-e.html
  3. Heger M. (2011) E. Coli Sequencing Prompts Crowdsourcing Project to Annotate Genomes, Enabling Platform Comparisons. In Sequence. Available at: http://www.genomeweb.com/sequencing/e-coli-sequencing-prompts-crowdsourcing-project-annotate-genomes-enabling-platfo
  4. Location of the crowdsourcing project: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki

Library and comparative typing

It is worth differentiating between comparative and library typing systems, as their value differs between settings, e.g. surveillance versus outbreak investigation.

Comparative typing is to date the most common approach and can be used to study organisms within a defined context. An example of this approach is the comparison of Salmonella strains from a contaminated food and from a disease case in a foodborne outbreak. Comparing the two strains shows how similar they are and allows conclusions to be drawn about whether the food was the source of the outbreak. However, typing an isolate from a human case on its own, using the same technique, cannot be used to draw conclusions about the outbreak source, as the result has no universal meaning.

Library typing, on the other hand, is characterised by standardisation, high throughput and a uniform nomenclature. This allows typing results to be interpreted meaningfully in the absence of direct comparisons, and to be understood universally irrespective of when, where, or by whom they were generated. Library typing is mainly applied in surveillance, monitoring and investigations covering longer time periods, and it is becoming increasingly available through the development of typing schemes such as MLST. The costs and benefits of library and comparative typing approaches have to be carefully considered, and they can vary considerably between approaches. Such an evaluation should consider not only the monetary cost of equipment, but also the resources needed, for example, to train staff in the technique.
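The difference in how results are interpreted can be sketched in a few lines. The comparative check is only meaningful for two fingerprints run side by side, while the library lookup returns the same name wherever and whenever a profile is typed. Everything here is hypothetical: the band sizes, the seven-locus allelic profiles and the ST labels are invented, and real comparative methods usually compare patterns with a size tolerance and a similarity coefficient rather than exact set equality.

```python
# Hypothetical library: allelic profile at seven loci -> sequence type (ST).
# The allele numbers and ST labels are invented for illustration only.
MLST_LIBRARY = {
    (1, 4, 2, 2, 6, 3, 17): "ST-A",
    (3, 5, 1, 4, 2, 8, 6): "ST-B",
}


def comparative_match(fingerprint_a: set, fingerprint_b: set) -> bool:
    """Comparative typing: two fingerprints (e.g. band sizes from one gel run)
    can only be judged against each other; neither has meaning on its own."""
    return fingerprint_a == fingerprint_b


def library_type(profile: tuple) -> str:
    """Library typing: a profile is looked up in a shared, standardised library,
    so the same name is returned whenever, wherever and by whomever it is typed."""
    return MLST_LIBRARY.get(tuple(profile), "unassigned (new profile)")


food_isolate = {120, 340, 560}
case_isolate = {120, 340, 560}
print(comparative_match(food_isolate, case_isolate))  # True
print(library_type((1, 4, 2, 2, 6, 3, 17)))           # ST-A
```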

 

CDC Lab

Picture courtesy of the CDC public health image library

 

References:

Struelens, M.J., De Gheldre, Y., Deplano, A., 1998. Comparative and library epidemiological typing systems: outbreak investigations versus surveillance systems. Infection Control and Hospital Epidemiology 19, 565-569.

Zadoks, R.N., Schukken, Y.H., 2006. Use of molecular epidemiology in veterinary practice. Veterinary Clinics of North America-Food Animal Practice 22, 229-261.

Proportional Similarity Index

The proportional similarity index (PSI), or Czekanowski index, is a simple and objective estimate of the area of intersection between two frequency distributions (Rosef, Kapperud et al. 1985). The PSI estimates the similarity between the frequency distributions of, for example, bacterial subtypes in different reservoirs. It is calculated as:

$$\mathrm{PSI} = 1 - 0.5 \sum_i \lvert p_i - q_i \rvert = \sum_i \min(p_i, q_i)$$

where $p_i$ and $q_i$ are the proportions of strains belonging to type $i$ among all strains typed from species P and Q, respectively (Feinsinger, Spears et al. 1981; Rosef, Kapperud et al. 1985). PSI values range from 1 for the highest possible similarity to 0 for distributions with no types in common. Bootstrap confidence intervals for this measure can be estimated using the approach applied by Garrett et al. (Garrett, Devane et al. 2007). The technique has also recently been applied to support source attribution studies of human campylobacteriosis (Mullner et al. 2009; Mullner et al. 2010).
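A minimal sketch of the calculation, assuming each source is given as a list of subtype labels (one per typed isolate). The interval function is a generic percentile bootstrap that resamples isolates with replacement within each source; it is a stand-in for illustration and may differ in detail from the procedure used by Garrett et al. Non-typable isolates would need to be handled explicitly beforehand (e.g. excluded or coded as their own category), and the example labels are hypothetical.

```python
import random
from collections import Counter


def psi(types_p: list, types_q: list) -> float:
    """Proportional similarity index: 1 - 0.5 * sum_i |p_i - q_i|."""
    if not types_p or not types_q:
        raise ValueError("each source needs at least one typed isolate")
    count_p, count_q = Counter(types_p), Counter(types_q)
    n_p, n_q = len(types_p), len(types_q)
    all_types = set(count_p) | set(count_q)
    return 1 - 0.5 * sum(abs(count_p[t] / n_p - count_q[t] / n_q) for t in all_types)


def psi_bootstrap_ci(types_p, types_q, n_boot=1000, alpha=0.05, seed=None):
    """Percentile bootstrap interval, resampling isolates with replacement
    within each source (a generic scheme, not necessarily Garrett et al.'s)."""
    rng = random.Random(seed)
    stats = sorted(
        psi(rng.choices(types_p, k=len(types_p)), rng.choices(types_q, k=len(types_q)))
        for _ in range(n_boot)
    )
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[max(int((1 - alpha / 2) * n_boot) - 1, 0)]
    return lower, upper


# Hypothetical subtype labels for isolates from two sources.
human = ["A", "A", "B", "C"]
poultry = ["A", "B", "B", "D"]
print(psi(human, poultry))  # 0.5
print(psi_bootstrap_ci(human, poultry, seed=1))
```

Computing the index from counts rather than pre-computed proportions keeps the resampling step simple: each bootstrap replicate just recounts the resampled labels.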

The occurrence of non-typable strains in a dataset requires special attention when applying this method (Rosef, Kapperud et al. 1985; Garrett, Devane et al. 2007). The method assumes that the epidemiological affinity between species is proportional to the similarity between their type distributions. This assumption may be incorrect, since many animal isolates may not be pathogenic even if they are indistinguishable by the typing method used. If a source also contains a high proportion of non-pathogenic strains, its importance as a contributor to human cases may be masked (Garrett, Devane et al. 2007). In addition, some human cases may have originated from sources not included in the analysis, e.g. through travel (Rosef, Kapperud et al. 1985).


References:

Feinsinger, P., Spears, E. E., & Poole, R. W. (1981). A Simple Measure of Niche Breadth. Ecology, 62(1), 27-32.

Garrett, N., Devane, M. L., Hudson, J. A., Nicol, C., Ball, A., Klena, J. D., et al. (2007). Statistical comparison of Campylobacter jejuni subtypes from human cases and environmental sources. Journal of Applied Microbiology, 103(6), 2113-2121.

Mullner, P., Spencer, S. E. F., Wilson, D., Jones, G., Noble, A. D., Midwinter, A. C., et al. (2009). Assigning the source of human campylobacteriosis in New Zealand: A comparative genetic and epidemiological approach. Infection, Genetics and Evolution, 9, 1311-1319.

Müllner, P., Collins-Emerson, J., Midwinter, A., Carter, P., Spencer, S., Van der Logt, P., et al. (2010). Molecular epidemiology of Campylobacter jejuni in a geographically isolated country with a uniquely structured poultry industry. Applied and Environmental Microbiology, 76(7), 2145-2154.

Rosef, O., Kapperud, G., Lauwers, S., & Gondrosen, B. (1985). Serotyping of Campylobacter jejuni, Campylobacter coli, and Campylobacter laridis from domestic and wild animals. Applied and Environmental Microbiology, 49(6), 1507-1510.

ClonalFrame

The variety of mechanisms by which bacteria evolve can cause problems when attempting to infer relationships between strains. ClonalFrame is an approach developed by Didelot and Falush (2007) for MLST data, which infers the clonal relationships of bacteria by accounting not only for point mutations but also for recombination events. For each branch of the genealogy, the model estimates the extent of the clonal frame, i.e. the subset of the genome that has not undergone recombination.

This method can be used to decide whether a subset of isolates shares a common ancestor, to estimate the age of that common ancestor, and hence to address a variety of epidemiological and ecological questions that hinge on the pattern of bacterial spread. Be aware, however, that because the key assumption of the model concerns recombination, and the model does not currently account for the origin of genetic imports, it tends to underestimate the number of recombination events relative to mutation events and can infer incorrect subdivisions, particularly if recombination is relatively frequent compared with mutation.

The algorithms have been implemented in a computer software package which is freely available at the webpage of the Department of Statistics of the University of Warwick.

Reference:
Didelot, X. and Falush, D. (2007) Inference of Bacterial Microevolution using Multilocus Sequence Data. Genetics 175, 1251-1266.

Isolates, strains and clones

It is important to define the terms isolate, strain and clone clearly to avoid confusion. Ideally, all molecular epidemiological studies should apply these terms using the same definitions. For example, "strain" and "isolate" are often used synonymously, which may result in problems, in particular when the definitions of the terms are not standardised.

The following definitions are approved by the American Society for Microbiology:

Isolate: A population of microbial cells in pure culture derived from a single colony on an isolation plate and identified to the species level.

Strain: An isolate or group of isolates exhibiting phenotypic and/or genotypic traits belonging to the same lineage, distinct from those of other isolates of the same species.

Clone: An isolate or group of isolates descending from a common precursor strain by nonsexual reproduction exhibiting phenotypic and/or genotypic traits characterised by a strain-typing method to belong to the same group.

The terms isolate, strain and clone form a hierarchy, as illustrated in the figure below.

Species, strains, isolates

 [This figure was taken from Zadoks et al. 2006]

 

Sources and Resources:

Zadoks, R. N., and Y. H. Schukken. 2006. Use of molecular epidemiology in veterinary practice. Veterinary Clinics of North America-Food Animal Practice 22:229-261.

Riley, L. W. 2004. Molecular Epidemiology of Infectious Diseases - Principles and Practices. ASM Press, Washington, DC.

Rarefaction analysis

Rarefaction analysis is a simple tool for comparing genetic diversity. Sample-based rarefaction (also known as the species accumulation curve) is applicable when a number of samples are available from which, for example, species richness is to be estimated as a function of the number of samples.

This method is very useful when comparing genotypes from, e.g., different sources or regions when the sampling effort differed. Suppose, for example, that you are trying to compare the diversity of Campylobacter genotypes in human and poultry sources. Ideally you would have 100 typed isolates from each; the comparison would then be very straightforward, as the source with more distinct genotypes is more diverse. In reality, however, different numbers of isolates have almost always been typed, and the data could look as follows:

| Source | Number of isolates | Number of different genotypes |
|---|---|---|
| Human | 103 | 26 |
| Poultry | 78 | 20 |

The question then is: how do you compare the diversity in the two samples?

Basically, rarefaction uses the data from the larger sample to answer the question "How many species (or genotypes) would have been found in a smaller sample?"

The technique originated in ecology in the 1960s and can be used to compare the number of species found in different regions or from different sources when the sampling effort differed. In principle, a greater sampling effort can be expected to yield a larger sample and more species, so the raw numbers of species found in each region or source cannot simply be compared.

If you found n organisms in the less-sampled region, rarefaction takes hypothetical sub-samples of n organisms from the more-sampled region, and calculates the average number of species in such sub-samples.

This average can be compared to the number of species actually found in the less-sampled region. (The method computes a variance and standard deviation to help you judge how significant any difference is.)
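The average described above does not have to be estimated by drawing many random sub-samples: it can be computed exactly with the standard hypergeometric form of rarefaction, where each type contributes the probability that it appears in a sub-sample of n isolates drawn without replacement. The variance follows from treating "type i is absent" as an indicator variable for each type. This is a minimal sketch under those assumptions; the function name and the example genotype counts are hypothetical.

```python
from itertools import combinations
from math import comb


def rarefy(counts: list, n: int) -> tuple:
    """Expected number of types, and its variance, in a random sub-sample of
    n isolates drawn without replacement from a sample whose per-type
    isolate counts are given in `counts`."""
    total = sum(counts)
    if not 0 < n <= total:
        raise ValueError("n must lie between 1 and the total number of isolates")
    denom = comb(total, n)
    # absent[i]: probability that type i is missing from the sub-sample
    absent = [comb(total - c, n) / denom for c in counts]
    expected = sum(1 - a for a in absent)
    variance = sum(a * (1 - a) for a in absent)
    for i, j in combinations(range(len(counts)), 2):
        both_absent = comb(total - counts[i] - counts[j], n) / denom
        variance += 2 * (both_absent - absent[i] * absent[j])
    return expected, max(variance, 0.0)


# Hypothetical genotype counts for the larger sample, rarefied to 10 isolates.
larger_sample = [6, 4, 3, 2, 1, 1, 1]
e, v = rarefy(larger_sample, 10)
print(f"expected genotypes: {e:.2f} (SD {v ** 0.5:.2f})")
```

To compare two sources, the larger sample would be rarefied to the number of isolates in the smaller one and the expected value compared with the number of types actually observed there.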

 

Rarefaction curves can be plotted (with or without corresponding confidence intervals) and look like this:

Rarefaction

Another interesting application of this technique is to evaluate whether continued sampling will retrieve more new genotypes or species. In the example above, the curve becomes flatter with increasing sampling effort, so it becomes less likely that additional samples will detect new types.
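A rarefaction curve and its flattening can also be approximated by shuffling the typed isolates many times and recording how many distinct types have been seen after each additional isolate. This sketch works at the level of individual isolates; a sample-based curve would shuffle whole samples instead. The function names and genotype labels are hypothetical.

```python
import random


def accumulation_curve(isolate_types: list, n_perm: int = 100, seed=None) -> list:
    """Mean number of distinct types seen after 1, 2, ..., N isolates,
    averaged over random orderings of the isolates."""
    rng = random.Random(seed)
    n = len(isolate_types)
    totals = [0] * n
    for _ in range(n_perm):
        order = list(isolate_types)
        rng.shuffle(order)
        seen = set()
        for k, isolate_type in enumerate(order):
            seen.add(isolate_type)
            totals[k] += len(seen)
    return [t / n_perm for t in totals]


def last_step_gain(curve: list) -> float:
    """Average number of new types contributed by the final isolate:
    a value near zero indicates the curve has flattened."""
    return curve[-1] - curve[-2] if len(curve) > 1 else curve[-1]


# Hypothetical genotype labels for 12 typed isolates.
types = ["ST1", "ST1", "ST1", "ST2", "ST2", "ST3", "ST1", "ST4", "ST2", "ST1", "ST3", "ST1"]
curve = accumulation_curve(types, n_perm=500, seed=42)
print([round(x, 2) for x in curve])
print(round(last_step_gain(curve), 3))
```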

Freeware for this method is available on the Internet (e.g. the “Rarefaction calculator”). The R package vegan also contains a rarefaction function (rarefy).

 

For example, the method is used in the following papers:

Gormley, F. J., M. MacRae, K. J. Forbes, I. D. Ogden, J. F. Dallas, and N. J. C. Strachan. 2008. Has retail chicken played a role in the decline of human campylobacteriosis? Applied and Environmental Microbiology 74:383-390.

Perron, G. G., S. Quessy, A. Letellier, and G. Bell. 2007. Genotypic diversity and antimicrobial resistance in asymptomatic Salmonella enterica serotype Typhimurium DT104. Infection Genetics and Evolution 7:223-228.

 

Sources and Resources:

http://www.biology.ualberta.ca/jbrzusto/rarefact.php

http://cc.oulu.fi/~jarioksa/softhelp/vegan.html

http://en.wikipedia.org/wiki/Rarefaction_(ecology)

The ABC of choosing typing approaches

Which typing method? This really is the million-dollar question in molecular epidemiological research. The choice of typing approach should be driven first by three aspects:

  1. The ecological and evolutionary scale of the research question (e.g. outbreak investigation or long-term global spread)
  2. How rapidly the pathogen evolves (e.g. a rapidly evolving RNA virus)
  3. The assumptions of the (genetic) method

Second, the performance criteria of the available typing approaches need to be assessed. These include:

  1. Typeability
  2. Reproducibility
  3. Stability
  4. Discriminatory power

The importance of the choice of typing method is illustrated by the graph below. Assume, for example, that typing with Marker A has a higher discriminatory power than typing with Marker B. The discriminatory power is the average probability that the typing system will assign different types to two unrelated strains randomly sampled from the microbial population. Depending on the approach chosen, the findings will diverge, showing Strains 1-3 as either identical or different.

Strains

[This figure was taken from Tibayrenc 1998]

 

This illustrates how crucial both the choice of an appropriate technique and the interpretation of the experimental data within a sound theoretical framework are.
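The discriminatory power defined above is commonly estimated with Hunter and Gaston's (1988) application of Simpson's index of diversity, computed from the types a method assigns to a collection of epidemiologically unrelated strains. The sketch below assumes the typing results are given as a list of type labels, one per strain; the two markers and their results are hypothetical.

```python
from collections import Counter


def discriminatory_power(assigned_types: list) -> float:
    """Simpson's index of diversity applied to typing results:
    D = 1 - sum_j n_j (n_j - 1) / (N (N - 1)),
    i.e. the probability that two strains drawn at random without
    replacement are assigned different types."""
    n = len(assigned_types)
    if n < 2:
        raise ValueError("need at least two typed strains")
    same_type_pairs = sum(c * (c - 1) for c in Counter(assigned_types).values())
    return 1 - same_type_pairs / (n * (n - 1))


# Hypothetical results for the same five unrelated strains typed with two markers.
marker_a = ["a1", "a2", "a3", "a4", "a4"]
marker_b = ["b1", "b1", "b1", "b2", "b2"]
print(discriminatory_power(marker_a))  # 0.9
print(discriminatory_power(marker_b))  # 0.6
```

A higher value means the method is more likely to separate unrelated strains; the value alone says nothing about typeability, reproducibility or stability, which have to be assessed separately.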

It is important to stress in this context that no ideal typing system for universal use exists. We will explore each of the four performance criteria in detail in future articles.

 

Sources and Resources:

Zadoks, R. N., and Y. H. Schukken. 2006. Use of molecular epidemiology in veterinary practice. Veterinary Clinics of North America-Food Animal Practice 22:229-261.

Riley, L. W. 2004. Molecular Epidemiology of Infectious Diseases - Principles and Practices. ASM Press, Washington, DC.

Tibayrenc M. Beyond strain typing and molecular epidemiology: Integrated genetic epidemiology of infectious diseases. Parasitology Today 1998; 14: 323-329.

Struelens MJ, Members of the European Study Group on Epidemiological Markers (ESGEM) of the European Society for Clinical Microbiology and Infectious Diseases (ESCMID). Consensus guidelines for appropriate use and evaluation of microbial epidemiologic typing systems. Clinical Microbiology and Infection 1996; 2: 2-11.
