Vive la Genomic Revolution
- a contribution by Kim Halpin (email@example.com) -
The genomic revolution has been driven by the exponential increase in DNA sequencing capabilities together with increased affordability. The Human Genome Project was born in the molecular revolution and sequenced the entire human genome. It took 13 years, thousands of sequencers and $2 billion, and it generated 21 giga base pairs of sequence. Take one step forward to the genomic revolution and another human genome is sequenced, but this time it takes only 7 days on one sequencer, costs $3000 and generates 5-10 times more data.
The key technology underpinning the genomic revolution is next generation sequencing (NGS). Traditional sequencing is based on sequencing amplified, reasonably large sections of a genome. NGS uses instruments such as the Life Technologies SOLiD™, which sequences genomic material in millions of very small pieces, enabling high throughput and deep sequencing. “Deep” refers to coverage over a certain segment of the genome: 60X coverage means that each nucleotide in that segment is read 60 times on average. Sample preparation breaks the DNA or RNA into short segments that are attached to 500 million to 1 billion beads. The sequence from each bead is reported in a data file, and files can be over 100 GB.
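The coverage figure can be made concrete with a little arithmetic: average coverage is the total number of sequenced bases divided by the size of the target. The sketch below is only an illustration; the function names and the example read length and genome size are assumptions, not values from any particular instrument.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total bases sequenced divided by the target size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical example: a 5 million base pair bacterial genome, 50 bp reads.
    genome_size = 5_000_000
    read_length = 50
    n = reads_needed(60, read_length, genome_size)
    print(n, "reads give", mean_coverage(n, read_length, genome_size), "X coverage")
```

Coverage calculated this way is an average; individual positions will be read more or fewer times than the headline figure.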
On the market for approximately 5 years, NGS (sometimes referred to as second generation sequencing) already has a successor, which takes a single-molecule approach. One single-molecule approach is presented by Ion Torrent, which has developed the first product to use semiconductor sequencing technology. In nature, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a byproduct. The charge from that ion changes the pH of the solution, which can be detected by an ion sensor. The Personal Genome Machine (PGM™), essentially the world's smallest solid-state pH meter, calls the base, going directly from chemical information to digital information [1].
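As a rough illustration of going from chemical to digital information, the sketch below assumes that nucleotides are flowed over the sensor one type at a time in a fixed cycle, and that each normalised signal is roughly proportional to the number of bases incorporated during that flow. The flow order, the signal values and the function name call_bases are illustrative assumptions; this is not Ion Torrent's actual base-calling software.

```python
def call_bases(signals, flow_order="TACG"):
    """Turn per-flow signal strengths into a base sequence.

    signals    -- one normalised reading per nucleotide flow (assumed values)
    flow_order -- the repeating order in which nucleotides are flowed (assumed)
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporations; a reading
        # near 2 is treated as the same base incorporated twice in a row.
        count = max(0, round(signal))
        bases.append(nucleotide * count)
    return "".join(bases)


if __name__ == "__main__":
    # Flows T, A, C, G, T, A with hypothetical pH-derived readings.
    print(call_bases([1.02, 0.05, 2.1, 0.0, 0.0, 0.97]))
```

Simple rounding becomes unreliable for long runs of the same base, which is one reason data from a single platform can contain errors.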
The PGM™ offers semiconductor scalability and currently costs less than $50,000 USD. It has a footprint no bigger than a desktop printer. In terms of applications, it is well suited to viral and bacterial genomes. With this platform we will see a rapid acceleration in the number of viral and bacterial sequences publicly available. It was preliminary data from DNA sequencing performed on the PGM™ that strongly suggested that the bacterium at the root of the deadly food-borne outbreak in Germany was a new hybrid of pathogenic E. coli strains [2].
Following the initial sequencing of the Escherichia coli O104 outbreak strain, nine isolates have now been sequenced by four different teams on four different sequencing platforms (the PGM™, Roche's 454 GS Junior, the Illumina HiSeq and the Illumina MiSeq) [3]. A crowd sourcing project is being used to annotate the genomes [4]. Crowd sourcing is when an open call goes out to a large, undefined group of people and data are sourced from this crowd, with participation being voluntary. The different groups have been making their data publicly available, and researchers have started this project to analyse and annotate the different assemblies. Researchers will use the data to generate a meta-assembly. Having data from multiple platforms could help produce the most accurate assembly, because it should compensate for errors that arise from any single platform [3].
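The idea that several platforms can compensate for one another's errors can be sketched as a per-position majority vote. This is a deliberately simplified stand-in for a real meta-assembly: it assumes the sequences have already been aligned to the same length, and the function name consensus and the use of N for ties are illustrative choices, not the project's method.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Majority-vote consensus over pre-aligned, equal-length sequences."""
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")

    result = []
    for column in zip(*aligned_sequences):
        ranked = Counter(column).most_common()
        # If the top two bases are equally common, report N (undetermined).
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append("N")
        else:
            result.append(ranked[0][0])
    return "".join(result)


if __name__ == "__main__":
    # Four hypothetical platform calls for the same short region.
    print(consensus(["ACGT", "ACGA", "ACGT", "TCGT"]))
```

A base called wrongly by one platform is outvoted as long as the others agree, which is the intuition behind combining data sets.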
Because different isolates have been sequenced, the genomes can be compared to see whether there are genuine differences between them and whether mutations have occurred during the outbreak [3]. This crowd sourced, multi-platform approach may represent the way forward, particularly for projects where a prompt answer is required. It will also ensure that the molecular epidemiology is not compromised by sequencing and data errors.
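Comparing isolates comes down to listing the positions at which two aligned genomes disagree. The sketch below assumes two already-aligned sequences of equal length and uses made-up data; the function name differing_positions is hypothetical, and a real comparison would also need to handle insertions, deletions and sequencing quality.

```python
def differing_positions(reference, other):
    """Return (position, reference_base, other_base) for each mismatch.

    Positions are 0-based. Both sequences must already be aligned
    and have the same length.
    """
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, other_base)
        for position, (ref_base, other_base) in enumerate(zip(reference, other))
        if ref_base != other_base
    ]


if __name__ == "__main__":
    # Two hypothetical isolates from the same outbreak.
    for position, ref_base, other_base in differing_positions("ACGTAC", "ACCTAA"):
        print(f"position {position}: {ref_base} -> {other_base}")
```

A difference that appears in data from only one platform is more likely to be a sequencing error than a genuine mutation, which is why data from several platforms are valuable here.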
Crowd sourcing: Is it the way for the future?
- [3] Heger M. (2011) E. Coli Sequencing Prompts Crowdsourcing Project to Annotate Genomes, Enabling Platform Comparisons. In Sequence. Available at: http://www.genomeweb.com/sequencing/e-coli-sequencing-prompts-crowdsourcing-project-annotate-genomes-enabling-platfo
- [4] Location of crowd sourcing project: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki