What is high-throughput sequencing?
The genome of an individual is composed of about 3.3 billion base pairs (pairs of nucleotides) and contains a wealth of medically relevant information. The sequencing of nucleic acids is without doubt the field of analysis that has evolved the most over the last 20 years. The Sanger enzymatic method was automated using capillary sequencing, which led to increasingly efficient instruments and underpinned the project to decipher the human genome, the foremost project in genetics at the turn of the century.
Over the last few years, a new generation of sequencers, high-throughput sequencers, has emerged that operates in parallel on a huge number of short sequences. These instruments rely on new physico-chemical technologies and have a throughput up to 1000 times greater than previous technologies. Their capacity keeps increasing while the price keeps falling: currently 400 million to 1 billion base pairs per day for less than one euro per million base pairs. Two major studies published in 2005 paved the way for the development of these technologies, called «Massively Parallel Signature Sequencing» or «High-Throughput Sequencing».
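To give a sense of scale, the figures quoted above can be turned into a rough calculation. The sketch below is only an illustration: the function names are hypothetical, and it assumes a single pass (1X) over a 3.3 billion base pair genome at the stated throughput and price ceiling.

```python
# Figures taken from the text; everything else is an illustrative assumption.
GENOME_SIZE_BP = 3_300_000_000                      # size of one human genome
DAILY_THROUGHPUT_BP = (400_000_000, 1_000_000_000)  # low and high daily output
MAX_COST_PER_MILLION_BP_EUR = 1.0                   # "less than one euro per million base pairs"


def days_for_one_pass(genome_size_bp: int, throughput_bp_per_day: int) -> float:
    """Days needed to produce as many bases as one genome contains (1X)."""
    return genome_size_bp / throughput_bp_per_day


def cost_ceiling_eur(n_bases: int, cost_per_million_bp: float) -> float:
    """Upper bound on the cost of producing n_bases at the given price."""
    return n_bases / 1_000_000 * cost_per_million_bp


slow_days = days_for_one_pass(GENOME_SIZE_BP, DAILY_THROUGHPUT_BP[0])
fast_days = days_for_one_pass(GENOME_SIZE_BP, DAILY_THROUGHPUT_BP[1])
ceiling = cost_ceiling_eur(GENOME_SIZE_BP, MAX_COST_PER_MILLION_BP_EUR)

print(f"1X of one genome: {fast_days:.2f} to {slow_days:.2f} days, under {ceiling:.0f} euros")
```

Under these assumptions, a single pass over one genome takes between roughly 3 and 8 days of instrument output and costs less than 3300 euros.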
Today, the commercialisation of these methods has made it routine to sequence whole genomes, their coding regions (the exome, which accounts for about 1% of the genome), or selected panels of genes. This technical evolution has also led to systematic analysis of the transcriptome (RNA-Seq) and of protein-DNA interactions (ChIP-Seq), as well as to epigenomic and metagenomic sequencing.
These analyses require bioinformatics platforms adapted to these new technologies. Aware of the major value of these technologies for the diagnosis of rare diseases, FHU-TRANSLAD is supporting, as a pioneering project, the development of a bioinformatics analysis platform for high-throughput sequencing data in Burgundy.
The technique of high-throughput sequencing
Genomic DNA is first fragmented into small pieces, to the ends of which adapter sequences are attached. The libraries thus prepared are bound to a solid support and each fragment is amplified about 1000 times to form clonal «clusters», which are sequenced in parallel. At each cycle, the incorporation of a nucleotide is indicated by a fluorescent signal specific to each of the four nucleotides of DNA.
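The cycle-by-cycle reading of a cluster can be illustrated with a toy example. The sketch below is a deliberate simplification and not the method used by any particular instrument: it assumes one intensity value per nucleotide channel at each cycle and simply keeps the brightest channel. The names `CHANNELS` and `call_bases` and the intensity values are hypothetical.

```python
# Order of the four fluorescence channels (an assumption made for this sketch).
CHANNELS = ("A", "C", "G", "T")


def call_bases(cycle_intensities):
    """Return the read for one cluster: at each cycle, keep the brightest channel."""
    read = []
    for intensities in cycle_intensities:
        if len(intensities) != len(CHANNELS):
            raise ValueError("expected one intensity per nucleotide channel")
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Made-up intensities for one cluster over four cycles (channels A, C, G, T).
cluster_signal = [
    (0.90, 0.10, 0.00, 0.10),
    (0.10, 0.00, 0.80, 0.20),
    (0.00, 0.10, 0.10, 0.95),
    (0.20, 0.70, 0.10, 0.00),
]

read = call_bases(cluster_signal)
print(read)  # AGTC
```

Real base calling also has to correct for overlapping signals between channels and for clusters falling out of step, which this sketch ignores.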
Each region is sequenced many times over (the depth of coverage, typically > 30X for a whole genome and > 60X for an exome) to compensate for the non-uniformity of the different technical processes and to increase the sensitivity and specificity of the detection of genetic variations.
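The depth figures above translate directly into a number of reads. As a minimal sketch, mean depth can be estimated as the total number of sequenced bases divided by the size of the target. The read length of 150 bases is an assumption chosen for illustration, the exome is taken as 1% of a 3.3 billion base pair genome, and losses from capture inefficiency or duplicate reads are ignored.

```python
import math

GENOME_SIZE_BP = 3_300_000_000        # size of one human genome
EXOME_SIZE_BP = GENOME_SIZE_BP // 100  # exome taken as 1% of the genome
READ_LENGTH_BP = 150                   # assumption: not stated in the text


def mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Mean depth of coverage = total sequenced bases / target size."""
    return n_reads * read_length / target_size


def reads_needed(depth: float, read_length: int, target_size: int) -> int:
    """Smallest number of reads giving at least `depth` mean coverage."""
    return math.ceil(depth * target_size / read_length)


genome_reads = reads_needed(30, READ_LENGTH_BP, GENOME_SIZE_BP)
exome_reads = reads_needed(60, READ_LENGTH_BP, EXOME_SIZE_BP)

print(f"30X genome: {genome_reads:,} reads")  # 660,000,000 reads
print(f"60X exome:  {exome_reads:,} reads")   # 13,200,000 reads
```

Although the exome is sequenced at twice the depth, it needs about 50 times fewer reads than the whole genome under these assumptions, which is why several exomes can share one run.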
When the exome or specific targets are sequenced, library preparation requires an additional step to capture the regions of interest. Different methods can be used to achieve this depending on the type of experiment: capture by hybridisation (generally used for exomes), enrichment by polymerase chain reaction (PCR) and capture by molecular inversion probes. Finally, several different samples can be sequenced simultaneously on the same instrument by using index sequences to identify them.
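Sorting pooled reads back to their sample of origin using the index sequences can be sketched as follows. This is an illustrative simplification, not the procedure of any specific software: the sample names, index sequences, the `demultiplex` function and the tolerance of one mismatch are all assumptions.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Group reads by sample.

    reads: iterable of (index_sequence, insert_sequence) pairs.
    sample_indexes: dict mapping sample name -> expected index sequence.
    A read is assigned only if exactly one sample index matches within
    max_mismatches; otherwise it goes to "undetermined".
    """
    by_sample = defaultdict(list)
    for index_seq, insert_seq in reads:
        matches = [
            sample
            for sample, expected in sample_indexes.items()
            if len(expected) == len(index_seq)
            and hamming(expected, index_seq) <= max_mismatches
        ]
        key = matches[0] if len(matches) == 1 else "undetermined"
        by_sample[key].append(insert_seq)
    return dict(by_sample)


# Hypothetical samples and reads, for illustration only.
sample_indexes = {"patient_A": "ACGTAC", "patient_B": "TGCATG"}
reads = [
    ("ACGTAC", "GATTACA"),   # exact match to patient_A
    ("TGCATG", "CCGGTTAA"),  # exact match to patient_B
    ("ACGTAA", "TTTT"),      # one sequencing error in the index
    ("GGGGGG", "AAAA"),      # matches no sample
]

result = demultiplex(reads, sample_indexes)
print(result)
```

Tolerating a mismatch rescues reads whose index contains a sequencing error, provided the indexes chosen for a run differ from each other at enough positions to remain unambiguous.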
What are the uses in diagnosis?
Many medical applications are emerging through the use of panels of selected genes or exome sequencing. This technology is widely used not only for research purposes but also for diagnosis in laboratories around the world, because it is the most powerful tool for diagnosing rare diseases with developmental disorders. Indeed, for patients with syndromic intellectual disability for whom there is no clinical diagnosis, this technology can reveal the genetic cause in 50% of cases.
TRANSLAD initiates a film to better understand next-generation sequencing
The FHU has initiated an educational film, aimed at professionals and the general public, that explains the contribution of next-generation sequencing to the diagnosis of rare diseases. You can view it via the link.