What is high-throughput sequencing?
A person's genome is made up of 3.3 billion base pairs (nucleotides) and contains a vast amount of medically relevant information. Sequencing of these nucleotides is unquestionably the field of analysis that has seen the most significant advances over the past two decades. Automation of the Sanger enzymatic method using capillary sequencers led to increasingly efficient machines and made it possible to decode the human genome, a pioneering project in genetics completed in the early 2000s.
In recent years, a new generation of sequencers described as “high-throughput”, which process a very large number of short sequences in parallel, has emerged. Two landmark studies published in 2005 paved the way for these technologies, known as “massively parallel sequencing” or “high-throughput sequencing”. They rely on new physico-chemical technologies with throughputs up to 1000 times higher than capillary sequencers. These sequencers keep gaining capacity while their production cost continues to fall: at present, 400 million to 1 billion base pairs a day for less than one euro per million base pairs.
As a result of the commercialisation of these methods, sequencing can now be performed on whole genomes, on their coding regions (the exome, about 1% of the genome) or on panels of selected genes. This technical advance also makes it possible to carry out transcriptomic (RNA-seq), protein-DNA interaction (ChIP-seq), epigenomic and metagenomic analyses systematically.
These analyses require bioinformatics platforms validated for these new technologies. Positioned at the forefront, and aware of how relevant these technologies are to rare disease diagnosis, FHU-TRANSLAD is supporting the development of a bioinformatics platform in Bourgogne for the analysis of high-throughput sequencing data.
The high-throughput sequencing technique
Genomic DNA is first fragmented into short pieces, and adapter sequences are ligated to their ends. The resulting libraries are then attached to a solid support, and each fragment is amplified about 1000 times to form clonal “clusters”, which are sequenced in parallel. During each cycle, the incorporation of a nucleotide is detected by a fluorescent signal, each of the four DNA nucleotides carrying its own fluorescent label.
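To make the read-out step more concrete, here is a deliberately simplified sketch of base calling. It assumes that each cluster yields one intensity per fluorescent channel per cycle and that the brightest channel marks the incorporated nucleotide. The channel order (A, C, G, T), the `call_bases` function and the purity-based quality score are illustrative assumptions, not the instrument's actual algorithm, which also has to correct for effects such as signal cross-talk and phasing.

```python
import math

BASES = "ACGT"  # assumed channel order: one fluorescent channel per nucleotide

def call_bases(intensities):
    """Call one read from per-cycle channel intensities (toy model).

    intensities: one list per cycle, each holding 4 non-negative values
    for the A, C, G and T channels. Returns (sequence, Phred-like qualities).
    """
    seq, quals = [], []
    for cycle in intensities:
        if len(cycle) != 4:
            raise ValueError("expected one intensity per nucleotide channel")
        total = sum(cycle)
        if total <= 0:
            # No signal in any channel: report an undetermined base.
            seq.append("N")
            quals.append(0)
            continue
        best = max(range(4), key=lambda i: cycle[i])  # brightest channel wins
        purity = cycle[best] / total                  # share of signal in that channel
        p_error = min(max(1.0 - purity, 1e-4), 1.0)   # crude error estimate, capped
        seq.append(BASES[best])
        quals.append(round(-10 * math.log10(p_error)))
    return "".join(seq), quals

# Made-up intensities for a 3-cycle read.
cycles = [[900, 30, 40, 30], [20, 25, 800, 155], [0, 0, 0, 0]]
read, qualities = call_bases(cycles)
print(read, qualities)
```

With these made-up intensities the read is called as `AGN`: the second cycle gets a lower quality because part of its signal leaks into another channel, and the last cycle has no signal at all, so an `N` with quality 0 is reported.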
Sequencing is performed with redundancy, each base being read many times over (typically > 30X for a whole genome and > 60X for an exome), to compensate for the non-uniformity of the various technical steps and to increase the sensitivity and specificity of genetic variant detection.
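The depth figures above can be related to the number of reads with the classic, idealised Lander-Waterman model: for N reads of length L over a target of size G, the mean depth is c = N × L / G, and the depth at each base follows a Poisson distribution. The sketch below uses the 3.3 billion base pairs and the ~1% exome fraction from the text; the 150 bp read length and the function names are assumptions for illustration. Real coverage is less uniform than this model predicts, which is why the margins quoted above are generous.

```python
import math

GENOME_SIZE = 3_300_000_000       # base pairs, figure given in the text
EXOME_SIZE = GENOME_SIZE // 100   # exome taken as ~1% of the genome, as in the text
READ_LENGTH = 150                 # assumption: a typical short-read length in bp

def reads_needed(target_size, depth, read_length=READ_LENGTH):
    """Number of reads N such that the mean depth N * L / G reaches `depth`."""
    return math.ceil(depth * target_size / read_length)

def fraction_below(depth, min_reads):
    """Idealised (Poisson) fraction of bases covered by fewer than `min_reads` reads."""
    return sum(math.exp(-depth) * depth**k / math.factorial(k) for k in range(min_reads))

print(reads_needed(GENOME_SIZE, 30))   # whole genome at 30X
print(reads_needed(EXOME_SIZE, 60))    # exome at 60X
print(fraction_below(10, 10), fraction_below(30, 10))
```

Under these assumptions, a 30X genome needs 660 million reads and a 60X exome 13.2 million. The Poisson tail shows why depth matters: at a mean of 10X, roughly 46% of bases would be covered by fewer than 10 reads, whereas at 30X this fraction becomes negligible in the idealised model.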
When sequencing is restricted to exomes or specific targets, library preparation requires an additional step to capture the regions of interest. Various methods can be used depending on the type of experiment: capture by hybridisation (normally used for exomes), enrichment by polymerase chain reaction (PCR), and capture by molecular inversion probes. Lastly, several samples can be sequenced simultaneously in the same run by tagging each one with an index sequence that identifies it (multiplexing).
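Demultiplexing, the step that assigns each read back to its sample from its index, can be sketched as follows. The sample names, the 6-nucleotide index sequences and the tolerance of one mismatch are hypothetical choices for illustration; in practice this step is carried out by the sequencer vendor's conversion software with its own settings.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, sample_indices, max_mismatches=1):
    """Assign (index, sequence) pairs to samples; others go to 'undetermined'."""
    bins = {name: [] for name in sample_indices}
    bins["undetermined"] = []
    for index, seq in reads:
        hits = [name for name, ref in sample_indices.items()
                if hamming(index, ref) <= max_mismatches]
        # Require exactly one match: no match or an ambiguous match is rejected.
        target = hits[0] if len(hits) == 1 else "undetermined"
        bins[target].append(seq)
    return bins

samples = {"patient_1": "ACGTAC", "patient_2": "TGCATG"}  # hypothetical sample sheet
reads = [("ACGTAC", "TTAGGC"), ("ACGTAA", "GGCATT"),
         ("TGCATG", "CCATGA"), ("GGGGGG", "ATATAT")]
bins = demultiplex(reads, samples)
print(bins)
```

Here the second read, whose index differs from patient_1's by a single base, is still assigned to patient_1, while the last read matches no index closely enough and ends up as undetermined. Requiring a unique match is a conservative choice: if two indices were too similar, a read could otherwise be attributed to the wrong patient.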
In what ways is it used in diagnosis?
It is now being adopted in numerous medical applications, through either panels of selected genes or exome sequencing. Widely used in research, this technology is also used for diagnosis in laboratories around the world, as it is the most efficient tool for diagnosing rare developmental disorders. It can identify a genetic cause in 50% of patients with a syndromic intellectual disability who have no clinical diagnosis.