Long-read sequencing across the C9orf72 ‘GGGGCC’ repeat expansion: implications for clinical use and genetic discovery efforts in human disease
2018
Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 ‘GGGGCC’ (G4C2) repeat that causes approximately 5–7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences’ (PacBio) and Oxford Nanopore Technologies’ (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9. Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8× coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained >800× coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual’s repeat region was >99% G4C2 content, though we cannot rule out small interruptions. Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions, and for discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may have future potential in clinical and genetic counseling environments. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content, potentially mitigating or modifying disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications across all diseases where the genetic etiology remains unclear.
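To make the quantities reported above concrete (repeat count, expansion length in nucleotides, and G4C2 content), the following is a minimal Python sketch of how such values could be computed from a single read. It is an illustration only and not the analysis pipeline used in this study. The function names `longest_repeat_tract`, `motif_content`, and `units_from_length` are hypothetical, and the sketch assumes an error-free read, which real long-read data are not.

```python
import re
from typing import Tuple

MOTIF = "GGGGCC"  # the G4C2 hexanucleotide unit named in the abstract


def longest_repeat_tract(seq: str, motif: str = MOTIF) -> Tuple[int, int]:
    """Return (start, n_units) of the longest uninterrupted run of `motif`.

    Hypothetical helper: assumes an error-free sequence, so any single
    mismatch splits the tract into two shorter runs.
    """
    best_start, best_units = -1, 0
    for m in re.finditer(f"(?:{re.escape(motif)})+", seq.upper()):
        units = (m.end() - m.start()) // len(motif)
        if units > best_units:
            best_start, best_units = m.start(), units
    return best_start, best_units


def motif_content(region: str, motif: str = MOTIF) -> float:
    """Fraction of bases in `region` covered by non-overlapping motif copies."""
    if not region:
        return 0.0
    covered = len(re.findall(re.escape(motif), region.upper())) * len(motif)
    return covered / len(region)


def units_from_length(n_bases: int, motif: str = MOTIF) -> float:
    """Convert a tract length in nucleotides to a (possibly fractional) unit count."""
    return n_bases / len(motif)


if __name__ == "__main__":
    # Toy read: an unexpanded allele of eight repeats between made-up flanks.
    read = "ACGT" + MOTIF * 8 + "TTAA"
    print(longest_repeat_tract(read))   # (4, 8)

    # Toy tract with one interrupted unit (GGAGCC) among eight intact ones.
    tract = MOTIF * 5 + "GGAGCC" + MOTIF * 3
    print(round(motif_content(tract), 3))

    # Length reported in the abstract for the expanded allele.
    print(units_from_length(7941))      # 1323.5
```

Applied to the reported expansion length, 7941 nucleotides divided by the six-base unit gives 1323.5 units, consistent with the roughly 1324 repeats stated above. An analysis of real reads would need to tolerate sequencing errors, for example through alignment or approximate matching, rather than relying on exact motif matches.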