Long-read sequencing across the C9orf72 ‘GGGGCC’ repeat expansion: implications for clinical use and genetic discovery efforts in human disease

2018
Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 ‘GGGGCC’ (G4C2) repeat that causes approximately 5–7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences’ (PacBio) and Oxford Nanopore Technologies’ (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9. Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8× coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained > 800× coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual’s repeat region was > 99% G4C2 content, though we cannot rule out small interruptions.
Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions and to discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may have future potential in clinical and genetic counseling environments. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content, potentially mitigating or modifying disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications across all diseases where the genetic etiology remains unclear.
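
The abstract reports two kinds of summary for the repeat region: a repeat count and the fraction of the region that is G4C2. The following is a minimal Python sketch of how such figures could be computed from one read, not the authors' pipeline. It assumes the repeat region has already been extracted from the read and oriented to the GGGGCC strand; the function name `summarize_repeat` and the returned keys are hypothetical, chosen for illustration only.

```python
import re


def summarize_repeat(region: str, motif: str = "GGGGCC") -> dict:
    """Summarize motif content of an already-extracted repeat region.

    Assumption: `region` is the repeat tract only (flanks trimmed) and is
    on the same strand as `motif`. Counts non-overlapping motif copies,
    scanning left to right, so copies after a small interruption are still
    found even if the interruption shifts the repeat phase.
    """
    region = region.upper()
    copies = len(re.findall(re.escape(motif), region))
    covered = copies * len(motif)
    return {
        "length_nt": len(region),
        "motif_copies": copies,
        # Fraction of bases that fall inside an intact motif copy.
        "motif_fraction": covered / len(region) if region else 0.0,
    }


if __name__ == "__main__":
    # Toy input: an unexpanded allele with eight uninterrupted repeats.
    print(summarize_repeat("GGGGCC" * 8))
    # Toy input: ten repeats, a one-base interruption, five more repeats.
    print(summarize_repeat("GGGGCC" * 10 + "A" + "GGGGCC" * 5))
```

Exact motif matching is a deliberate simplification: long-read sequencing errors inside the tract would lower the apparent motif fraction, so a real analysis would likely work from consensus sequence or tolerate mismatches.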