Genotyping Structural Variations using Long Read data

2019
Studies on structural variants (SVs) are expanding rapidly. As a result, and thanks to third-generation sequencing technologies, more and more SVs are being discovered, especially in the human genome. At the same time, for several applications such as clinical diagnosis, it is becoming important to genotype newly sequenced individuals on well-defined and characterized SVs. Whereas many SV genotypers have been developed for short-read data, there is still no approach to assess whether given SVs are present or not in a newly sequenced sample of long reads from third-generation sequencing technologies such as Pacific Biosciences or Nanopore. In this work, we present a method to genotype known SVs from long-read sequencing data. The principle is based on the generation of a set of reference sequences that represent the two alleles of each structural variant. Alignments are built by mapping the long reads to these reference sequences. They are then analyzed and filtered to keep only the informative ones, in order to quantify and estimate the presence of each allele. Currently, the genotyping of large deletions has been investigated. Tests on simulated long reads based on 1000 deletions from dbVar show a precision of 95.8%. We also applied the method to the whole NA12878 human genome.
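To illustrate the principle described above for a single deletion, the following is a minimal sketch, not the implementation used in this work. It builds one reference sequence per allele (with and without the deleted segment) and calls a genotype from the number of informative reads supporting each allele. The function names, the flank size, the minimum read count, and the allele-fraction thresholds are all hypothetical values chosen for illustration; the mapping and filtering steps are assumed to have been done elsewhere.

```python
def allele_sequences(chrom_seq, del_start, del_end, flank=5000):
    """Build the two allele sequences for a deletion of chrom_seq[del_start:del_end].

    Coordinates are 0-based, end-exclusive. The flank size is an
    illustrative assumption, not a value taken from the paper.
    """
    left = chrom_seq[max(0, del_start - flank):del_start]
    right = chrom_seq[del_end:del_end + flank]
    ref_allele = left + chrom_seq[del_start:del_end] + right  # deletion absent
    alt_allele = left + right                                 # deletion present
    return ref_allele, alt_allele


def call_genotype(ref_reads, alt_reads, min_reads=3, het_low=0.2, het_high=0.8):
    """Call a genotype from counts of informative reads per allele.

    The thresholds are hypothetical; a real genotyper may instead use a
    likelihood model over the read counts.
    """
    total = ref_reads + alt_reads
    if total < min_reads:
        return "./."  # too few informative reads to decide
    alt_fraction = alt_reads / total
    if alt_fraction < het_low:
        return "0/0"  # homozygous for the reference allele
    if alt_fraction > het_high:
        return "1/1"  # homozygous for the deletion
    return "0/1"      # heterozygous


if __name__ == "__main__":
    ref_allele, alt_allele = allele_sequences("AAACCCGGG", 3, 6, flank=2)
    print(ref_allele, alt_allele)
    print(call_genotype(ref_reads=12, alt_reads=11))
```

In this sketch, the long reads would be aligned to both allele sequences, and only alignments that discriminate between them (for example, those spanning a breakpoint) would be counted as `ref_reads` or `alt_reads`.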