Panaconda: Application of pan-synteny graph models to genome content analysis

2017
Motivation: Whole-genome alignment and pan-genome analysis are useful tools for understanding the similarities and differences of many genomes in an evolutionary context. Here we introduce the concept of pan-synteny graphs, an analysis method that combines elements of both to represent conservation and change of multiple prokaryotic genomes at an architectural level. Pan-synteny graphs represent a reference-free approach for the comparison of many genomes and allow for the identification of synteny, insertion, deletion, replacement, inversion, recombination, missed assembly joins, evolutionary hotspots, and reference-based scaffolding.

Results: We present an algorithm for creating whole-genome multiple sequence comparisons and a model for representing the similarities and differences among sequences as a graph of syntenic gene families. As part of the pan-synteny graph creation, we first create a de Bruijn graph. Instead of the alphabet of nucleotides commonly used in genome assembly, we use an alphabet of gene families. This de Bruijn graph is then processed to create the pan-synteny graph. Our approach is novel in that it explicitly controls how regions from the same sequence and genome are aligned and generates a graph in which all sequences are fully represented as paths. This method harnesses previous computation involved in protein family calculation to speed up the creation of whole-genome alignments for many genomes. We provide the software suite Panaconda for the calculation of pan-synteny graphs given annotation input, and an implementation of methods for their layout and visualization.
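To make the idea of a de Bruijn graph over an alphabet of gene families concrete, the following is a minimal illustrative sketch, not Panaconda's actual implementation. The function name `build_family_debruijn`, the input layout (a mapping from sequence name to an ordered list of family IDs), the value of `k`, and the toy family IDs are all assumptions made for illustration. Strand orientation and the later processing step that turns this graph into a pan-synteny graph are deliberately left out, since the abstract does not specify them.

```python
from collections import defaultdict


def build_family_debruijn(sequences, k=3):
    """Build a de Bruijn-style graph whose alphabet is gene-family IDs.

    sequences: dict mapping a sequence name to its ordered list of family IDs
               (hypothetical input layout, assumed for this sketch).
    Returns (edges, paths): edges maps a k-mer node to the set of k-mer nodes
    that directly follow it; paths maps each sequence name to its node list,
    so every input sequence remains recoverable as a path through the graph.
    """
    edges = defaultdict(set)
    paths = {}
    for name, families in sequences.items():
        # Slide a window of k families along the sequence; each window is a node.
        kmers = [tuple(families[i:i + k]) for i in range(len(families) - k + 1)]
        # Consecutive windows overlap by k-1 families and are joined by an edge.
        for a, b in zip(kmers, kmers[1:]):
            edges[a].add(b)
        paths[name] = kmers
    return dict(edges), paths


# Toy input: two sequences that differ by one replaced gene family (made-up IDs).
genomes = {
    "genomeA": ["fam1", "fam2", "fam3", "fam4", "fam5"],
    "genomeB": ["fam1", "fam2", "fam3", "fam6", "fam5"],
}
edges, paths = build_family_debruijn(genomes, k=2)

# Nodes visited by both sequences mark the conserved (syntenic) stretch.
shared = set(paths["genomeA"]) & set(paths["genomeB"])
```

In this toy case the shared nodes cover the conserved run `fam1, fam2, fam3`, and the node `("fam2", "fam3")` has two outgoing edges, which is where the two sequences diverge around the replaced family.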