Panaconda: Application of pan-synteny graph models to genome content analysis
2017
Motivation: Whole-genome alignment and pan-genome analysis are useful tools in understanding the similarities and differences of many genomes in an evolutionary context. Here we introduce the concept of pan-synteny graphs, an analysis method that combines elements of both to represent conservation and change of multiple prokaryotic genomes at an architectural level. Pan-synteny graphs represent a reference-free approach for the comparison of many genomes and allow for the identification of synteny, insertion, deletion, replacement, inversion, recombination, missed assembly joins, evolutionary hotspots, and reference-based scaffolding.
Results: We present an algorithm for creating whole-genome multiple sequence comparisons and a model for representing the similarities and differences among sequences as a graph of syntenic gene families. As part of the pan-synteny graph creation, we first create a de Bruijn graph. Instead of the alphabet of nucleotides commonly used in genome assembly, we use an alphabet of gene families. This de Bruijn graph is then processed to create the pan-synteny graph. Our approach is novel in that it explicitly controls how regions from the same sequence and genome are aligned and generates a graph in which all sequences are fully represented as paths. This method harnesses previous computation involved in protein family calculation to speed up the creation of whole-genome alignments for many genomes. We provide the software suite Panaconda for the calculation of pan-synteny graphs given annotation input, and an implementation of methods for their layout and visualization.
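The first step described above, a de Bruijn graph built over an alphabet of gene families rather than nucleotides, can be illustrated with a minimal sketch. This is not Panaconda's implementation: the function name `gene_family_de_bruijn`, the input layout (one ordered list of gene family IDs per sequence), the family labels, and the choice of k = 3 are all assumptions made for illustration. Gene strand is ignored here, and the later processing of the de Bruijn graph into a pan-synteny graph is not shown.

```python
from collections import defaultdict


def gene_family_de_bruijn(contigs, k=3):
    """Build a de Bruijn-style graph whose alphabet is gene family IDs.

    contigs: dict mapping a sequence ID to its ordered list of gene family IDs
             (hypothetical input layout, assumed for this sketch).
    Returns (nodes, edges):
      nodes maps each k-mer of gene families to the set of
            (sequence ID, start offset) positions where it occurs;
      edges maps (k-mer, following k-mer) to the set of sequence IDs
            that traverse that transition.
    """
    nodes = defaultdict(set)
    edges = defaultdict(set)
    for seq_id, families in contigs.items():
        prev = None
        for i in range(len(families) - k + 1):
            kmer = tuple(families[i:i + k])
            nodes[kmer].add((seq_id, i))
            if prev is not None:
                edges[(prev, kmer)].add(seq_id)
            prev = kmer
    return dict(nodes), dict(edges)


# Two toy genomes that share three gene families and differ in the fourth.
contigs = {
    "genomeA": ["fam1", "fam2", "fam3", "fam4"],
    "genomeB": ["fam1", "fam2", "fam3", "fam5"],
}
nodes, edges = gene_family_de_bruijn(contigs, k=3)

# k-mers visited by more than one sequence mark conserved gene order.
shared = {
    kmer for kmer, hits in nodes.items()
    if len({seq_id for seq_id, _ in hits}) > 1
}
```

In this toy input, the k-mer `("fam1", "fam2", "fam3")` is visited by both sequences, while the two outgoing edges from it are each used by only one sequence, which is the kind of branch that a replacement or insertion would produce. Because every sequence contributes a consecutive chain of k-mers, each sequence remains traceable as a path through the graph.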