Evaluating metagenomic assembly approaches for biome-specific gene catalogues

2021 
For many environments, biome-specific microbial gene catalogues are being recovered using shotgun metagenomics followed by assembly and gene calling on the assembled contigs. The assembly can be conducted either by assembling each sample individually or by co-assembling reads from all the samples. The co-assembly approach can potentially recover genes that are too low in abundance to be assembled from individual samples. On the other hand, combining samples increases the risk of mixing data from closely related strains, which can hamper the assembly process. In this respect, assembly on individual samples followed by clustering of (near) identical genes is likely preferable. Thus, both approaches have pros and cons, and it remains to be evaluated which assembly strategy is most effective. Here, we have evaluated three assembly strategies for generating gene catalogues from metagenomes using a dataset of 124 samples from the Baltic Sea: 1) assembly on individual samples followed by clustering of the resulting genes, 2) co-assembly on all samples, and 3) mix-assembly, combining individual and co-assembly. The mix-assembly approach resulted in a more extensive non-redundant gene set than the other approaches, with more genes that were predicted to be complete and more genes that could be functionally annotated. The mix-assembly consists of 67 million genes (Baltic Sea gene set; BAGS) that have been functionally and taxonomically annotated. The majority of the BAGS genes are dissimilar (<95% amino acid identity) to the Tara Oceans gene dataset, and hence BAGS represents a valuable resource for brackish water research.

IMPORTANCE Several ecosystem types, such as soils and oceans, are studied through metagenomics. Metagenomics allows the genetic material of the microbes within a sample to be analysed without the need for cultivation. When the DNA is sequenced with an instrument that generates short sequence reads, these reads need to be assembled in order to obtain more complete gene sequences. In this paper, we have evaluated three strategies for assembling metagenome sequences using a large metagenomic dataset from the Baltic Sea. The method that we call mix-assembly generated the greatest number of non-redundant genes and the largest fraction of genes that were predicted to be complete. The resulting gene catalogue will serve as an important resource for brackish water research. We believe this method will also be efficient for generating gene catalogues for other biomes.
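The idea of pooling genes from individual assemblies and a co-assembly and then collapsing (near) identical genes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the toy protein sequences, the ungapped `toy_identity` measure, and the choice of the longest sequence as cluster representative. A real pipeline would rely on a dedicated assembler, gene caller, and alignment-based clustering tool.

```python
from typing import Dict, List


def toy_identity(a: str, b: str) -> float:
    """Toy stand-in for amino acid identity (assumption): fraction of
    matching positions over the shorter sequence, ungapped and left-aligned."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(genes: Dict[str, str], min_identity: float = 0.95) -> Dict[str, List[str]]:
    """Longest-first greedy clustering: each gene joins the first
    representative it matches at >= min_identity, otherwise it becomes
    a new representative. Returns representative name -> member names."""
    clusters: Dict[str, List[str]] = {}
    ordered = sorted(genes.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        for rep in clusters:
            if toy_identity(genes[rep], seq) >= min_identity:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


def mix_assembly_gene_set(individual: Dict[str, str],
                          co_assembly: Dict[str, str],
                          min_identity: float = 0.95) -> Dict[str, List[str]]:
    """Hypothetical mix-assembly step: pool genes from both strategies
    (gene names are assumed to be unique) and cluster them together."""
    pooled = {**individual, **co_assembly}
    return greedy_cluster(pooled, min_identity)


# Toy data (invented): two fragments of one gene from individual samples,
# plus a longer copy and a second gene recovered only by the co-assembly.
individual = {
    "s1_geneA": "MKTAYIAKQRQISFVKSHFS",
    "s2_geneA": "MKTAYIAKQRQI",
}
co_assembly = {
    "co_geneA": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "co_geneB": "MSDNELLKKIAEALGV",
}

catalogue = mix_assembly_gene_set(individual, co_assembly)
for rep, members in catalogue.items():
    print(rep, members)
```

In this toy case the pooled set collapses to two representatives: the longest copy of gene A absorbs both sample-level fragments, while gene B, present only in the co-assembly, is kept as its own cluster.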