Not just BLAST nt: WGS database joins the party

2019
Since its introduction in 1990, the NCBI BLAST family, with over 50k citations, has been an essential tool of in silico molecular biology. The BLAST nt database, based on the traditional divisions of GenBank, has been the default and most comprehensive database for nucleotide BLAST searches and for taxonomic classification software in metagenomics. Here we argue that this is no longer the case. Currently, the NCBI WGS database contains one billion reads (almost five times more than GenBank) and, with 4.4 trillion nucleotides, about 14 times more nucleotides than GenBank. This ratio is growing with time. We advocate a change in the database paradigm in taxonomic classification: systematically combining the nt and WGS databases in order to boost the sensitivity of taxonomic classifiers. We present here a case in which, by adding WGS data, we obtained over five times more classified reads, and with a higher confidence score. To facilitate the adoption of this approach, we provide the draftGenomes script.