Not just BLAST nt: WGS database joins the party
2019
Since its introduction in 1990, the NCBI BLAST family has accumulated over 50k citations and become an essential tool of in silico molecular biology. The BLAST nt database, based on the traditional divisions of GenBank, has been the default and most comprehensive database for nucleotide BLAST searches and for taxonomic classification software in metagenomics. Here we argue that this is no longer the case. Currently, the NCBI WGS database contains one billion reads (almost five times more than GenBank), and with 4.4 trillion nucleotides, WGS has about 14 times more nucleotides than GenBank. This ratio is growing with time. We advocate a change in the database paradigm in taxonomic classification: systematically combining the nt and WGS databases to boost the sensitivity of taxonomic classifiers. We present here a case in which, by adding WGS data, we obtained over five times more classified reads and with a higher confidence score. To facilitate the adoption of this approach, we provide the draftGenomes script.