Normalization methods for microbial abundance data strongly affect correlation estimates

2018 
Consistent normalization of microbial genomic survey count data is fundamental to modern microbiome research. Technical artifacts in these data often obstruct standard comparison of microbial composition across samples and experiments. To correct for sampling bias, library size, and technical variability, a number of different normalization methods have been proposed, including adaptations of RNA-seq analysis workflows and log-ratio transformations from compositional data analysis. However, the effects of data normalization on higher-order summary statistics have remained elusive. We review and compare popular data normalization schemes and assess their effect on subsequent correlation estimation. Application of these normalization methods to the largest publicly available human gut microbiome dataset shows substantial variation among patterns of correlation. We show that log-ratio and variance-stabilization transformations provide the most consistent estimates across experiments of different sample sizes. We also show that the results of data analysis methods that rely on correlation, such as data clustering and network inference, differ depending on the normalization scheme. These findings have important implications for microbiome studies in multiple stages of analysis.
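To make the comparison concrete, the following is a minimal sketch, not the authors' pipeline, of how two normalization choices can be applied to the same count table before estimating taxon-taxon correlations. It assumes total-sum scaling and a centered log-ratio (CLR) transform with a pseudocount of 1 as stand-ins for the schemes discussed; the function names, the pseudocount, and the toy counts are all hypothetical and chosen only for illustration.

```python
import numpy as np

def total_sum_scale(counts):
    """Divide each sample (row) by its library size to get relative abundances."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each entry minus the sample's mean log.

    The pseudocount is an assumption made here to avoid log(0).
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

def taxon_correlations(normalized):
    """Pearson correlation between taxa (columns of the table)."""
    return np.corrcoef(normalized, rowvar=False)

# Toy count table: 4 samples (rows) x 3 taxa (columns); values are made up.
counts = np.array([
    [10, 200, 30],
    [5, 50, 500],
    [100, 20, 40],
    [60, 80, 10],
])

corr_tss = taxon_correlations(total_sum_scale(counts))
corr_clr = taxon_correlations(clr(counts))

# Largest absolute disagreement between the two correlation matrices.
max_abs_difference = np.abs(corr_tss - corr_clr).max()
print(max_abs_difference)
```

Comparing `corr_tss` and `corr_clr` entry by entry gives a simple view of how far the estimated correlations shift when only the normalization step changes.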