Normalization methods for microbial abundance data strongly affect correlation estimates

Michelle H. Badri, Zachary D. Kurtz, Christian Müller, Richard Bonneau

2018

Consistent normalization of microbial genomic survey count data is fundamental to modern microbiome research. Technical artifacts in these data often obstruct standard comparison of microbial composition across samples and experiments. To correct for sampling bias, library size, and technical variability, a number of different normalization methods have been proposed, including adaptations of RNA-seq analysis workflows and log-ratio transformations from compositional data analysis. However, the effects of data normalization on higher-order summary statistics have remained elusive. We review and compare popular data normalization schemes and assess their effect on subsequent correlation estimation. Application of these normalization methods to the largest publicly available human gut microbiome dataset shows substantial variation among patterns of correlation. We show that log-ratio and variance-stabilization transformations provide the most consistent estimates across experiments of different sample sizes. We also show that data analysis methods that rely on correlation, such as data clustering and network inference, differ depending on the normalization schemes. These findings have important implications for microbiome studies in multiple stages of analysis.
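
As a rough illustration of why the choice of normalization can matter for correlation estimates, the sketch below applies two common schemes to the same toy count table and computes taxon-by-taxon Pearson correlations under each. It is not the authors' pipeline: the function names `total_sum_scale` and `clr`, the pseudocount of 1 used to handle zeros, and the simulated Poisson counts are all assumptions made for this example.

```python
import numpy as np

def total_sum_scale(counts):
    """Relative abundances: divide each sample (row) by its library size."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform per sample.

    The pseudocount is an assumption of this sketch; it avoids log(0)
    for taxa with zero counts.
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

# Toy data (illustrative only): 50 samples x 5 taxa, independent Poisson counts.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[200, 50, 20, 10, 5], size=(50, 5))

# Taxon-by-taxon Pearson correlation under each normalization.
corr_tss = np.corrcoef(total_sum_scale(counts), rowvar=False)
corr_clr = np.corrcoef(clr(counts), rowvar=False)

# Largest absolute disagreement between the two correlation matrices.
max_diff = np.abs(corr_tss - corr_clr).max()
print(f"max |corr_tss - corr_clr| = {max_diff:.3f}")
```

Because the simulated taxa are independent, any nonzero off-diagonal correlation here comes from sampling noise and from the normalization itself, which makes the two matrices easy to compare side by side.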
