Stochastic variational variable selection for high-dimensional microbiome data

2021
Background: The rapid and accurate identification of a minimal-size core set of representative microbial species plays an important role in clustering microbial community data and interpreting the clustering results. However, the huge dimensionality of microbial metagenomics data sets is a major challenge for existing methods such as Dirichlet multinomial mixture (DMM) models. Within the framework of these methods, the computational burden of identifying a small number of representative species from a huge number of observed species remains a challenge. Results: We proposed a novel framework to improve the performance of the widely used DMM approach by combining three ideas: (i) We extended the finite DMM model to the infinite case by considering Dirichlet process mixtures and estimated the number of clusters as a random variable. (ii) We proposed an indicator variable to identify representative operational taxonomic units that substantially contribute to the differentiation among clusters. (iii) To address the computational burden of high-dimensional microbiome data, we proposed a stochastic variational inference approach, which approximates the posterior distribution using a controllable distribution called the variational distribution, together with stochastic optimization algorithms for fast computation. With the proposed method, named stochastic variational variable selection (SVVS), we analyzed the root microbiome data collected in our soybean field experiment and the human gut microbiome data from three published data sets of large-scale case-control studies. Conclusions: SVVS demonstrated better performance and significantly faster computation than existing methods on all tested data sets. In particular, SVVS is the only method that can analyze massive high-dimensional microbial data with more than 50,000 microbial species and 1,000 samples. Furthermore, the results suggested that the microbial species selected as a core set played important roles in recent microbiome studies.
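
The two modeling ingredients named in the abstract, the Dirichlet multinomial likelihood and the Dirichlet process mixture, can be illustrated with a minimal sketch. This is not the SVVS implementation: the function names `dm_log_likelihood` and `stick_breaking_weights` are hypothetical, and the truncated stick-breaking representation is assumed here because it is a common choice in variational treatments of Dirichlet process mixtures, not because the abstract confirms it.

```python
import math


def dm_log_likelihood(counts, alpha):
    """Log probability of one sample's taxon counts under a
    Dirichlet-multinomial distribution with concentration vector alpha.

    counts and alpha are equal-length sequences (one entry per taxon).
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: log n! - sum_j log x_j!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Normalizing ratio: log Gamma(A) - log Gamma(n + A)
    log_norm = math.lgamma(a) - math.lgamma(n + a)
    # Per-taxon terms: log Gamma(x_j + alpha_j) - log Gamma(alpha_j)
    log_terms = sum(
        math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha)
    )
    return log_coef + log_norm + log_terms


def stick_breaking_weights(fractions):
    """Turn stick-breaking fractions v_k in [0, 1] into mixture weights
    pi_k = v_k * prod_{j<k} (1 - v_j).

    Setting the last fraction to 1.0 truncates the process so that the
    weights sum to one (an assumption of this sketch).
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a cluster
    for v in fractions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


if __name__ == "__main__":
    # Toy sample with two taxa and a flat concentration vector.
    print(dm_log_likelihood([1, 2], [1.0, 1.0]))
    # Three candidate clusters under a truncated stick-breaking prior.
    print(stick_breaking_weights([0.5, 0.5, 1.0]))
```

In a mixture model of this kind, each cluster would carry its own concentration vector, and a sample's cluster responsibilities would combine the mixture weights with the per-cluster likelihoods. Clusters whose weights shrink toward zero are effectively unused, which is how the number of clusters can be treated as a quantity to be estimated rather than fixed in advance.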