Negative binomial mixed models for analyzing microbiome count data

Xinyan Zhang, Himel Mallick, Zaixiang Tang, Lei Zhang, Xiangqin Cui, Andrew K. Benson, Nengjun Yi

Research output: Contribution to journal › Article

23 Citations (Scopus)

Abstract

Background: Recent advances in next-generation sequencing (NGS) technology enable researchers to collect large volumes of metagenomic sequencing data. These data provide valuable resources for investigating interactions between the microbiome and host environmental/clinical factors. Beyond the well-known properties of microbiome count measurements (for example, varying total sequence reads across samples, over-dispersion and zero-inflation), microbiome studies usually collect samples with hierarchical structures, which introduce correlation among samples and further complicate the analysis and interpretation of microbiome count data.

Results: In this article, we propose negative binomial mixed models (NBMMs) for detecting associations between the microbiome and host environmental/clinical factors in correlated microbiome count data. Although they do not address zero-inflation, the proposed mixed-effects models account for correlation among samples by incorporating random effects into the commonly used fixed-effects negative binomial model, and they efficiently handle over-dispersion and varying total reads. We have developed a flexible and efficient IWLS (Iterative Weighted Least Squares) algorithm that fits the proposed NBMMs by taking advantage of the standard procedure for fitting linear mixed models.

Conclusions: We evaluate and demonstrate the proposed method via extensive simulation studies and an application to mouse gut microbiome data. The results show that the proposed method has desirable properties and outperforms previously used methods in terms of both empirical power and Type I error. The method has been incorporated into the freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/ and http://github.com/abbyyan3/BhGLM), providing a useful tool for analyzing microbiome data.
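
The abstract describes fitting NBMMs with an IWLS algorithm that reuses a standard linear mixed-model fitter at each step. As a rough illustration only, the sketch below shows the IWLS linearization for a negative binomial log-link model with an offset for total reads. Each iteration turns the counts into a pseudo-response z and weights w. It assumes the dispersion theta is known. To stay self-contained, it replaces the weighted linear mixed-model fit with an intercept-only weighted mean. The names nb_working_response and fit_nb_intercept are hypothetical; they are not BhGLM functions, and this is not the authors' implementation.

```python
import math


def nb_working_response(y, offset, eta, theta):
    """Linearize a negative binomial (NB) log-link model at the current fit.

    Assumed model (illustrative; the dispersion theta is treated as known):
        y ~ NB(mu, theta),  Var(y) = mu + mu**2 / theta
        log(mu) = offset + eta,  with offset = log(total reads)
    eta is the current linear predictor excluding the offset.
    Returns the pseudo-response z and the IWLS weights w.
    """
    z, w = [], []
    for yi, oi, ei in zip(y, offset, eta):
        mu = math.exp(oi + ei)               # current fitted mean
        z.append(ei + (yi - mu) / mu)        # log link: d(eta)/d(mu) = 1/mu
        w.append(mu * theta / (theta + mu))  # (dmu/deta)^2 / Var(y)
    return z, w


def fit_nb_intercept(y, offset, theta, tol=1e-10, max_iter=100):
    """IWLS for an intercept-only NB model with an offset and no random effects.

    Hypothetical helper for illustration: in a mixed model, the weighted-mean
    update below would be replaced by a weighted linear mixed-model fit of z
    on the fixed- and random-effect design matrices. Requires sum(y) > 0.
    """
    # Moment-based starting value: log(total count / total exposure).
    beta = math.log(sum(y) / sum(math.exp(o) for o in offset))
    for _ in range(max_iter):
        z, w = nb_working_response(y, offset, [beta] * len(y), theta)
        # Weighted least squares with only an intercept is a weighted mean of z.
        new_beta = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta


if __name__ == "__main__":
    reads = [1000, 2000, 1500, 500]   # made-up total reads per sample
    counts = [4, 10, 3, 1]            # made-up counts for one taxon
    offsets = [math.log(t) for t in reads]
    print(fit_nb_intercept(counts, offsets, theta=1.5))
```

At the fixed point, the weighted-mean update satisfies the NB score equation for the intercept, which is why plain weighted least squares is enough in this one-parameter case. In a full NBMM, that line would instead fit z against the fixed- and random-effect designs with weights w, and theta would typically also need to be estimated.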

Original language: English (US)
Article number: 4
Journal: BMC Bioinformatics
Volume: 18
Issue number: 1
DOI: 10.1186/s12859-016-1441-7
State: Published, Jan 3 2017

Keywords

  • Correlated measures
  • Count data
  • Metagenomics
  • Microbiome
  • Negative binomial model
  • Penalized Quasi-likelihood
  • Random effects

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Negative binomial mixed models for analyzing microbiome count data. / Zhang, Xinyan; Mallick, Himel; Tang, Zaixiang; Zhang, Lei; Cui, Xiangqin; Benson, Andrew K.; Yi, Nengjun.

In: BMC Bioinformatics, Vol. 18, No. 1, 4, 03.01.2017.

Zhang, Xinyan ; Mallick, Himel ; Tang, Zaixiang ; Zhang, Lei ; Cui, Xiangqin ; Benson, Andrew K. ; Yi, Nengjun. / Negative binomial mixed models for analyzing microbiome count data. In: BMC Bioinformatics. 2017 ; Vol. 18, No. 1.

@article{f654e2b79e3647bdbd0ddfac9644b430,
title = "Negative binomial mixed models for analyzing microbiome count data",
keywords = "Correlated measures, Count data, Metagenomics, Microbiome, Negative binomial model, Penalized Quasi-likelihood, Random effects",
author = "Xinyan Zhang and Himel Mallick and Zaixiang Tang and Lei Zhang and Xiangqin Cui and Benson, {Andrew K.} and Nengjun Yi",
year = "2017",
month = "1",
day = "3",
doi = "10.1186/s12859-016-1441-7",
language = "English (US)",
volume = "18",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Negative binomial mixed models for analyzing microbiome count data

AU - Zhang, Xinyan

AU - Mallick, Himel

AU - Tang, Zaixiang

AU - Zhang, Lei

AU - Cui, Xiangqin

AU - Benson, Andrew K.

AU - Yi, Nengjun

PY - 2017/1/3

Y1 - 2017/1/3

KW - Correlated measures

KW - Count data

KW - Metagenomics

KW - Microbiome

KW - Negative binomial model

KW - Penalized Quasi-likelihood

KW - Random effects

UR - http://www.scopus.com/inward/record.url?scp=85008149994&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85008149994&partnerID=8YFLogxK

U2 - 10.1186/s12859-016-1441-7

DO - 10.1186/s12859-016-1441-7

M3 - Article

C2 - 28049409

AN - SCOPUS:85008149994

VL - 18

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 4

ER -