A Bayes testing approach to metagenomic profiling in bacteria

Bertrand S Clarke, Camilo Valdes, Adrian Dobra, Jennifer L Clarke

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Using next-generation sequencing (NGS) data, we apply a multinomial model with a Dirichlet prior to detect the presence of bacteria in a metagenomic sample via marginal Bayes testing for each bacterial strain. The NGS reads per strain are counted fractionally, with each read contributing an equal amount to each strain it might represent. The threshold for detection is strain-dependent, and we apply a correction for the dependence among the NGS reads by finding the knee in a curve representing a tradeoff between detecting too many strains and not enough strains. As a check, we evaluate the joint posterior probabilities for the presence of two strains of bacteria and find relatively little dependence. We apply our techniques to two data sets and compare our results with the results found by the Human Microbiome Project. We conclude with a discussion of the issues surrounding multiple corrections in a Bayes context.
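The fractional counting and the marginal Dirichlet-multinomial test described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the symmetric prior weight `alpha`, the fixed cutoff `eps` (the paper uses a strain-dependent threshold), and the toy reads are all assumptions made for illustration. The sketch relies on the standard fact that each marginal of a Dirichlet distribution is a Beta distribution, and it treats the fractional counts as if they were multinomial counts.

```python
import random


def fractional_counts(read_hits, strains):
    """Give each read a total weight of 1, split equally among the
    strains it might represent (hypothetical helper)."""
    counts = {s: 0.0 for s in strains}
    for hits in read_hits:
        if not hits:
            continue  # a read matching no strain contributes nothing
        share = 1.0 / len(hits)
        for s in hits:
            counts[s] += share
    return counts


def posterior_presence_prob(counts, strain, alpha=1.0, eps=0.05,
                            draws=20000, seed=0):
    """Monte Carlo estimate of P(theta_strain > eps | counts) under an
    assumed symmetric Dirichlet(alpha) prior.

    The posterior is Dirichlet(alpha + n_1, ..., alpha + n_K), so the
    marginal for one strain is Beta(a, b) with a = alpha + n_strain and
    b = the sum of the remaining posterior parameters.
    """
    rng = random.Random(seed)
    a = alpha + counts[strain]
    b = sum(alpha + n for s, n in counts.items() if s != strain)
    above = sum(rng.betavariate(a, b) > eps for _ in range(draws))
    return above / draws


# Toy data (assumed): each inner list names the strains a read could come from.
strains = ["A", "B", "C"]
read_hits = [["A"], ["A", "B"], ["A"], ["A", "B"], ["A"], ["A"]]

counts = fractional_counts(read_hits, strains)
for s in strains:
    print(s, counts[s], posterior_presence_prob(counts, s))
```

In this toy example strain A absorbs most of the read weight and strain C receives none, so the estimated posterior probability of presence is higher for A than for C. Choosing the cutoff per strain and correcting for dependence among reads, as the abstract describes, is outside the scope of this sketch.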

Original language: English (US)
Pages (from-to): 173-185
Number of pages: 13
Journal: Statistics and its Interface
Volume: 8
Issue number: 2
DOI: 10.4310/SII.2015.v8.n2.a5
State: Published - Jan 1 2015

Keywords

  • Bacteria
  • Bayes testing
  • Dependence
  • Metagenomics

ASJC Scopus subject areas

  • Statistics and Probability
  • Applied Mathematics

Cite this

A Bayes testing approach to metagenomic profiling in bacteria. / Clarke, Bertrand S; Valdes, Camilo; Dobra, Adrian; Clarke, Jennifer L.

In: Statistics and its Interface, Vol. 8, No. 2, 01.01.2015, p. 173-185.

@article{bbf78edf19aa4a0c83f94ebb4a4b86c3,
title = "A Bayes testing approach to metagenomic profiling in bacteria",
keywords = "Bacteria, Bayes testing, Dependence, Metagenomics",
author = "Clarke, {Bertrand S} and Valdes, Camilo and Dobra, Adrian and Clarke, {Jennifer L}",
year = "2015",
month = "1",
day = "1",
doi = "10.4310/SII.2015.v8.n2.a5",
language = "English (US)",
volume = "8",
pages = "173--185",
journal = "Statistics and its Interface",
issn = "1938-7989",
publisher = "International Press of Boston, Inc.",
number = "2",

}

TY - JOUR

T1 - A Bayes testing approach to metagenomic profiling in bacteria

AU - Clarke, Bertrand S

AU - Valdes, Camilo

AU - Dobra, Adrian

AU - Clarke, Jennifer L

PY - 2015/1/1

Y1 - 2015/1/1

KW - Bacteria

KW - Bayes testing

KW - Dependence

KW - Metagenomics

UR - http://www.scopus.com/inward/record.url?scp=84924404433&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84924404433&partnerID=8YFLogxK

U2 - 10.4310/SII.2015.v8.n2.a5

DO - 10.4310/SII.2015.v8.n2.a5

M3 - Article

VL - 8

SP - 173

EP - 185

JO - Statistics and its Interface

JF - Statistics and its Interface

SN - 1938-7989

IS - 2

ER -