Detecting bacterial genomes in a metagenomic sample using NGS reads

Camilo Valdes, Meghan Brennan, Bertrand Clarke, Jennifer Clarke

Research output: Contribution to journal, Article

Abstract

We use a nucleotide flipping technique on whole-genome next-generation sequencing (NGS) data to test for the presence of various bacterial strains in a single metagenomic sample. Our technique is novel in that we induce artificial point mutations at the nucleotide level to define a test statistic for each genome on a given reference list. After finding a suitable nucleotide flipping rate, we use a variant of the Westfall-Young procedure to correct for multiple comparisons. When we align reads to reference genomes, we permit fractional reads, i.e., we weight the contribution of each read by one over the number of genomes to which it aligns. In a large-scale simulation, we characterize our method's performance on 'clean' data with respect to accuracy, genome lengths, and genome abundances. Then, we apply our technique to real data from the Human Microbiome Project (HMP). We compare our results based on adjusted p-values with the HMP findings based on abundance, as assessed by coverage. The results from the two methods have substantial overlap; discrepancies can be explained by the inherent variability of the respective processing pipelines and data.
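
The two mechanical steps named in the abstract, fractional read weighting and nucleotide flipping, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the input layout (a mapping from read ID to the genomes it aligns to), and the choice to flip each A/C/G/T independently to a uniformly chosen different base are all assumptions made for illustration. The abstract does not say whether flips are applied to reads or to reference genomes, so the sketch operates on a generic sequence string.

```python
import random
from collections import defaultdict


def fractional_counts(alignments):
    """Sum fractional read weights per genome.

    `alignments` (hypothetical layout) maps a read ID to the genome IDs
    that read aligns to. Each read contributes 1 / (number of distinct
    genomes it aligns to) to every one of those genomes.
    """
    counts = defaultdict(float)
    for genomes in alignments.values():
        hits = set(genomes)          # count each genome once per read
        if not hits:                 # unaligned reads contribute nothing
            continue
        weight = 1.0 / len(hits)
        for genome in hits:
            counts[genome] += weight
    return dict(counts)


def flip_nucleotides(seq, rate, rng):
    """Induce artificial point mutations in `seq`.

    Assumption: each A/C/G/T is replaced, independently with probability
    `rate`, by one of the other three bases chosen uniformly. Any other
    symbol (for example N) is left unchanged.
    """
    bases = "ACGT"
    out = []
    for base in seq:
        if base in bases and rng.random() < rate:
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

For example, with reads r1 aligning only to genome gA, r2 aligning to gA and gB, and r3 aligning to gA, gB, gC, and gD, `fractional_counts` gives gA a total of 1 + 1/2 + 1/4 = 1.75, gB 0.75, and gC and gD 0.25 each, so the weights still sum to the three aligned reads. Passing a seeded `random.Random` instance to `flip_nucleotides` keeps the induced mutations reproducible.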

Original language: English (US)
Pages (from-to): 477-494
Number of pages: 18
Journal: Statistics and its Interface
Volume: 8
Issue number: 4
DOIs: 10.4310/SII.2015.v8.n4.a7
State: Published - Jan 1 2015

Keywords

  • Artificial point mutations
  • Human microbiome project
  • Metagenomics
  • Multiple comparisons
  • Next generation sequencing
  • Nucleotide flipping

ASJC Scopus subject areas

  • Statistics and Probability
  • Applied Mathematics

Cite this

Detecting bacterial genomes in a metagenomic sample using NGS reads. / Valdes, Camilo; Brennan, Meghan; Clarke, Bertrand; Clarke, Jennifer.

In: Statistics and its Interface, Vol. 8, No. 4, 01.01.2015, p. 477-494.

@article{13be1c90071146239914b94a8bcdfa87,
title = "Detecting bacterial genomes in a metagenomic sample using NGS reads",
abstract = "We use a nucleotide flipping technique on whole-genome next-generation sequencing (NGS) data to test for the presence of various bacterial strains in a single metagenomic sample. Our technique is novel in that we induce artificial point mutations at the nucleotide level to define a test statistic for each genome on a given reference list. After finding a suitable nucleotide flipping rate, we use a variant of the Westfall-Young procedure to correct for multiple comparisons. When we align reads to reference genomes, we permit fractional reads, i.e., we weight the contribution of each read by one over the number of genomes to which it aligns. In a large-scale simulation, we characterize our method's performance on 'clean' data with respect to accuracy, genome lengths, and genome abundances. Then, we apply our technique to real data from the Human Microbiome Project (HMP). We compare our results based on adjusted p-values with the HMP findings based on abundance, as assessed by coverage. The results from the two methods have substantial overlap; discrepancies can be explained by the inherent variability of the respective processing pipelines and data.",
keywords = "Artificial point mutations, Human microbiome project, Metagenomics, Multiple comparisons, Next generation sequencing, Nucleotide flipping",
author = "Camilo Valdes and Meghan Brennan and Bertrand Clarke and Jennifer Clarke",
year = "2015",
month = "1",
day = "1",
doi = "10.4310/SII.2015.v8.n4.a7",
language = "English (US)",
volume = "8",
pages = "477--494",
journal = "Statistics and its Interface",
issn = "1938-7989",
publisher = "International Press of Boston, Inc.",
number = "4",
}

TY - JOUR

T1 - Detecting bacterial genomes in a metagenomic sample using NGS reads

AU - Valdes, Camilo

AU - Brennan, Meghan

AU - Clarke, Bertrand

AU - Clarke, Jennifer

PY - 2015/1/1

Y1 - 2015/1/1

N2 - We use a nucleotide flipping technique on whole-genome next-generation sequencing (NGS) data to test for the presence of various bacterial strains in a single metagenomic sample. Our technique is novel in that we induce artificial point mutations at the nucleotide level to define a test statistic for each genome on a given reference list. After finding a suitable nucleotide flipping rate, we use a variant of the Westfall-Young procedure to correct for multiple comparisons. When we align reads to reference genomes, we permit fractional reads, i.e., we weight the contribution of each read by one over the number of genomes to which it aligns. In a large-scale simulation, we characterize our method's performance on 'clean' data with respect to accuracy, genome lengths, and genome abundances. Then, we apply our technique to real data from the Human Microbiome Project (HMP). We compare our results based on adjusted p-values with the HMP findings based on abundance, as assessed by coverage. The results from the two methods have substantial overlap; discrepancies can be explained by the inherent variability of the respective processing pipelines and data.

KW - Artificial point mutations

KW - Human microbiome project

KW - Metagenomics

KW - Multiple comparisons

KW - Next generation sequencing

KW - Nucleotide flipping

UR - http://www.scopus.com/inward/record.url?scp=84945269100&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84945269100&partnerID=8YFLogxK

U2 - 10.4310/SII.2015.v8.n4.a7

DO - 10.4310/SII.2015.v8.n4.a7

M3 - Article

AN - SCOPUS:84945269100

VL - 8

SP - 477

EP - 494

JO - Statistics and its Interface

JF - Statistics and its Interface

SN - 1938-7989

IS - 4

ER -