An empirical Bayes test for allelic-imbalance detection in ChIP-seq

Qi Zhang, Sündüz Keleş

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has enabled discovery of genomic regions enriched with biological signals such as transcription factor binding and histone modifications. Allelic-imbalance (ALI) detection is a complementary analysis of ChIP-seq data for associating biological signals with single nucleotide polymorphisms (SNPs). It has been successfully used in elucidating functional roles of non-coding SNPs. Commonly used statistical approaches for ALI detection are often based on binomial testing and mixture models, both of which rely on strong assumptions on the distribution of the unobserved allelic probability, and have significant practical shortcomings. We propose the Non-Parametric Binomial (NPBin) test for ALI detection and for modeling Binomial data in general. NPBin models the density of the unobserved allelic probability non-parametrically, and estimates its empirical null distribution via curve fitting. We demonstrate the advantages of NPBin in terms of interpretability of the estimated density and the accuracy in ALI detection using simulations and analysis of several ChIP-seq data sets. We also illustrate the generality of our modeling framework beyond ALI detection by an application to a baseball batting average prediction problem. This article has supplementary material available at Biostatistics online. The code and the sample input data have also been deposited on GitHub: https://github.com/QiZhangStat/ALIdetection.
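
To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' NPBin implementation. It shows (a) the baseline exact binomial test against an allelic probability of 0.5, and (b) a simple non-parametric empirical Bayes estimate of the distribution of the unobserved allelic probability, fitted by EM on a fixed grid. The grid-based estimator is a swapped-in stand-in for the spline-based density estimate described in the paper, and it does not reproduce the empirical-null step. All function names and the toy read counts are hypothetical and chosen for illustration only.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(k, n, p=0.5):
    """Exact two-sided p-value: total probability of outcomes that are
    no more likely than the observed count k under the null rate p."""
    pmfs = [binom_pmf(j, n, p) for j in range(n + 1)]
    cutoff = pmfs[k] * (1 + 1e-12)  # small tolerance for floating-point ties
    return min(1.0, sum(v for v in pmfs if v <= cutoff))


def fit_grid_prior(counts, grid, n_iter=200):
    """EM estimate of the mixing weights of a discrete prior supported on
    `grid` (assumption: a grid approximation, not the paper's spline model).
    `counts` is a list of (reads_from_one_allele, total_reads) pairs."""
    weights = [1.0 / len(grid)] * len(grid)
    lik = [[binom_pmf(k, n, p) for p in grid] for k, n in counts]
    for _ in range(n_iter):
        new_weights = [0.0] * len(grid)
        for row in lik:
            joint = [w * l for w, l in zip(weights, row)]
            total = sum(joint)
            for j, v in enumerate(joint):
                new_weights[j] += v / total  # posterior responsibility
        weights = [v / len(counts) for v in new_weights]
    return weights


def posterior_mean(k, n, grid, weights):
    """Posterior mean of the allelic probability for one SNP."""
    joint = [w * binom_pmf(k, n, p) for p, w in zip(grid, weights)]
    return sum(p * v for p, v in zip(grid, joint)) / sum(joint)


# Toy allele counts (made up for illustration, not data from the paper).
toy_counts = [(5, 10), (6, 10), (4, 10), (5, 10), (9, 10), (5, 10), (4, 10), (6, 10)]
grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
weights = fit_grid_prior(toy_counts, grid)
for k, n in toy_counts:
    print(k, n, binomial_test_two_sided(k, n), posterior_mean(k, n, grid, weights))
```

The binomial test treats every SNP in isolation and fixes the null at exactly 0.5, whereas the empirical Bayes step pools all SNPs to learn the distribution of allelic probabilities before scoring each one. That pooling is the general idea the abstract refers to.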

Original language: English (US)
Pages (from-to): 546-561
Number of pages: 16
Journal: Biostatistics
Volume: 19
Issue number: 4
DOI: 10.1093/biostatistics/kxx060
State: Published - Oct 1 2018

Fingerprint

Empirical Bayes
Chip
Single nucleotide Polymorphism
Biostatistics
Binomial Model
Empirical Distribution
Curve fitting
Interpretability
Chromatin
Nonparametric Model
Null Distribution
Transcription Factor
Mixture Model
Modeling
Sequencing
High Throughput
Genomics
Imbalance
Testing
Prediction

Keywords

  • Allelic-imbalance
  • ChIP-seq
  • Empirical Bayes
  • Non-parametric density estimation
  • Spline

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

An empirical Bayes test for allelic-imbalance detection in ChIP-seq. / Zhang, Qi; Keleş, Sündüz.

In: Biostatistics, Vol. 19, No. 4, 01.10.2018, p. 546-561.

Zhang, Qi ; Keleş, Sündüz. / An empirical Bayes test for allelic-imbalance detection in ChIP-seq. In: Biostatistics. 2018 ; Vol. 19, No. 4. pp. 546-561.
@article{dc1fbe2d221540e88acdea93e722292f,
title = "An empirical Bayes test for allelic-imbalance detection in ChIP-seq",
abstract = "Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has enabled discovery of genomic regions enriched with biological signals such as transcription factor binding and histone modifications. Allelic-imbalance (ALI) detection is a complementary analysis of ChIP-seq data for associating biological signals with single nucleotide polymorphisms (SNPs). It has been successfully used in elucidating functional roles of non-coding SNPs. Commonly used statistical approaches for ALI detection are often based on binomial testing and mixture models, both of which rely on strong assumptions on the distribution of the unobserved allelic probability, and have significant practical shortcomings. We propose the Non-Parametric Binomial (NPBin) test for ALI detection and for modeling Binomial data in general. NPBin models the density of the unobserved allelic probability non-parametrically, and estimates its empirical null distribution via curve fitting. We demonstrate the advantages of NPBin in terms of interpretability of the estimated density and the accuracy in ALI detection using simulations and analysis of several ChIP-seq data sets. We also illustrate the generality of our modeling framework beyond ALI detection by an application to a baseball batting average prediction problem. This article has supplementary material available at Biostatistics online. The code and the sample input data have also been deposited on GitHub: https://github.com/QiZhangStat/ALIdetection.",
keywords = "Allelic-imbalance, ChIP-seq, Empirical Bayes, Non-parametric density estimation, Spline",
author = "Qi Zhang and S{\"u}nd{\"u}z Kele{\c{s}}",
year = "2018",
month = "10",
day = "1",
doi = "10.1093/biostatistics/kxx060",
language = "English (US)",
volume = "19",
pages = "546--561",
journal = "Biostatistics",
issn = "1465-4644",
publisher = "Oxford University Press",
number = "4",

}

TY - JOUR

T1 - An empirical Bayes test for allelic-imbalance detection in ChIP-seq

AU - Zhang, Qi

AU - Keleş, Sündüz

PY - 2018/10/1

Y1 - 2018/10/1

AB - Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has enabled discovery of genomic regions enriched with biological signals such as transcription factor binding and histone modifications. Allelic-imbalance (ALI) detection is a complementary analysis of ChIP-seq data for associating biological signals with single nucleotide polymorphisms (SNPs). It has been successfully used in elucidating functional roles of non-coding SNPs. Commonly used statistical approaches for ALI detection are often based on binomial testing and mixture models, both of which rely on strong assumptions on the distribution of the unobserved allelic probability, and have significant practical shortcomings. We propose the Non-Parametric Binomial (NPBin) test for ALI detection and for modeling Binomial data in general. NPBin models the density of the unobserved allelic probability non-parametrically, and estimates its empirical null distribution via curve fitting. We demonstrate the advantages of NPBin in terms of interpretability of the estimated density and the accuracy in ALI detection using simulations and analysis of several ChIP-seq data sets. We also illustrate the generality of our modeling framework beyond ALI detection by an application to a baseball batting average prediction problem. This article has supplementary material available at Biostatistics online. The code and the sample input data have also been deposited on GitHub: https://github.com/QiZhangStat/ALIdetection.

KW - Allelic-imbalance

KW - ChIP-seq

KW - Empirical Bayes

KW - Non-parametric density estimation

KW - Spline

UR - http://www.scopus.com/inward/record.url?scp=85054709551&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054709551&partnerID=8YFLogxK

U2 - 10.1093/biostatistics/kxx060

DO - 10.1093/biostatistics/kxx060

M3 - Article

VL - 19

SP - 546

EP - 561

JO - Biostatistics

JF - Biostatistics

SN - 1465-4644

IS - 4

ER -