Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has enabled discovery of genomic regions enriched with biological signals such as transcription factor binding and histone modifications. Allelic-imbalance (ALI) detection is a complementary analysis of ChIP-seq data for associating biological signals with single nucleotide polymorphisms (SNPs). It has been successfully used in elucidating functional roles of non-coding SNPs. Commonly used statistical approaches for ALI detection are often based on binomial testing and mixture models, both of which rely on strong assumptions on the distribution of the unobserved allelic probability, and have significant practical shortcomings.We propose Non-Parametric Binomial (NPBin) test for ALI detection and for modeling Binomial data in general. NPBin models the density of the unobserved allelic probability non-parametrically, and estimates its empirical null distribution via curve fitting.We demonstrate the advantages of NPBin in terms of interpretability of the estimated density and the accuracy in ALI detection using simulations and analysis of several ChIP-seq data sets.We also illustrate the generality of our modeling framework beyondALI detection by an application to a baseball batting average prediction problem. This article has supplementary material available at Biostatistics online. The code and the sample input data have been also deposited to github https://github.com/QiZhangStat/ALIdetection.
- Empirical Bayes
- Non-parametric density estimation
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty