CNV-guided multi-read allocation for ChIP-seq

Qi Zhang, Sündüz Keleş

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

MOTIVATION: In chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and other short-read sequencing experiments, a considerable fraction of the short reads align to multiple locations on the reference genome (multi-reads). Inferring the origin of multi-reads is critical for accurately mapping reads to repetitive regions. Current state-of-the-art multi-read allocation algorithms rely on the read counts in the local neighborhood of the alignment locations and ignore the variation in the copy numbers of these regions. Copy-number variation (CNV) can directly affect the read densities and, therefore, bias allocation of multi-reads.

RESULTS: We propose cnvCSEM (CNV-guided ChIP-Seq by expectation-maximization algorithm), a flexible framework that incorporates CNV in multi-read allocation. cnvCSEM eliminates the CNV bias in multi-read allocation by initializing the read allocation algorithm with CNV-aware initial values. Our data-driven simulations illustrate that cnvCSEM leads to higher read coverage with satisfactory accuracy and lower loss in read-depth recovery (estimation). We evaluate the biological relevance of the cnvCSEM-allocated reads and the resultant peaks with the analysis of several ENCODE ChIP-seq datasets.

AVAILABILITY AND IMPLEMENTATION: Available at http://www.stat.wisc.edu/~qizhang/

CONTACT: qizhang@stat.wisc.edu or keles@stat.wisc.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
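The abstract describes an EM-based multi-read allocation whose starting values are made CNV-aware. The sketch below is a generic EM in the spirit of the CSEM name, not the authors' implementation: it assumes each multi-read is split across its candidate windows in proportion to the current window intensities (the same count-proportional form the MOTIVATION paragraph attributes to existing methods), and that a window's intensity is its unique reads plus its expected share of multi-reads. The function `em_allocate` and its `init_weight` argument are hypothetical; how cnvCSEM actually derives CNV-aware initial values is not specified here. The demo shows one way starting values can matter: for an exact repeat with no unique reads, the data cannot distinguish the copies and EM simply keeps the initial ratio.

```python
from typing import Dict, List, Sequence


def em_allocate(
    unique_counts: Sequence[float],
    multi_reads: Sequence[Sequence[int]],
    init_weight: Sequence[float],
    n_iter: int = 200,
) -> List[Dict[int, float]]:
    """Toy EM for multi-read allocation (illustrative sketch only).

    unique_counts[w]  uniquely aligned reads in window w
    multi_reads[r]    distinct window indices that multi-read r aligns to
    init_weight[w]    starting intensity of window w; a CNV-aware start
                      would be supplied here (hypothetical parameter)
    Returns one {window: probability} dict per multi-read.
    """
    lam = [float(x) for x in init_weight]
    alloc: List[Dict[int, float]] = []
    for _ in range(n_iter):
        # E-step: split each multi-read across its candidate windows in
        # proportion to the current window intensities.
        alloc = []
        for cands in multi_reads:
            z = sum(lam[w] for w in cands)
            if z > 0:
                alloc.append({w: lam[w] / z for w in cands})
            else:
                # No information at all: fall back to an even split.
                alloc.append({w: 1.0 / len(cands) for w in cands})
        # M-step: a window's intensity is its unique reads plus the
        # expected number of multi-reads currently assigned to it.
        lam = [float(x) for x in unique_counts]
        for a in alloc:
            for w, p in a.items():
                lam[w] += p
    return alloc


if __name__ == "__main__":
    repeats = [[0, 1]] * 10  # ten reads aligning equally well to windows 0 and 1
    # Exact repeat with no unique reads: EM keeps whatever ratio it starts from.
    print(em_allocate([0, 0], repeats, [1.0, 1.0])[0])  # {0: 0.5, 1: 0.5}
    print(em_allocate([0, 0], repeats, [2.0, 1.0])[0])  # about 2/3 vs 1/3
    # With unique reads present, the split converges to about 0.75 / 0.25.
    print(em_allocate([30, 10], repeats, [1.0, 1.0])[0])
```

In this toy model, anything that inflates a window's count (for example, a copy-number gain raising local read density) shifts the proportional split, and where unique reads are absent the final allocation is determined entirely by `init_weight`.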

Original language: English (US)
Pages (from-to): 2860-2867
Number of pages: 8
Journal: Bioinformatics (Oxford, England)
Volume: 30
Issue number: 20
DOI: 10.1093/bioinformatics/btu402
State: Published - Oct 15 2014

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

CNV-guided multi-read allocation for ChIP-seq. / Zhang, Qi; Keleş, Sündüz. In: Bioinformatics (Oxford, England), Vol. 30, No. 20, 15.10.2014, p. 2860-2867.

Zhang, Qi ; Keleş, Sündüz. / CNV-guided multi-read allocation for ChIP-seq. In: Bioinformatics (Oxford, England). 2014 ; Vol. 30, No. 20. pp. 2860-2867.
@article{06d4e676061f4012ad308b8ae8456c28,
title = "CNV-guided multi-read allocation for ChIP-seq",
author = "Qi Zhang and S{\"u}nd{\"u}z Kele{\c{s}}",
year = "2014",
month = "10",
day = "15",
doi = "10.1093/bioinformatics/btu402",
language = "English (US)",
volume = "30",
pages = "2860--2867",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "20",

}

TY  - JOUR
T1  - CNV-guided multi-read allocation for ChIP-seq
AU  - Zhang, Qi
AU  - Keleş, Sündüz
PY  - 2014/10/15
Y1  - 2014/10/15
UR  - http://www.scopus.com/inward/record.url?scp=84921792636&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84921792636&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btu402
DO  - 10.1093/bioinformatics/btu402
M3  - Article
C2  - 24966364
AN  - SCOPUS:84921792636
VL  - 30
SP  - 2860
EP  - 2867
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 20
ER  -