STAG-CNS: An Order-Aware Conserved Noncoding Sequences Discovery Tool for Arbitrary Numbers of Species

Xianjun Lai, Sairam Behera, Zhikai Liang, Yanli Lu, Jitender S. Deogun, James C. Schnable

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

One method for identifying noncoding regulatory regions of a genome is to quantify rates of divergence between related species, as functional sequence will generally diverge more slowly. Most alignment-based approaches to identifying these conserved noncoding sequences (CNSs) have had relatively large minimum sequence lengths (≥15 bp) compared with the average length of known transcription factor binding sites. To circumvent this constraint, STAG-CNS was developed; it can simultaneously integrate data from the promoters of conserved orthologous genes in three or more species. Using data from up to six grass species, conserved sequences as short as 9 bp could be identified with a false discovery rate ≤0.05. These CNSs exhibit greater overlap with open chromatin regions identified using DNase I hypersensitivity assays, and are enriched in the promoters of genes involved in transcriptional regulation. STAG-CNS was further employed to characterize loss of conserved noncoding sequences associated with retained duplicate genes from the ancient maize polyploidy. Genes with fewer retained CNSs show lower overall expression, although this bias is more apparent in samples of complex organ systems containing many cell types, suggesting that CNS loss may correspond to a reduced number of expression contexts rather than lower expression levels across the entire ancestral expression domain.
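The abstract's point that pooling more species lets shorter sequences reach significance can be illustrated with an idealized chance-match model. The sketch below assumes i.i.d. uniform base composition, a single searched strand, independent overlapping windows, and a hypothetical 1,000 bp promoter; the function names are illustrative, and this is not the false-discovery-rate procedure used by STAG-CNS.

```python
# Idealized chance-match model (assumption): bases are i.i.d. uniform,
# only one strand is searched, and overlapping windows are treated as independent.
# This is NOT the FDR procedure used by STAG-CNS; names are illustrative.


def p_hit_one_promoter(k: int, promoter_len: int) -> float:
    """Probability that a fixed k-mer occurs at least once in a random promoter."""
    windows = promoter_len - k + 1  # number of k-mer start positions
    return 1.0 - (1.0 - 4.0 ** -k) ** windows


def p_chance_conserved(k: int, promoter_len: int, n_species: int) -> float:
    """Probability that a fixed k-mer from one reference promoter is also found,
    purely by chance, in every one of the other n_species - 1 promoters."""
    return p_hit_one_promoter(k, promoter_len) ** (n_species - 1)


if __name__ == "__main__":
    PROMOTER_LEN = 1000  # hypothetical promoter length in bp
    for k in (9, 15):
        for n in range(2, 7):
            p = p_chance_conserved(k, PROMOTER_LEN, n)
            print(f"k={k:2d}  species={n}  P(chance)={p:.2e}")
```

Under these assumptions the chance probability shrinks roughly geometrically with each added species, so a 9-mer shared by six promoters is far less likely to arise by chance than a 15-mer shared by only two. Real promoters are non-uniform in base composition and contain repeats, so an empirical null model would be needed for an actual FDR estimate.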

Original language: English (US)
Pages (from-to): 990-999
Number of pages: 10
Journal: Molecular Plant
Volume: 10
Issue number: 7
DOI: 10.1016/j.molp.2017.05.010
State: Published - Jul 5 2017

Keywords

  • comparative genomics
  • conserved noncoding sequence
  • grain crops
  • longest path algorithm
  • suffix tree
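The title's emphasis on order awareness and the keywords "longest path algorithm" and "suffix tree" point to a chaining step in which conserved matches must keep the same relative order in every species. A generic way to express that constraint is a longest-path search over a directed acyclic graph of shared matches. The sketch below is the textbook colinear-chaining dynamic program, not the STAG-CNS implementation: the `Anchor` tuples (one start position per species for an exact match assumed to have been found already, for example with a suffix tree) and all function names are hypothetical, and every anchor is weighted equally.

```python
from typing import List, Sequence, Tuple

# Hypothetical anchor: start position of one shared exact match in each species.
Anchor = Tuple[int, ...]


def precedes(a: Anchor, b: Anchor) -> bool:
    """True if anchor a lies strictly upstream of anchor b in every species."""
    return all(x < y for x, y in zip(a, b))


def longest_colinear_chain(anchors: Sequence[Anchor]) -> List[Anchor]:
    """Longest path in the DAG with an edge a -> b whenever precedes(a, b).

    O(n^2) dynamic programming; returns anchors in upstream-to-downstream order.
    """
    if not anchors:
        return []
    # If a precedes b then a[0] < b[0], so a lexicographic sort is a topological order.
    order = sorted(anchors)
    n = len(order)
    best = [1] * n   # best[j]: number of anchors in the longest chain ending at order[j]
    prev = [-1] * n  # back-pointer used to rebuild that chain
    for j in range(n):
        for i in range(j):
            if precedes(order[i], order[j]) and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(order[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    # Hypothetical anchors in three species; (30, 90, 20) conflicts with the
    # others in the second species, so it cannot join the longest chain.
    anchors = [(10, 12, 8), (40, 35, 50), (30, 90, 20), (70, 80, 95)]
    print(longest_colinear_chain(anchors))  # -> [(10, 12, 8), (40, 35, 50), (70, 80, 95)]
```

The quadratic loop keeps the sketch short; weighting anchors by match length or forbidding overlapping matches would change only the edge test and the score update, not the overall longest-path structure.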

ASJC Scopus subject areas

  • Molecular Biology
  • Plant Science

Cite this

STAG-CNS: An Order-Aware Conserved Noncoding Sequences Discovery Tool for Arbitrary Numbers of Species. / Lai, Xianjun; Behera, Sairam; Liang, Zhikai; Lu, Yanli; Deogun, Jitender S.; Schnable, James C.

In: Molecular Plant, Vol. 10, No. 7, 05.07.2017, p. 990-999.

Lai, Xianjun; Behera, Sairam; Liang, Zhikai; Lu, Yanli; Deogun, Jitender S.; Schnable, James C. / STAG-CNS: An Order-Aware Conserved Noncoding Sequences Discovery Tool for Arbitrary Numbers of Species. In: Molecular Plant. 2017; Vol. 10, No. 7. pp. 990-999.
@article{edf0fb4667844038b89e60efa2bea32a,
title = "STAG-CNS: An Order-Aware Conserved Noncoding Sequences Discovery Tool for Arbitrary Numbers of Species",
abstract = "One method for identifying noncoding regulatory regions of a genome is to quantify rates of divergence between related species, as functional sequence will generally diverge more slowly. Most approaches to identifying these conserved noncoding sequences (CNSs) based on alignment have had relatively large minimum sequence lengths (≥15 bp) compared with the average length of known transcription factor binding sites. To circumvent this constraint, STAG-CNS that can simultaneously integrate the data from the promoters of conserved orthologous genes in three or more species was developed. Using the data from up to six grass species made it possible to identify conserved sequences as short as 9 bp with false discovery rate ≤0.05. These CNSs exhibit greater overlap with open chromatin regions identified using DNase I hypersensitivity assays, and are enriched in the promoters of genes involved in transcriptional regulation. STAG-CNS was further employed to characterize loss of conserved noncoding sequences associated with retained duplicate genes from the ancient maize polyploidy. Genes with fewer retained CNSs show lower overall expression, although this bias is more apparent in samples of complex organ systems containing many cell types, suggesting that CNS loss may correspond to a reduced number of expression contexts rather than lower expression levels across the entire ancestral expression domain.",
keywords = "comparative genomics, conserved noncoding sequence, grain crops, longest path algorithm, suffix tree",
author = "Xianjun Lai and Sairam Behera and Zhikai Liang and Yanli Lu and Deogun, {Jitender S.} and Schnable, {James C.}",
year = "2017",
month = "7",
day = "5",
doi = "10.1016/j.molp.2017.05.010",
language = "English (US)",
volume = "10",
pages = "990--999",
journal = "Molecular Plant",
issn = "1674-2052",
publisher = "Cell Press",
number = "7",

}

TY  - JOUR
T1  - STAG-CNS: An Order-Aware Conserved Noncoding Sequences Discovery Tool for Arbitrary Numbers of Species
AU  - Lai, Xianjun
AU  - Behera, Sairam
AU  - Liang, Zhikai
AU  - Lu, Yanli
AU  - Deogun, Jitender S.
AU  - Schnable, James C.
PY  - 2017/07/05
Y1  - 2017/07/05
AB  - One method for identifying noncoding regulatory regions of a genome is to quantify rates of divergence between related species, as functional sequence will generally diverge more slowly. Most approaches to identifying these conserved noncoding sequences (CNSs) based on alignment have had relatively large minimum sequence lengths (≥15 bp) compared with the average length of known transcription factor binding sites. To circumvent this constraint, STAG-CNS that can simultaneously integrate the data from the promoters of conserved orthologous genes in three or more species was developed. Using the data from up to six grass species made it possible to identify conserved sequences as short as 9 bp with false discovery rate ≤0.05. These CNSs exhibit greater overlap with open chromatin regions identified using DNase I hypersensitivity assays, and are enriched in the promoters of genes involved in transcriptional regulation. STAG-CNS was further employed to characterize loss of conserved noncoding sequences associated with retained duplicate genes from the ancient maize polyploidy. Genes with fewer retained CNSs show lower overall expression, although this bias is more apparent in samples of complex organ systems containing many cell types, suggesting that CNS loss may correspond to a reduced number of expression contexts rather than lower expression levels across the entire ancestral expression domain.
KW  - comparative genomics
KW  - conserved noncoding sequence
KW  - grain crops
KW  - longest path algorithm
KW  - suffix tree
UR  - http://www.scopus.com/inward/record.url?scp=85021786222&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85021786222&partnerID=8YFLogxK
U2  - 10.1016/j.molp.2017.05.010
DO  - 10.1016/j.molp.2017.05.010
M3  - Article
C2  - 28602693
AN  - SCOPUS:85021786222
VL  - 10
SP  - 990
EP  - 999
JO  - Molecular Plant
JF  - Molecular Plant
SN  - 1674-2052
IS  - 7
ER  - 