A new approach for gene prediction using comparative sequence analysis

Rong Chen, Hesham H Ali

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The availability of large fragments of genomic DNA makes it possible to apply comparative genomics to the identification of protein-coding regions. In this work, a comparative analysis is conducted on homologous genomic sequences of organisms at different evolutionary distances, and noncoding regions are found to be conserved between closely related organisms. In contrast, more distant organisms show much less intron similarity but also less conservation of exon structures. This study seeks to illuminate the impact of evolutionary distance on the performance of the proposed gene-finding program, which is based on cross-species sequence comparison. Based on the findings from the comparative study and on training data sets, we propose a model by which coding sequences can be identified by comparing sequences of multiple species, both close and appropriately distant. The reliability of the proposed method is evaluated in terms of sensitivity and specificity, and the results are compared to those obtained by other popular gene prediction programs. Provided that sequences can be found from other species at appropriate evolutionary distances, this approach could be applied to newly sequenced organisms for which no species-dependent statistical models are available.

Original language: English (US)
Title of host publication: Proceedings of the ACM Symposium on Applied Computing
Pages: 177-184
Number of pages: 8
Volume: 1
State: Published - 2005
Event: 20th Annual ACM Symposium on Applied Computing - Santa Fe, NM
Duration: Mar 13 2005 to Mar 17 2005

Other

Other: 20th Annual ACM Symposium on Applied Computing
City: Santa Fe, NM
Period: 3/13/05 to 3/17/05

Fingerprint

Conservation
Genes
DNA
Availability
Proteins
Genomics
Statistical Models

Keywords

  • Coding and non-coding regions
  • Comparative genomics
  • Gene prediction
  • Multiple species

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Chen, R., & Ali, H. H. (2005). A new approach for gene prediction using comparative sequence analysis. In Proceedings of the ACM Symposium on Applied Computing (Vol. 1, pp. 177-184).

A new approach for gene prediction using comparative sequence analysis. / Chen, Rong; Ali, Hesham H.

Proceedings of the ACM Symposium on Applied Computing. Vol. 1 2005. p. 177-184.

Chen, R & Ali, HH 2005, A new approach for gene prediction using comparative sequence analysis. in Proceedings of the ACM Symposium on Applied Computing. vol. 1, pp. 177-184, 20th Annual ACM Symposium on Applied Computing, Santa Fe, NM, 3/13/05.
Chen R, Ali HH. A new approach for gene prediction using comparative sequence analysis. In Proceedings of the ACM Symposium on Applied Computing. Vol. 1. 2005. p. 177-184
Chen, Rong ; Ali, Hesham H. / A new approach for gene prediction using comparative sequence analysis. Proceedings of the ACM Symposium on Applied Computing. Vol. 1 2005. pp. 177-184
@inproceedings{401b1293cbe04f2d94a4205d44272c04,
title = "A new approach for gene prediction using comparative sequence analysis",
keywords = "Coding and non-coding regions, Comparative genomics, Gene prediction, Multiple species",
author = "Chen, Rong and Ali, {Hesham H}",
year = "2005",
language = "English (US)",
volume = "1",
pages = "177--184",
booktitle = "Proceedings of the ACM Symposium on Applied Computing",

}

TY - GEN

T1 - A new approach for gene prediction using comparative sequence analysis

AU - Chen, Rong

AU - Ali, Hesham H

PY - 2005

Y1 - 2005

KW - Coding and non-coding regions

KW - Comparative genomics

KW - Gene prediction

KW - Multiple species

UR - http://www.scopus.com/inward/record.url?scp=33644526899&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33644526899&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33644526899

VL - 1

SP - 177

EP - 184

BT - Proceedings of the ACM Symposium on Applied Computing

ER -