A new approach for gene annotation using unambiguous sequence joining.

Alexandre Tchourbanov, Daniel Quest, Hesham Ali, Mark Pauley, Robert Norgren

Research output: Contribution to journal › Article

Abstract

This paper addresses accurate, automatic gene annotation based on the precise identification and annotation of exon and intron boundaries in biologically verified nucleotide sequences, using alignments of human genomic DNA to curated mRNA transcripts. We describe in detail a new cDNA/DNA homology-based gene annotation algorithm that combines the results of BLASTN searches and spliced alignments. Compared with other programs currently in use, annotation quality is significantly increased through the unambiguous joining of genomic DNA sequences. We also address the annotation of genes with non-canonical splice sites and short exons. The approach has been tested on the Genie learning subset as well as on the full-scale human RefSeq, and has demonstrated performance as high as 97%.
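The abstract mentions non-canonical splice sites and short exons without giving details. As a rough illustration only (not the algorithm described in the paper), the sketch below shows one way to flag both once exon coordinates are known. It assumes 0-based, half-open exon coordinates on the forward strand, takes GT/AG as the canonical donor/acceptor dinucleotide pair, and uses a hypothetical length threshold for "short"; the function names and the toy sequence are invented for this example.

```python
from typing import List, Tuple

# Assumption: exons are (start, end) pairs, 0-based and half-open,
# listed in order on the forward strand of the genomic sequence.
Exon = Tuple[int, int]

# GT...AG is the canonical donor/acceptor dinucleotide pair.
CANONICAL = ("GT", "AG")


def intron_splice_sites(genomic: str, exons: List[Exon]) -> List[Tuple[str, str, bool]]:
    """Return (donor, acceptor, is_canonical) for each intron between consecutive exons."""
    seq = genomic.upper()
    sites = []
    for (_, end_a), (start_b, _) in zip(exons, exons[1:]):
        if start_b - end_a < 4:
            raise ValueError("intron too short to hold donor and acceptor dinucleotides")
        donor = seq[end_a:end_a + 2]          # first two bases of the intron
        acceptor = seq[start_b - 2:start_b]   # last two bases of the intron
        sites.append((donor, acceptor, (donor, acceptor) == CANONICAL))
    return sites


def short_exons(exons: List[Exon], min_len: int) -> List[Exon]:
    """Return exons shorter than min_len bases (min_len is a hypothetical threshold)."""
    return [(s, e) for s, e in exons if e - s < min_len]


# Toy sequence built from labeled pieces so the coordinates are easy to check.
exon1, intron1 = "ATGGCC", "GTAAGTTTTCAG"   # canonical GT...AG intron
exon2, intron2 = "GGA", "GCAAGTTTAG"        # non-canonical GC...AG intron
exon3 = "TTTGA"
genomic = exon1 + intron1 + exon2 + intron2 + exon3
exons = [(0, 6), (18, 21), (31, 36)]

sites = intron_splice_sites(genomic, exons)
flagged = short_exons(exons, min_len=5)
```

In this toy case the first intron is reported as canonical, the second as non-canonical, and the 3-base middle exon is flagged as short. A real pipeline would also have to handle the reverse strand and alignment gaps, which this sketch ignores.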

Original language: English (US)
Pages (from-to): 353-362
Number of pages: 10
Journal: Proceedings / IEEE Computer Society Bioinformatics Conference. IEEE Computer Society Bioinformatics Conference.
Volume: 2
State: Published - 2003

Fingerprint

Molecular Sequence Annotation
Exons
Sequence Alignment
DNA
Introns
Complementary DNA
Learning
Messenger RNA

Cite this

@article{fd1550c65e924a2ba83f013e5ce8334f,
title = "A new approach for gene annotation using unambiguous sequence joining.",
abstract = "This paper addresses accurate, automatic gene annotation based on the precise identification and annotation of exon and intron boundaries in biologically verified nucleotide sequences, using alignments of human genomic DNA to curated mRNA transcripts. We describe in detail a new cDNA/DNA homology-based gene annotation algorithm that combines the results of BLASTN searches and spliced alignments. Compared with other programs currently in use, annotation quality is significantly increased through the unambiguous joining of genomic DNA sequences. We also address the annotation of genes with non-canonical splice sites and short exons. The approach has been tested on the Genie learning subset as well as on the full-scale human RefSeq, and has demonstrated performance as high as 97{\%}.",
author = "Alexandre Tchourbanov and Daniel Quest and Hesham Ali and Mark Pauley and Robert Norgren",
year = "2003",
language = "English (US)",
volume = "2",
pages = "353--362",
journal = "Proceedings / IEEE Computer Society Bioinformatics Conference. IEEE Computer Society Bioinformatics Conference.",
issn = "1555-3930",

}

TY - JOUR

T1 - A new approach for gene annotation using unambiguous sequence joining.

AU - Tchourbanov, Alexandre

AU - Quest, Daniel

AU - Ali, Hesham

AU - Pauley, Mark

AU - Norgren, Robert

PY - 2003

Y1 - 2003

N2 - This paper addresses accurate, automatic gene annotation based on the precise identification and annotation of exon and intron boundaries in biologically verified nucleotide sequences, using alignments of human genomic DNA to curated mRNA transcripts. We describe in detail a new cDNA/DNA homology-based gene annotation algorithm that combines the results of BLASTN searches and spliced alignments. Compared with other programs currently in use, annotation quality is significantly increased through the unambiguous joining of genomic DNA sequences. We also address the annotation of genes with non-canonical splice sites and short exons. The approach has been tested on the Genie learning subset as well as on the full-scale human RefSeq, and has demonstrated performance as high as 97%.

UR - http://www.scopus.com/inward/record.url?scp=14044260804&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=14044260804&partnerID=8YFLogxK

M3 - Article

C2 - 16452811

VL - 2

SP - 353

EP - 362

JO - Proceedings / IEEE Computer Society Bioinformatics Conference. IEEE Computer Society Bioinformatics Conference.

JF - Proceedings / IEEE Computer Society Bioinformatics Conference. IEEE Computer Society Bioinformatics Conference.

SN - 1555-3930

ER -