Annotating nonspecific SAGE tags with microarray data

Xijin Ge, Yong Chul Jung, Qingfa Wu, Warren A. Kibbe, San Ming Wang

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

SAGE (serial analysis of gene expression) detects transcripts by extracting short tags from the transcripts. Because of the limited length, many SAGE tags are shared by transcripts from different genes. Relying on sequence information in the general gene expression database has limited power to solve this problem due to the highly heterogeneous nature of the deposited sequences. Considering that the complexity of gene expression at a single tissue level should be much simpler than that in the general expression database, we reasoned that by restricting gene expression to tissue level, the accuracy of gene annotation for the nonspecific SAGE tags should be significantly improved. To test the idea, we developed a tissue-specific SAGE annotation database based on microarray data (www.basic.northwerstern.edu/SAGE). This database contains microarray expression information represented as UniGene clusters for 73 normal human tissues and 18 cancer tissues and cell lines. The nonspecific SAGE tag is first matched to the database by the same tissue type used by both SAGE and microarray analysis; then the multiple UniGene clusters assigned to the nonspecific SAGE tag are searched in the database under the matched tissue type. The UniGene cluster presented solely or at higher expression levels in the database is annotated to represent the specific gene for the nonspecific SAGE tags. The accuracy of gene annotation by this database was largely confirmed by experimental data. Our study shows that microarray data provide a useful source for annotating the nonspecific SAGE tags.
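The lookup described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the in-memory dictionary standing in for the database, the function name `annotate_tag`, the placeholder cluster IDs, and the expression values are all hypothetical. The abstract does not say how much higher an expression level must be, so the sketch simply takes the strict maximum and leaves ties unresolved.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the tissue-specific microarray database:
# tissue name -> {UniGene cluster ID -> expression level}.
MicroarrayDB = Dict[str, Dict[str, float]]


def annotate_tag(candidates: List[str], tissue: str, db: MicroarrayDB) -> Optional[str]:
    """Choose one UniGene cluster for a nonspecific SAGE tag.

    candidates: UniGene clusters that all share the tag.
    tissue: tissue type of the SAGE library the tag came from.
    Returns the chosen cluster ID, or None if the tag stays ambiguous.
    """
    # Step 1: match the SAGE tissue type to the same tissue in the database.
    tissue_expr = db.get(tissue)
    if tissue_expr is None:
        return None  # no microarray data for this tissue

    # Step 2: look up the candidate clusters under the matched tissue.
    detected = {c: tissue_expr[c] for c in candidates if c in tissue_expr}
    if not detected:
        return None

    # Step 3: a cluster present alone wins outright.
    if len(detected) == 1:
        return next(iter(detected))

    # Otherwise take the most highly expressed cluster.
    # Assumption: an exact tie is left unresolved.
    ranked = sorted(detected.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Made-up example data; "Hs.X1" and "Hs.X2" are placeholder IDs, not real clusters.
example_db: MicroarrayDB = {
    "liver": {"Hs.X1": 850.0, "Hs.X2": 40.0},
    "brain": {"Hs.X2": 300.0},
}

liver_choice = annotate_tag(["Hs.X1", "Hs.X2"], "liver", example_db)
brain_choice = annotate_tag(["Hs.X1", "Hs.X2"], "brain", example_db)
```

With this example data the same tag resolves to a different cluster depending on the tissue, which is the point of restricting the lookup to tissue level. A real implementation would need an explicit rule for what counts as a sufficiently higher expression level.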

Original language: English (US)
Pages (from-to): 173-180
Number of pages: 8
Journal: Genomics
Volume: 87
Issue number: 1
DOIs: https://doi.org/10.1016/j.ygeno.2005.08.014
State: Published - Jan 1 2006

Fingerprint

Gene Expression
Databases
Molecular Sequence Annotation
Microarray Analysis
Genes
Cell Line

Keywords

  • Gene annotation
  • Microarray
  • Nonspecific SAGE tag
  • SAGE

ASJC Scopus subject areas

  • Genetics

Cite this

Ge, X., Jung, Y. C., Wu, Q., Kibbe, W. A., & Wang, S. M. (2006). Annotating nonspecific SAGE tags with microarray data. Genomics, 87(1), 173-180. https://doi.org/10.1016/j.ygeno.2005.08.014

@article{f498c9546ce4404d8748304fa041f11a,
title = "Annotating nonspecific SAGE tags with microarray data",
keywords = "Gene annotation, Microarray, Nonspecific SAGE tag, SAGE",
author = "Xijin Ge and Jung, {Yong Chul} and Qingfa Wu and Kibbe, {Warren A.} and Wang, {San Ming}",
year = "2006",
month = "1",
day = "1",
doi = "10.1016/j.ygeno.2005.08.014",
language = "English (US)",
volume = "87",
pages = "173--180",
journal = "Genomics",
issn = "0888-7543",
publisher = "Academic Press Inc.",
number = "1",

}

TY - JOUR
T1 - Annotating nonspecific SAGE tags with microarray data
AU - Ge, Xijin
AU - Jung, Yong Chul
AU - Wu, Qingfa
AU - Kibbe, Warren A.
AU - Wang, San Ming
PY - 2006/1/1
Y1 - 2006/1/1
KW - Gene annotation
KW - Microarray
KW - Nonspecific SAGE tag
KW - SAGE
UR - http://www.scopus.com/inward/record.url?scp=29344464600&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=29344464600&partnerID=8YFLogxK
U2 - 10.1016/j.ygeno.2005.08.014
DO - 10.1016/j.ygeno.2005.08.014
M3 - Article
C2 - 16314072
AN - SCOPUS:29344464600
VL - 87
SP - 173
EP - 180
JO - Genomics
JF - Genomics
SN - 0888-7543
IS - 1
ER -