Predicting yeast gene function based on hidden Markov models

Xutao Deng, Huimin Geng, Hesham H Ali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The prediction of function classes for unannotated genes or Open Reading Frames (ORFs) is important for understanding the functional role of genes and gene networks. Existing data mining tools, such as Support Vector Machines (SVMs) and K-Nearest Neighbors (KNNs), achieve only about 40% precision. We developed a gene function prediction tool based on profile Hidden Markov Models (HMMs). HMMs have shown great success in modeling noisy sequential data sets in speech recognition and protein sequence profiling. Results from a contingency test showed significant Markov dependency in time-series expression data, which makes HMMs especially appropriate for modeling gene expression. Each function class is associated with a distinct HMM whose parameters are trained using yeast time-series gene expression data. The function annotations of the HMM training set were obtained from the Munich Information Centre for Protein Sequences (MIPS) database. We designed two structural variants of HMMs (chain HMM and split HMM) and tested each of them on 40 function classes. The highest overall prediction precision achieved was 67%, using a double-split HMM with n-fold cross-validation. We also attempted to generalize HMMs to Dynamic Bayesian Networks (DBNs) for gene function prediction using heterogeneous data sets.
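
The one-HMM-per-function-class idea can be illustrated with a minimal sketch. The abstract does not give the model parameters, the discretization of expression values, or the decision rule, so everything below is an assumption made for illustration: expression changes are coded as three symbols (0 = down, 1 = flat, 2 = up), the two class names and all probabilities are hypothetical, each model is a two-state left-to-right chain, and a gene is assigned to the class whose HMM gives the highest log-likelihood under the scaled forward algorithm.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for discrete emissions."""
    n = len(start)
    # alpha[i] is proportional to P(observations so far, current state = i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for symbol in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)  # accumulate the scaling factors in log space
        alpha = [
            sum((alpha[i] / scale) * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    total = sum(alpha)
    return log_p + math.log(total) if total > 0.0 else float("-inf")

# Hypothetical two-state chain (left-to-right) transition structure.
CHAIN = [[0.5, 0.5],
         [0.0, 1.0]]

# Hypothetical per-class models: (start, transition, emission) probabilities.
# Emission rows are states; columns are the symbols down, flat, up.
MODELS = {
    "up_then_down": ([1.0, 0.0], CHAIN, [[0.1, 0.2, 0.7],
                                         [0.7, 0.2, 0.1]]),
    "flat":         ([1.0, 0.0], CHAIN, [[0.1, 0.8, 0.1],
                                         [0.1, 0.8, 0.1]]),
}

def predict(obs, models):
    """Return the class whose HMM assigns the profile the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

if __name__ == "__main__":
    for profile in ([2, 2, 0, 0], [1, 1, 1, 1]):
        print(profile, "->", predict(profile, MODELS))
```

In a real setting the parameters would be estimated from annotated training genes (for example with Baum-Welch) rather than written by hand, and precision would be measured with cross-validation as the abstract describes.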

Original language: English (US)
Title of host publication: 20th International Conference on Computers and Their Applications 2005, CATA 2005
Pages: 196-201
Number of pages: 6
State: Published - Dec 1 2005
Event: 20th International Conference on Computers and Their Applications 2005, CATA 2005 - New Orleans, LA, United States
Duration: Mar 16 2005 to Mar 18 2005

Fingerprint

Hidden Markov models
Yeast
Genes
Gene expression
Time series
Proteins
Information services
Bayesian networks
Speech recognition
Support vector machines
Data mining

Keywords

  • Function prediction
  • Gene expression
  • Hidden Markov model

ASJC Scopus subject areas

  • Computer Science Applications

Cite this

Deng, X., Geng, H., & Ali, H. H. (2005). Predicting yeast gene function based on hidden Markov models. In 20th International Conference on Computers and Their Applications 2005, CATA 2005 (pp. 196-201).

@inproceedings{567c4ca6b330467c96c024bcd7fb7935,
title = "Predicting yeast gene function based on hidden {M}arkov models",
keywords = "Function prediction, Gene expression, Hidden Markov model",
author = "Xutao Deng and Huimin Geng and Ali, {Hesham H}",
year = "2005",
month = "12",
day = "1",
language = "English (US)",
isbn = "9781618395528",
series = "20th International Conference on Computers and Their Applications 2005, CATA 2005",
pages = "196--201",
booktitle = "20th International Conference on Computers and Their Applications 2005, CATA 2005",

}

TY - GEN
T1 - Predicting yeast gene function based on hidden Markov models
AU - Deng, Xutao
AU - Geng, Huimin
AU - Ali, Hesham H
PY - 2005/12/1
Y1 - 2005/12/1
KW - Function prediction
KW - Gene expression
KW - Hidden Markov model
UR - http://www.scopus.com/inward/record.url?scp=84870004805&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870004805&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781618395528
T3 - 20th International Conference on Computers and Their Applications 2005, CATA 2005
SP - 196
EP - 201
BT - 20th International Conference on Computers and Their Applications 2005, CATA 2005
ER -