Learning yeast gene functions from heterogeneous sources of data using hybrid Weighted Bayesian Networks

Xutao Deng, Huimin Geng, Hesham H Ali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

We developed a machine learning system for determining gene functions from heterogeneous data sources using a Weighted Naive Bayesian Network (WNB). Knowledge of gene functions is crucial for understanding many fundamental biological mechanisms, such as regulatory pathways, cell cycles, and diseases. Our major goal is to accurately infer the functions of putative genes, or ORFs (Open Reading Frames), from existing databases using computational methods. This task is intrinsically difficult, however, because the underlying biological processes involve complex interactions among multiple entities. Consequently, many functional links are likely to be missed when only one or two data sources are used for prediction. Our hypothesis is that integrating evidence from multiple, complementary sources could significantly improve prediction accuracy. In this paper, our experimental results not only suggest that this hypothesis is valid but also provide guidelines for using the WNB system for data collection, training, and prediction. The combined training data sets contain information from gene annotations, gene expression, clustering outputs, keyword annotations, and sequence homology, drawn from public databases. The current system is trained and tested on genes of the budding yeast Saccharomyces cerevisiae. Through the weight training process, our WNB model can also be used to analyze how much each source of information contributes to prediction performance. This contribution analysis could potentially lead to significant scientific discoveries by facilitating the interpretation and understanding of the complex relationships between biological entities.
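
The abstract describes combining several evidence sources in a weighted naive Bayesian classifier whose trained weights indicate how much each source contributes. The paper's exact formulation is not reproduced in this record, so the following is only a minimal sketch of one common weighted naive Bayes form, score(c | x) = log P(c) + Σ_i w_i log P(x_i | c), under stated assumptions: each source supplies one discrete value per gene, conditional probabilities use Laplace smoothing, and weights are fitted by projected gradient ascent on the conditional log-likelihood. The class `WeightedNaiveBayes`, its methods, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


class WeightedNaiveBayes:
    """Hypothetical weighted naive Bayes over discrete per-source evidence.

    score(c | x) = log P(c) + sum_i w_i * log P(x_i | c); all w_i = 1 is plain naive Bayes.
    """

    def __init__(self, n_sources, alpha=1.0):
        self.n_sources = n_sources
        self.alpha = alpha  # Laplace smoothing constant (an assumption, not from the paper)
        self.weights = [1.0] * n_sources

    def fit_counts(self, X, y):
        """Count class frequencies and per-source value frequencies within each class."""
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.total = len(y)
        self.counts = [defaultdict(Counter) for _ in range(self.n_sources)]
        self.values = [set() for _ in range(self.n_sources)]
        for x, c in zip(X, y):
            for i, v in enumerate(x):
                self.counts[i][c][v] += 1
                self.values[i].add(v)
        return self

    def log_cond(self, i, c, v):
        """Smoothed log P(x_i = v | c); one extra slot reserves mass for unseen values."""
        k = len(self.values[i]) + 1
        num = self.counts[i][c][v] + self.alpha
        return math.log(num / (self.class_counts[c] + self.alpha * k))

    def posterior(self, x):
        """Normalized P(c | x) from the weighted scores, using log-sum-exp for stability."""
        scores = {}
        for c in self.classes:
            s = math.log(self.class_counts[c] / self.total)
            for i, v in enumerate(x):
                s += self.weights[i] * self.log_cond(i, c, v)
            scores[c] = s
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        return {c: math.exp(s - top) / z for c, s in scores.items()}

    def fit_weights(self, X, y, lr=0.1, epochs=200):
        """Projected gradient ascent on the mean conditional log-likelihood of y given X.

        d/dw_i log P(y | x) = log P(x_i | y) - sum_c P(c | x) * log P(x_i | c)
        """
        for _ in range(epochs):
            grad = [0.0] * self.n_sources
            for x, c_true in zip(X, y):
                post = self.posterior(x)
                for i, v in enumerate(x):
                    expected = sum(post[c] * self.log_cond(i, c, v) for c in self.classes)
                    grad[i] += self.log_cond(i, c_true, v) - expected
            # Clip at zero so a source can be switched off but never counted as negative evidence.
            self.weights = [max(0.0, w + lr * g / len(X)) for w, g in zip(self.weights, grad)]
        return self


if __name__ == "__main__":
    # Hypothetical toy data: source 0 tracks the label, source 1 is independent of it.
    X = [("a", "p"), ("a", "q"), ("a", "p"), ("a", "q"),
         ("b", "p"), ("b", "q"), ("b", "p"), ("b", "q")]
    y = ["A"] * 4 + ["B"] * 4
    model = WeightedNaiveBayes(n_sources=2).fit_counts(X, y).fit_weights(X, y)
    print("learned weights:", model.weights)
    print("P(class | a, q):", model.posterior(("a", "q")))
```

In this sketch, a source whose values are distributed identically in every class receives zero gradient, so its weight stays where it started, while an informative source's weight grows. Reading the fitted weights side by side is one simple way to compare per-source contributions in the spirit of the analysis the abstract describes.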

Original language: English (US)
Title of host publication: Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005
Pages: 25-34
Number of pages: 10
DOIs: https://doi.org/10.1109/CSB.2005.38
State: Published - Dec 1 2005
Event: 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005 - Stanford, CA, United States
Duration: Aug 8 2005 to Aug 11 2005

Publication series

Name: Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005
Volume: 2005

Conference

Conference: 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005
Country: United States
City: Stanford, CA
Period: 8/8/05 to 8/11/05

Fingerprint

Information Storage and Retrieval
Bayesian networks
Yeast
Genes
Yeasts
Learning
Databases
Learning systems
Molecular Sequence Annotation
Biological Phenomena
Saccharomycetales
Sequence Homology
Open Reading Frames
Cluster Analysis
Saccharomyces cerevisiae
Cell Cycle
Computational methods
Gene expression
Guidelines
Gene Expression

Keywords

  • Bayesian network
  • Gene function prediction
  • Machine learning
  • Yeast

ASJC Scopus subject areas

  • Engineering(all)
  • Medicine(all)

Cite this

Deng, X., Geng, H., & Ali, H. H. (2005). Learning yeast gene functions from heterogeneous sources of data using hybrid Weighted Bayesian Networks. In Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005 (pp. 25-34). [1498003] (Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005; Vol. 2005). https://doi.org/10.1109/CSB.2005.38

Learning yeast gene functions from heterogeneous sources of data using hybrid Weighted Bayesian Networks. / Deng, Xutao; Geng, Huimin; Ali, Hesham H.

Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005. 2005. p. 25-34 1498003 (Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005; Vol. 2005).

Deng, X, Geng, H & Ali, HH 2005, Learning yeast gene functions from heterogeneous sources of data using hybrid Weighted Bayesian Networks. in Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005., 1498003, Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005, vol. 2005, pp. 25-34, 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005, Stanford, CA, United States, 8/8/05. https://doi.org/10.1109/CSB.2005.38
Deng X, Geng H, Ali HH. Learning yeast gene functions from heterogeneous sources of data using hybrid Weighted Bayesian Networks. In Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005. 2005. p. 25-34. 1498003. (Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005). https://doi.org/10.1109/CSB.2005.38
Deng, Xutao ; Geng, Huimin ; Ali, Hesham H. / Learning yeast gene functions from heterogeneous sources of data using hybrid Weighted Bayesian Networks. Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005. 2005. pp. 25-34 (Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005).
@inproceedings{b31d7a2995f14528bbe2a0ad2584f806,
title = "Learning yeast gene functions from heterogeneous sources of data using hybrid Weighted Bayesian Networks",
keywords = "Bayesian network, Gene function prediction, Machine learning, Yeast",
author = "Xutao Deng and Huimin Geng and Ali, {Hesham H}",
year = "2005",
month = "12",
day = "1",
doi = "10.1109/CSB.2005.38",
language = "English (US)",
isbn = "0769523447",
series = "Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005",
pages = "25--34",
booktitle = "Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005",

}

TY - GEN

T1 - Learning yeast gene functions from heterogeneous sources of data using hybrid Weighted Bayesian Networks

AU - Deng, Xutao

AU - Geng, Huimin

AU - Ali, Hesham H

PY - 2005/12/1

Y1 - 2005/12/1

KW - Bayesian network

KW - Gene function prediction

KW - Machine learning

KW - Yeast

UR - http://www.scopus.com/inward/record.url?scp=33745499777&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745499777&partnerID=8YFLogxK

U2 - 10.1109/CSB.2005.38

DO - 10.1109/CSB.2005.38

M3 - Conference contribution

C2 - 16447959

AN - SCOPUS:33745499777

SN - 0769523447

SN - 9780769523446

T3 - Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005

SP - 25

EP - 34

BT - Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005

ER -