G protein-coupled receptor classification at the subfamily level with probabilistic suffix tree

Jingyi Yang, Jitender S Deogun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Classifying G protein-coupled receptors (GPCRs) is an important problem because of the central role GPCRs play in pharmaceutical research. GPCRs have diverse functions and are involved in many biological processes, which makes them ideal targets for novel drugs. This diversity also means that the members of the family lack overall sequence homology, which makes GPCR classification a challenging task. Various methods have been applied to this task, such as hidden Markov models (HMMs), decision trees, and support vector machines (SVMs); however, their performance is not entirely satisfactory. In this paper, we propose a new method to classify GPCRs into subfamilies. In the proposed method, a probabilistic suffix tree (PST) is used to construct a prediction model for each subfamily. To classify a GPCR protein, we calculate its similarity score against the PST prediction model of each subfamily using the multi-domain local prediction algorithm, and the protein is assigned to the subfamily that gives the highest score. Our method uses only primary sequence information and is very efficient: model construction and prediction take very little time. Despite this speed, it achieves 98.07% and 97.35% overall accuracy on level I and level II subfamily classification, respectively, in a 2-fold cross-validation test. Given this high accuracy and efficiency, our method is a significant improvement over previously reported methods.
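The classification scheme described in the abstract (one model per subfamily, score the query against every model, assign the highest-scoring subfamily, evaluate by 2-fold cross-validation) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `SimplePST`, `train_models`, `classify` and `two_fold_accuracy` are hypothetical names; the model is a simplified bounded-depth variable-order Markov model that prunes contexts by a minimum count (simpler than the pruning criteria of a full PST) and smooths with a pseudocount; a length-normalized log-likelihood stands in for the multi-domain local prediction algorithm, whose details are not given here; the cross-validation split is random and unstratified; and the sequences are toy strings, not GPCR data.

```python
import math
import random
from collections import defaultdict

# Assumption: the 20 standard amino acids define the smoothing alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class SimplePST:
    """Hypothetical, simplified variable-order Markov model in the spirit of a PST.

    Contexts of up to `max_depth` residues are kept when seen at least
    `min_count` times; prediction backs off to the longest kept suffix
    of the preceding residues, with pseudocount smoothing.
    """

    def __init__(self, max_depth=3, min_count=2, pseudocount=0.5):
        self.max_depth = max_depth
        self.min_count = min_count
        self.pseudocount = pseudocount
        self.nodes = {"": {}}

    def fit(self, sequences):
        counts = defaultdict(lambda: defaultdict(int))  # context -> residue -> count
        for seq in sequences:
            for i, residue in enumerate(seq):
                # Count the residue under each preceding context of length 0..max_depth.
                for d in range(min(self.max_depth, i) + 1):
                    counts[seq[i - d:i]][residue] += 1
        # Prune rare contexts; the empty (root) context is always kept.
        self.nodes = {ctx: dict(nxt) for ctx, nxt in counts.items()
                      if ctx == "" or sum(nxt.values()) >= self.min_count}
        self.nodes.setdefault("", {})
        return self

    def prob(self, context, residue):
        ctx = context[-self.max_depth:] if self.max_depth > 0 else ""
        while ctx not in self.nodes:
            ctx = ctx[1:]  # back off by dropping the oldest residue
        nxt = self.nodes[ctx]
        total = sum(nxt.values())
        return (nxt.get(residue, 0) + self.pseudocount) / (
            total + self.pseudocount * len(AMINO_ACIDS))

    def score(self, seq):
        """Length-normalized log-likelihood (stand-in score; higher = better fit)."""
        if not seq:
            return float("-inf")
        log_p = sum(math.log(self.prob(seq[max(0, i - self.max_depth):i], r))
                    for i, r in enumerate(seq))
        return log_p / len(seq)


def train_models(labelled, **pst_args):
    """Fit one model per subfamily from (sequence, subfamily) pairs."""
    by_family = defaultdict(list)
    for seq, family in labelled:
        by_family[family].append(seq)
    return {fam: SimplePST(**pst_args).fit(seqs) for fam, seqs in by_family.items()}


def classify(models, seq):
    """Assign the subfamily whose model gives the highest score."""
    return max(models, key=lambda fam: models[fam].score(seq))


def two_fold_accuracy(labelled, seed=0, **pst_args):
    """Overall accuracy of a random, unstratified 2-fold cross-validation."""
    data = list(labelled)
    random.Random(seed).shuffle(data)
    folds = (data[:len(data) // 2], data[len(data) // 2:])
    correct = 0
    for k in (0, 1):
        models = train_models(folds[1 - k], **pst_args)  # train on the other fold
        correct += sum(classify(models, seq) == fam for seq, fam in folds[k])
    return correct / len(data)


# Toy usage (synthetic strings, not real GPCR sequences).
toy = [("ACDACDACDACD", "fam_a"), ("ACDACDACD", "fam_a"),
       ("WYWYWYWYWY", "fam_b"), ("WYWYWYWY", "fam_b")]
models = train_models(toy)
print(classify(models, "ACDACDAC"))  # fam_a
```

Backing off to the longest retained suffix lets the model use long contexts where the training data support them and shorter ones elsewhere; dividing by sequence length keeps scores for long and short proteins on a comparable scale.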

Original language: English (US)
Title of host publication: Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06
Pages: 490-497
Number of pages: 8
DOI: https://doi.org/10.1109/CIBCB.2006.330976
State: Published - Dec 1 2006
Event: 3rd Computational Intelligence in Bioinformatics and Computational Biology Symposium, CIBCB - Toronto, ON, Canada
Duration: Sep 28 2006 – Sep 29 2006

Publication series

Name: Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06

Conference

Conference: 3rd Computational Intelligence in Bioinformatics and Computational Biology Symposium, CIBCB
Country: Canada
City: Toronto, ON
Period: 9/28/06 – 9/29/06

Fingerprint

G Protein
Suffix Tree
Receptor
Proteins
Prediction Model
Classify
Protein
Prediction
Pharmaceuticals
Cross-validation
Decision tree
Medicine
High Efficiency
High Accuracy
Fold
Decision trees
Drug products
Calculate
Target

Keywords

  • GPCR protein classification
  • Multi-domain local prediction
  • Probabilistic suffix tree

ASJC Scopus subject areas

  • Artificial Intelligence
  • Biomedical Engineering
  • Applied Mathematics
  • Computational Mathematics

Cite this

Yang, J., & Deogun, J. S. (2006). G protein-coupled receptor classification at the subfamily level with probabilistic suffix tree. In Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06 (pp. 490-497). [4133212] (Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06). https://doi.org/10.1109/CIBCB.2006.330976

G protein-coupled receptor classification at the subfamily level with probabilistic suffix tree. / Yang, Jingyi; Deogun, Jitender S.

Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06. 2006. p. 490-497 4133212 (Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06).


Yang, J & Deogun, JS 2006, G protein-coupled receptor classification at the subfamily level with probabilistic suffix tree. in Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06., 4133212, Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06, pp. 490-497, 3rd Computational Intelligence in Bioinformatics and Computational Biology Symposium, CIBCB, Toronto, ON, Canada, 9/28/06. https://doi.org/10.1109/CIBCB.2006.330976
Yang J, Deogun JS. G protein-coupled receptor classification at the subfamily level with probabilistic suffix tree. In Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06. 2006. p. 490-497. 4133212. (Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06). https://doi.org/10.1109/CIBCB.2006.330976
Yang, Jingyi ; Deogun, Jitender S. / G protein-coupled receptor classification at the subfamily level with probabilistic suffix tree. Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06. 2006. pp. 490-497 (Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06).
@inproceedings{c38936a616434815ad8167eecae42ac7,
title = "G protein-coupled receptor classification at the subfamily level with probabilistic suffix tree",
abstract = "Classifying G protein-coupled receptors (GPCRs) is an important problem because of the central role GPCRs play in pharmaceutical research. GPCRs have diverse functions and are involved in many biological processes, which makes them ideal targets for novel drugs. This diversity also means that the members of the family lack overall sequence homology, which makes GPCR classification a challenging task. Various methods have been applied to this task, such as hidden Markov models (HMMs), decision trees, and support vector machines (SVMs); however, their performance is not entirely satisfactory. In this paper, we propose a new method to classify GPCRs into subfamilies. In the proposed method, a probabilistic suffix tree (PST) is used to construct a prediction model for each subfamily. To classify a GPCR protein, we calculate its similarity score against the PST prediction model of each subfamily using the multi-domain local prediction algorithm, and the protein is assigned to the subfamily that gives the highest score. Our method uses only primary sequence information and is very efficient: model construction and prediction take very little time. Despite this speed, it achieves 98.07{\%} and 97.35{\%} overall accuracy on level I and level II subfamily classification, respectively, in a 2-fold cross-validation test. Given this high accuracy and efficiency, our method is a significant improvement over previously reported methods.",
keywords = "GPCR protein classification, Multi-domain local prediction, Probabilistic suffix tree",
author = "Jingyi Yang and Deogun, {Jitender S}",
year = "2006",
month = "12",
day = "1",
doi = "10.1109/CIBCB.2006.330976",
language = "English (US)",
isbn = "1424406234",
series = "Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06",
pages = "490--497",
booktitle = "Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06",

}

TY - GEN

T1 - G protein-coupled receptor classification at the subfamily level with probabilistic suffix tree

AU - Yang, Jingyi

AU - Deogun, Jitender S

PY - 2006/12/1

Y1 - 2006/12/1

N2 - Classifying G protein-coupled receptors (GPCRs) is an important problem because of the central role GPCRs play in pharmaceutical research. GPCRs have diverse functions and are involved in many biological processes, which makes them ideal targets for novel drugs. This diversity also means that the members of the family lack overall sequence homology, which makes GPCR classification a challenging task. Various methods have been applied to this task, such as hidden Markov models (HMMs), decision trees, and support vector machines (SVMs); however, their performance is not entirely satisfactory. In this paper, we propose a new method to classify GPCRs into subfamilies. In the proposed method, a probabilistic suffix tree (PST) is used to construct a prediction model for each subfamily. To classify a GPCR protein, we calculate its similarity score against the PST prediction model of each subfamily using the multi-domain local prediction algorithm, and the protein is assigned to the subfamily that gives the highest score. Our method uses only primary sequence information and is very efficient: model construction and prediction take very little time. Despite this speed, it achieves 98.07% and 97.35% overall accuracy on level I and level II subfamily classification, respectively, in a 2-fold cross-validation test. Given this high accuracy and efficiency, our method is a significant improvement over previously reported methods.

AB - Classifying G protein-coupled receptors (GPCRs) is an important problem because of the central role GPCRs play in pharmaceutical research. GPCRs have diverse functions and are involved in many biological processes, which makes them ideal targets for novel drugs. This diversity also means that the members of the family lack overall sequence homology, which makes GPCR classification a challenging task. Various methods have been applied to this task, such as hidden Markov models (HMMs), decision trees, and support vector machines (SVMs); however, their performance is not entirely satisfactory. In this paper, we propose a new method to classify GPCRs into subfamilies. In the proposed method, a probabilistic suffix tree (PST) is used to construct a prediction model for each subfamily. To classify a GPCR protein, we calculate its similarity score against the PST prediction model of each subfamily using the multi-domain local prediction algorithm, and the protein is assigned to the subfamily that gives the highest score. Our method uses only primary sequence information and is very efficient: model construction and prediction take very little time. Despite this speed, it achieves 98.07% and 97.35% overall accuracy on level I and level II subfamily classification, respectively, in a 2-fold cross-validation test. Given this high accuracy and efficiency, our method is a significant improvement over previously reported methods.

KW - GPCR protein classification

KW - Multi-domain local prediction

KW - Probabilistic suffix tree

UR - http://www.scopus.com/inward/record.url?scp=50249085915&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=50249085915&partnerID=8YFLogxK

U2 - 10.1109/CIBCB.2006.330976

DO - 10.1109/CIBCB.2006.330976

M3 - Conference contribution

SN - 1424406234

SN - 9781424406234

T3 - Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06

SP - 490

EP - 497

BT - Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06

ER -