Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties

Juan Cui, Lian Yi Han, Hu Li, Choong Yong Ung, Zhi Qun Tang, Chan Juan Zheng, Zhi Wei Cao, Yu Zong Chen

Research output: Contribution to journal > Article

49 Citations (Scopus)

Abstract

Background: Computational methods have been developed for predicting allergen proteins from sequence segments that show identity, homology, or a motif match to a known allergen. These methods achieve good prediction accuracy but are less effective for novel proteins with no similarity to any known allergen.

Methods: This work tests the feasibility of using a statistical learning method, support vector machines (SVM), to address this limitation. The prediction system is trained and tested using 1005 allergen proteins from the Allergome database and 22,469 non-allergen proteins from 7871 Pfam families.

Results: On an independent test set of 229 allergen and 6717 non-allergen proteins from 7871 Pfam families, 93.0% of the allergens and 99.9% of the non-allergens are correctly predicted, which is comparable to the best results of other methods. Of the 18 novel allergen proteins non-homologous to any other protein in the Swissprot database, 88.9% are correctly predicted. A further screening of 168,128 proteins in the Swissprot database finds that 2.9% are predicted to be allergen proteins, which is consistent with estimates from motif-based methods.

Conclusions: Our study suggests that SVM is a potentially useful method for predicting allergen proteins and that it has some capability for predicting novel allergen proteins. Our software can be accessed at http://jing.cz3.nus.edu.sg/cgi-bin/APPEL.
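The percentages in the Results can be turned back into approximate protein counts, which makes the scale of each test easier to see. The sketch below is only arithmetic on the rounded figures quoted in the abstract; the helper names `implied_count` and `rate` are illustrative, and the counts it prints are nearest-integer estimates rather than numbers reported by the authors. Neighbouring counts can round to the same percentage, so these values are indicative only.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# The counts derived here are NOT reported in the source; they are the
# nearest whole numbers consistent with the rounded percentages.

def implied_count(percent: float, total: int) -> int:
    """Nearest whole number of proteins matching a rounded percentage."""
    return round(percent / 100 * total)


def rate(correct: int, total: int) -> float:
    """Percentage of `total` represented by `correct`, to one decimal."""
    return round(100 * correct / total, 1)


# (reported percentage, set size) pairs taken from the abstract.
reported = {
    "independent allergens": (93.0, 229),
    "independent non-allergens": (99.9, 6717),
    "novel allergens": (88.9, 18),
    "Swissprot screen": (2.9, 168128),
}

# Nearest-integer count for each reported percentage.
implied = {name: implied_count(p, n) for name, (p, n) in reported.items()}

for name, (percent, total) in reported.items():
    count = implied[name]
    # Re-derive the percentage from the estimated count as a sanity check.
    print(f"{name}: ~{count} of {total} ({rate(count, total)}% vs reported {percent}%)")
```

Read this way, the novel-allergen result corresponds to roughly 16 of 18 proteins, and the Swissprot screen to several thousand predicted allergens, assuming the rounded percentages are taken at face value.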

Original language: English (US)
Pages (from-to): 514-520
Number of pages: 7
Journal: Molecular Immunology
Volume: 44
Issue number: 4
DOI: 10.1016/j.molimm.2006.02.010
State: Published - Jan 1 2007

Keywords

  • Allergen
  • Immunology
  • Statistical learning method
  • Support vector machine

ASJC Scopus subject areas

  • Immunology
  • Molecular Biology

Cite this

Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties. / Cui, Juan; Han, Lian Yi; Li, Hu; Ung, Choong Yong; Tang, Zhi Qun; Zheng, Chan Juan; Cao, Zhi Wei; Chen, Yu Zong.

In: Molecular Immunology, Vol. 44, No. 4, 01.01.2007, p. 514-520.

@article{1183c11e292b463aac6a0e956ef81a4b,
title = "Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties",
keywords = "Allergen, Immunology, Statistical learning method, Support vector machine",
author = "Juan Cui and Han, {Lian Yi} and Hu Li and Ung, {Choong Yong} and Tang, {Zhi Qun} and Zheng, {Chan Juan} and Cao, {Zhi Wei} and Chen, {Yu Zong}",
year = "2007",
month = "1",
day = "1",
doi = "10.1016/j.molimm.2006.02.010",
language = "English (US)",
volume = "44",
pages = "514--520",
journal = "Molecular Immunology",
issn = "0161-5890",
publisher = "Elsevier Limited",
number = "4",

}

TY - JOUR

T1 - Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties

AU - Cui, Juan

AU - Han, Lian Yi

AU - Li, Hu

AU - Ung, Choong Yong

AU - Tang, Zhi Qun

AU - Zheng, Chan Juan

AU - Cao, Zhi Wei

AU - Chen, Yu Zong

PY - 2007/1/1

Y1 - 2007/1/1

KW - Allergen

KW - Immunology

KW - Statistical learning method

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=33748199236&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33748199236&partnerID=8YFLogxK

U2 - 10.1016/j.molimm.2006.02.010

DO - 10.1016/j.molimm.2006.02.010

M3 - Article

C2 - 16563508

AN - SCOPUS:33748199236

VL - 44

SP - 514

EP - 520

JO - Molecular Immunology

JF - Molecular Immunology

SN - 0161-5890

IS - 4

ER -