Prediction of MHC-binding peptides of flexible lengths from sequence-derived structural and physicochemical properties

J. Cui, L. Y. Han, H. H. Lin, H. L. Zhang, Z. Q. Tang, C. J. Zheng, Z. W. Cao, Y. Z. Chen

Research output: Contribution to journal (Article)

54 Citations (Scopus)

Abstract

Peptide binding to MHC is critical for antigen recognition by T-cells. To facilitate vaccine design, computational methods have been developed for predicting MHC-binding peptides, which achieve impressive prediction accuracies of 70-90% for binders and 40-80% for non-binders. These methods have been developed for peptides of fixed lengths and for a limited number of alleles, trained on a small number of non-binders, and in some cases based directly on sequence. These constraints limit prediction coverage and accuracy, particularly for non-binders. It is desirable to explore methods that predict binders of flexible lengths from sequence-derived physicochemical properties and that are trained on diverse sets of non-binders. This work explores support vector machines (SVM) as such a method, developing prediction systems for 18 MHC class I and 12 class II alleles by using 4208-3252 binders and 234,333-168,793 non-binders, evaluated on an independent set of 545-476 binders and 110,564-84,430 non-binders. Binder accuracies are 86-99% for 25 alleles and 70-80% for 5 alleles; non-binder accuracies are 96-99% for all 30 alleles. Compared with other results, binder accuracies are comparable and non-binder accuracies are substantially improved. Our method correctly predicts 73.3% of the 15 epitopes newly published in the last 4 months of 2005. Of the 251 recently published HLA-A*0201 non-epitopes predicted as binders by other methods, 63 are predicted as binders by our method. Screening of the HIV-1 genome shows that, compared to other methods, a comparable percentage (75-100%) of its known epitopes is correctly predicted, while a lower percentage (0.01-5% for 24 alleles and 5-8% for 6 alleles) of its constituent peptides is predicted as binders. Our software can be accessed at http://bidd.cz3.nus.edu.sg/mhc/.
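The abstract does not spell out how peptides of different lengths are turned into inputs for an SVM. One common way to do this, sketched below purely as an illustration, is to describe each peptide by the fraction of its residues that fall into each physicochemical class, which yields a vector of the same size whatever the peptide length. The three-class hydrophobicity grouping, the function name `composition_vector`, and the example peptides are assumptions made for this sketch; they are not taken from the record above and should not be read as the authors' descriptor set.

```python
from collections import Counter

# Illustrative three-class hydrophobicity grouping of the 20 standard
# amino acids. This grouping is an assumption for the sketch, not a
# detail stated in the abstract.
CLASSES = ("polar", "neutral", "hydrophobic")
HYDROPHOBICITY_CLASS = {}
for residue in "RKEDQN":
    HYDROPHOBICITY_CLASS[residue] = "polar"
for residue in "GASTPHY":
    HYDROPHOBICITY_CLASS[residue] = "neutral"
for residue in "CLVIMFW":
    HYDROPHOBICITY_CLASS[residue] = "hydrophobic"


def composition_vector(peptide):
    """Return the fraction of residues in each class, in CLASSES order.

    The output always has len(CLASSES) entries, so peptides of any
    length map to vectors of the same size.
    """
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    unknown = set(peptide) - set(HYDROPHOBICITY_CLASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    counts = Counter(HYDROPHOBICITY_CLASS[residue] for residue in peptide)
    return [counts[name] / len(peptide) for name in CLASSES]


if __name__ == "__main__":
    # Hypothetical peptides of different lengths (9 and 15 residues).
    for peptide in ("SLYNTVATL", "GASTPHYCLVIMFWR"):
        print(peptide, composition_vector(peptide))
```

Vectors of this kind could then be passed to any SVM implementation together with binder and non-binder labels. A fuller descriptor set would normally cover several properties beyond hydrophobicity.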

Original language: English (US)
Pages (from-to): 866-877
Number of pages: 12
Journal: Molecular Immunology
Volume: 44
Issue number: 5
DOI: 10.1016/j.molimm.2006.04.001
State: Published - Feb 1 2007

Fingerprint

Alleles
Epitopes
Peptides
MHC binding peptide
HIV-1
Software
Vaccines
Genome
T-Lymphocytes
Antigens

Keywords

  • Epitopes
  • MHC binding peptide
  • SVM
  • Vaccine

ASJC Scopus subject areas

  • Immunology
  • Molecular Biology

Cite this

Prediction of MHC-binding peptides of flexible lengths from sequence-derived structural and physicochemical properties. / Cui, J.; Han, L. Y.; Lin, H. H.; Zhang, H. L.; Tang, Z. Q.; Zheng, C. J.; Cao, Z. W.; Chen, Y. Z.

In: Molecular Immunology, Vol. 44, No. 5, 01.02.2007, p. 866-877.

Cui, J. ; Han, L. Y. ; Lin, H. H. ; Zhang, H. L. ; Tang, Z. Q. ; Zheng, C. J. ; Cao, Z. W. ; Chen, Y. Z. / Prediction of MHC-binding peptides of flexible lengths from sequence-derived structural and physicochemical properties. In: Molecular Immunology. 2007 ; Vol. 44, No. 5. pp. 866-877.
@article{a4a09e87a41d4cada6ecb1fb3e8e7e41,
title = "Prediction of MHC-binding peptides of flexible lengths from sequence-derived structural and physicochemical properties",
abstract = "Peptide binding to MHC is critical for antigen recognition by T-cells. To facilitate vaccine design, computational methods have been developed for predicting MHC-binding peptides, which achieve impressive prediction accuracies of 70-90{\%} for binders and 40-80{\%} for non-binders. These methods have been developed for peptides of fixed lengths, for a limited number of alleles, trained from small number of non-binders, and in some cases based straightforwardly on sequence. These limit prediction coverage and accuracy particularly for non-binders. It is desirable to explore methods that predict binders of flexible lengths from sequence-derived physicochemical properties and trained from diverse sets of non-binders. This work explores support vector machines (SVM) as such a method for developing prediction systems of 18 MHC class I and 12 class II alleles by using 4208-3252 binders and 234,333-168,793 non-binders, and evaluated by an independent set of 545-476 binders and 110,564-84,430 non-binders. Binder accuracies are 86-99{\%} for 25 and 70-80{\%} for 5 alleles, non-binder accuracies are 96-99{\%} for 30 alleles. Binder accuracies are comparable and non-binder accuracies substantially improved against other results. Our method correctly predicts 73.3{\%} of the 15 newly-published epitopes in the last 4 months of 2005. Of the 251 recently-published HLA-A*0201 non-epitopes predicted as binders by other methods, 63 are predicted as binders by our method. Screening of HIV-1 genome shows that, compared to other methods, a comparable percentage (75-100{\%}) of its known epitopes is correctly predicted, while a lower percentage (0.01-5{\%} for 24 and 5-8{\%} for 6 alleles) of its constituent peptides are predicted as binders. Our software can be accessed at http://bidd.cz3.nus.edu.sg/mhc/.",
keywords = "Epitopes, MHC binding peptide, SVM, Vaccine",
author = "J. Cui and Han, {L. Y.} and Lin, {H. H.} and Zhang, {H. L.} and Tang, {Z. Q.} and Zheng, {C. J.} and Cao, {Z. W.} and Chen, {Y. Z.}",
year = "2007",
month = "2",
day = "1",
doi = "10.1016/j.molimm.2006.04.001",
language = "English (US)",
volume = "44",
pages = "866--877",
journal = "Molecular Immunology",
issn = "0161-5890",
publisher = "Elsevier Limited",
number = "5",

}

TY - JOUR

T1 - Prediction of MHC-binding peptides of flexible lengths from sequence-derived structural and physicochemical properties

AU - Cui, J.

AU - Han, L. Y.

AU - Lin, H. H.

AU - Zhang, H. L.

AU - Tang, Z. Q.

AU - Zheng, C. J.

AU - Cao, Z. W.

AU - Chen, Y. Z.

PY - 2007/2/1

Y1 - 2007/2/1

N2 - Peptide binding to MHC is critical for antigen recognition by T-cells. To facilitate vaccine design, computational methods have been developed for predicting MHC-binding peptides, which achieve impressive prediction accuracies of 70-90% for binders and 40-80% for non-binders. These methods have been developed for peptides of fixed lengths, for a limited number of alleles, trained from small number of non-binders, and in some cases based straightforwardly on sequence. These limit prediction coverage and accuracy particularly for non-binders. It is desirable to explore methods that predict binders of flexible lengths from sequence-derived physicochemical properties and trained from diverse sets of non-binders. This work explores support vector machines (SVM) as such a method for developing prediction systems of 18 MHC class I and 12 class II alleles by using 4208-3252 binders and 234,333-168,793 non-binders, and evaluated by an independent set of 545-476 binders and 110,564-84,430 non-binders. Binder accuracies are 86-99% for 25 and 70-80% for 5 alleles, non-binder accuracies are 96-99% for 30 alleles. Binder accuracies are comparable and non-binder accuracies substantially improved against other results. Our method correctly predicts 73.3% of the 15 newly-published epitopes in the last 4 months of 2005. Of the 251 recently-published HLA-A*0201 non-epitopes predicted as binders by other methods, 63 are predicted as binders by our method. Screening of HIV-1 genome shows that, compared to other methods, a comparable percentage (75-100%) of its known epitopes is correctly predicted, while a lower percentage (0.01-5% for 24 and 5-8% for 6 alleles) of its constituent peptides are predicted as binders. Our software can be accessed at http://bidd.cz3.nus.edu.sg/mhc/.

KW - Epitopes

KW - MHC binding peptide

KW - SVM

KW - Vaccine

UR - http://www.scopus.com/inward/record.url?scp=33748792816&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33748792816&partnerID=8YFLogxK

U2 - 10.1016/j.molimm.2006.04.001

DO - 10.1016/j.molimm.2006.04.001

M3 - Article

C2 - 16806474

AN - SCOPUS:33748792816

VL - 44

SP - 866

EP - 877

JO - Molecular Immunology

JF - Molecular Immunology

SN - 0161-5890

IS - 5

ER -