Identification of proteins secreted by malaria parasite into erythrocyte using SVM and PSSM profiles

Ruchi Verma, Ajit Tiwari, Sukwinder Kaur, Grish C. Varshney, Gajendra P.S. Raghava

Research output: Contribution to journal › Article

21 Citations (Scopus)

Abstract

Background: The malaria parasite secretes various proteins into infected red blood cells (RBCs) for its growth and survival. Identifying these secretory proteins is therefore important for developing vaccines and drugs against malaria. Existing motif-based methods have had limited success because no universal motif is shared by all secretory proteins of the malaria parasite.

Results: In this study, a systematic attempt has been made to develop a general method for predicting secretory proteins of the malaria parasite. All models were trained and tested on a non-redundant dataset of 252 secretory and 252 non-secretory proteins. We developed SVM models and achieved a maximum MCC of 0.72 with 85.65% accuracy using amino acid composition, and an MCC of 0.74 with 86.45% accuracy using dipeptide composition. SVM models developed using split-amino acid and split-dipeptide composition achieved a maximum MCC of 0.74 with 86.40% accuracy and an MCC of 0.77 with 88.22% accuracy, respectively. In this study, PSSM profiles obtained from PSI-BLAST have been used for the first time to predict secretory proteins. We achieved a maximum MCC of 0.86 with 92.66% accuracy using the PSSM-based SVM model. All models developed in this study were evaluated using 5-fold cross-validation.

Conclusion: This study demonstrates that secretory proteins have a different residue composition than non-secretory proteins. It is therefore possible to predict secretory proteins from their residue composition using machine learning techniques. A multiple sequence alignment provides more information than the sequence alone, so the method based on PSSM profiles is more accurate than the methods based on sequence composition. A web server, PSEApred, has been developed for predicting secretory proteins of malaria parasites; its URL can be found in the Availability and requirements section.
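The composition features and the MCC score named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the percentage scaling, and the choice to skip non-standard residues are assumptions made here for illustration, and the SVM training, split compositions, and PSSM steps are omitted.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return 20 values: the percentage of each standard residue in seq."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard letters are ignored (an assumption)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * c / total for c in counts]


def dipeptide_composition(seq):
    """Return 400 values: the percentage of each overlapping residue pair."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[p] / total for p in DIPEPTIDES]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Each protein then becomes a fixed-length vector (20 or 400 values) regardless of sequence length, which is what allows a classifier such as an SVM to be trained on sequences of varying length.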

Original language: English (US)
Article number: 201
Journal: BMC Bioinformatics
Volume: 9
DOI: 10.1186/1471-2105-9-201
State: Published - Apr 16 2008

Fingerprint

Malaria
Erythrocyte
Parasites
Erythrocytes
Proteins
Protein
Chemical analysis
Dipeptides
Amino Acids
Amino acids
Profile
Red Blood Cells
Multiple Sequence Alignment
Model
Vaccines
Sequence Alignment
Vaccine
Web Server
Cross-validation
Learning systems

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Identification of proteins secreted by malaria parasite into erythrocyte using SVM and PSSM profiles. / Verma, Ruchi; Tiwari, Ajit; Kaur, Sukwinder; Varshney, Grish C.; Raghava, Gajendra P.S.

In: BMC bioinformatics, Vol. 9, 201, 16.04.2008.

@article{4de9bf4cfc19476e93303186267cdb67,
title = "Identification of proteins secreted by malaria parasite into erythrocyte using SVM and PSSM profiles",
abstract = "Background: Malaria parasite secretes various proteins in infected RBC for its growth and survival. Thus identification of these secretory proteins is important for developing vaccine/drug against malaria. The existing motif-based methods have got limited success due to lack of universal motif in all secretory proteins of malaria parasite. Results: In this study a systematic attempt has been made to develop a general method for predicting secretory proteins of malaria parasite. All models were trained and tested on a non-redundant dataset of 252 secretory and 252 non-secretory proteins. We developed SVM models and achieved maximum MCC 0.72 with 85.65{\%} accuracy and MCC 0.74 with 86.45{\%} accuracy using amino acid and dipeptide composition respectively. SVM models were developed using split-amino acid and split-dipeptide composition and achieved maximum MCC 0.74 with 86.40{\%} accuracy and MCC 0.77 with accuracy 88.22{\%} respectively. In this study, for the first time PSSM profiles obtained from PSI-BLAST, have been used for predicting secretory proteins. We achieved maximum MCC 0.86 with 92.66{\%} accuracy using PSSM based SVM model. All models developed in this study were evaluated using 5-fold cross-validation technique. Conclusion: This study demonstrates that secretory proteins have different residue composition than non-secretory proteins. Thus, it is possible to predict secretory proteins from its residue composition-using machine learning technique. The multiple sequence alignment provides more information than sequence itself. Thus performance of method based on PSSM profile is more accurate than method based on sequence composition. A web server PSEApred has been developed for predicting secretory proteins of malaria parasites, the URL can be found in the Availability and requirements section.",
author = "Ruchi Verma and Ajit Tiwari and Sukwinder Kaur and Varshney, {Grish C.} and Raghava, {Gajendra P.S.}",
year = "2008",
month = "4",
day = "16",
doi = "10.1186/1471-2105-9-201",
language = "English (US)",
volume = "9",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Identification of proteins secreted by malaria parasite into erythrocyte using SVM and PSSM profiles

AU - Verma, Ruchi

AU - Tiwari, Ajit

AU - Kaur, Sukwinder

AU - Varshney, Grish C.

AU - Raghava, Gajendra P.S.

PY - 2008/4/16

Y1 - 2008/4/16

N2 - Background: Malaria parasite secretes various proteins in infected RBC for its growth and survival. Thus identification of these secretory proteins is important for developing vaccine/drug against malaria. The existing motif-based methods have got limited success due to lack of universal motif in all secretory proteins of malaria parasite. Results: In this study a systematic attempt has been made to develop a general method for predicting secretory proteins of malaria parasite. All models were trained and tested on a non-redundant dataset of 252 secretory and 252 non-secretory proteins. We developed SVM models and achieved maximum MCC 0.72 with 85.65% accuracy and MCC 0.74 with 86.45% accuracy using amino acid and dipeptide composition respectively. SVM models were developed using split-amino acid and split-dipeptide composition and achieved maximum MCC 0.74 with 86.40% accuracy and MCC 0.77 with accuracy 88.22% respectively. In this study, for the first time PSSM profiles obtained from PSI-BLAST, have been used for predicting secretory proteins. We achieved maximum MCC 0.86 with 92.66% accuracy using PSSM based SVM model. All models developed in this study were evaluated using 5-fold cross-validation technique. Conclusion: This study demonstrates that secretory proteins have different residue composition than non-secretory proteins. Thus, it is possible to predict secretory proteins from its residue composition-using machine learning technique. The multiple sequence alignment provides more information than sequence itself. Thus performance of method based on PSSM profile is more accurate than method based on sequence composition. A web server PSEApred has been developed for predicting secretory proteins of malaria parasites, the URL can be found in the Availability and requirements section.

UR - http://www.scopus.com/inward/record.url?scp=42949165918&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=42949165918&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-9-201

DO - 10.1186/1471-2105-9-201

M3 - Article

VL - 9

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 201

ER -