L1pred: A sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier

Yongchao Dou, Jun Wang, Jialiang Yang, Chi Zhang

Research output: Contribution to journal › Article

18 Citations (Scopus)

Abstract

Identifying catalytic residues is a common first step toward understanding enzyme function, and knowledge of these residues is also useful for protein engineering and drug design. However, identifying catalytic residues experimentally remains challenging because it is time-consuming and costly, so computational methods have been explored to predict them. Here, we developed L1pred, a new algorithm for catalytic residue prediction that uses the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on two carefully designed datasets, Data604 and Data63. In ten-fold cross-validation on the training dataset, Data604, L1pred achieved an area under the precision-recall curve (AUPR) of 0.2198 and an area under the ROC curve (AUC) of 0.9494. On the independent test dataset, Data63, it achieved an AUPR of 0.2636 and an AUC of 0.9375. Among the sequence-based methods compared, L1pred performed best on both datasets. We also analyzed the importance of each attribute in the algorithm and found that all eight scores contributed roughly equally to L1pred's performance.
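The abstract combines eight sequence-based scores with L1-regularized logistic regression and evaluates the result by AUC and AUPR. As an illustration only, the following minimal Python sketch shows that general technique; it is not L1pred's implementation. It assumes hypothetical toy data (eight random per-residue "scores", of which only the first two carry signal, with about 15% positives) and fits the L1-penalized model by proximal gradient descent (ISTA), a simple solver chosen for brevity rather than the one used by the L1-logreg software. The function names `fit_l1_logreg`, `score`, `roc_auc` and `average_precision` and the parameters `lam`, `lr` and `epochs` are all hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrinks v toward 0 and returns exactly 0 when |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logreg(X, y, lam=0.05, lr=0.5, epochs=300):
    """Hypothetical helper: minimise mean log-loss + lam * ||w||_1 by ISTA.

    The intercept b is left unpenalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual p - y is the log-loss gradient w.r.t. the linear score.
            r = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += r * xi[j]
            gb += r
        # Gradient step on the smooth log-loss, then the L1 proximal step.
        w = [soft_threshold(w[j] - lr * gw[j] / n, lr * lam) for j in range(d)]
        b -= lr * gb / n
    return w, b

def score(w, b, x):
    # Combined per-residue score: predicted probability of being catalytic.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def roc_auc(scores, labels):
    # AUC = P(random positive outranks random negative); ties count one half.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    # Step-wise AUPR estimate: mean of the precision at the rank of each positive.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / max(hits, 1)

# Hypothetical toy data: 8 scores per residue, only the first two informative,
# roughly 15% positives (catalytic residues).
random.seed(0)
X, y = [], []
for _ in range(300):
    label = 1 if random.random() < 0.15 else 0
    row = [random.gauss(0, 1) for _ in range(8)]
    row[0] += 2.0 * label
    row[1] += 1.5 * label
    X.append(row)
    y.append(label)

# Train on the first 200 residues, evaluate on the 100 held-out residues.
w, b = fit_l1_logreg(X[:200], y[:200])
test_scores = [score(w, b, x) for x in X[200:]]
auc = roc_auc(test_scores, y[200:])
aupr = average_precision(test_scores, y[200:])
print("weights:", [round(v, 3) for v in w])
print(f"held-out AUC={auc:.3f}  AUPR={aupr:.3f}")
```

The soft-threshold step is what drives the weights of uninformative scores to exactly zero, and `lam` controls how aggressively it does so. AUPR is worth reporting alongside AUC because catalytic residues are a small minority of all residues; under such class imbalance, AUC can be high while precision at useful thresholds stays modest, which may help explain the gap between the AUC and AUPR values above.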

Original language: English (US)
Article number: e35666
Journal: PLoS One
Volume: 7
Issue number: 4
DOI: 10.1371/journal.pone.0035666
State: Published, Apr 27 2012

Fingerprint

Classifiers
prediction
Enzymes
Computational methods
ROC Curve
protein engineering
Area Under Curve
Drug Design
methodology
Pharmaceutical Preparations
Costs
drugs
Proteins
Costs and Cost Analysis
Datasets
testing

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

L1pred: A sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier. / Dou, Yongchao; Wang, Jun; Yang, Jialiang; Zhang, Chi.

In: PLoS One, Vol. 7, No. 4, e35666, 27.04.2012.

@article{7e76e554821c40d3864589bb6e6dcf40,
title = "L1pred: A sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier",
author = "Yongchao Dou and Jun Wang and Jialiang Yang and Chi Zhang",
year = "2012",
month = "4",
day = "27",
doi = "10.1371/journal.pone.0035666",
language = "English (US)",
volume = "7",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "4",
pages = "e35666",
}

TY  - JOUR
T1  - L1pred: A sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier
AU  - Dou, Yongchao
AU  - Wang, Jun
AU  - Yang, Jialiang
AU  - Zhang, Chi
PY  - 2012/4/27
Y1  - 2012/4/27
AB  - To understand enzyme functions, identifying the catalytic residues is a usual first step. Moreover, knowledge about catalytic residues is also useful for protein engineering and drug-design. However, to experimentally identify catalytic residues remains challenging for reasons of time and cost. Therefore, computational methods have been explored to predict catalytic residues. Here, we developed a new algorithm, L1pred, for catalytic residue prediction, by using the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on carefully designed datasets Data604 and Data63. With ten-fold cross-validation, L1pred showed the area under precision-recall curve (AUPR) and the area under ROC curve (AUC) of 0.2198 and 0.9494 on the training dataset, Data604, respectively. In addition, on the independent test dataset, Data63, it showed the AUPR and AUC values of 0.2636 and 0.9375, respectively. Compared with other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm, and found that all the scores contributed more or less equally to the L1pred performance.
UR  - http://www.scopus.com/inward/record.url?scp=84860487333&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84860487333&partnerID=8YFLogxK
U2  - 10.1371/journal.pone.0035666
DO  - 10.1371/journal.pone.0035666
M3  - Article
VL  - 7
JO  - PLoS One
JF  - PLoS One
SN  - 1932-6203
IS  - 4
M1  - e35666
ER  -