Prediction of protein phosphorylation sites by integrating secondary structure information and other one-dimensional structural properties

Yongchao Dou, Bo Yao, Chi Zhang

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

1 Citation (Scopus)

Abstract

Phosphorylation is important to study but challenging for both wet-bench experiments and computational approaches, and accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation across a wide variety of species. Here, we describe a phosphorylation site prediction webserver, PhosphoSVM, that uses a support vector machine (SVM) to combine protein secondary structure information with seven other one-dimensional structural properties: Shannon entropy, relative entropy, predicted protein disorder information, predicted solvent accessible area, amino acid overlapping properties, averaged cumulative hydrophobicity, and subsequence k-nearest neighbor profiles. In tenfold cross-validation on animal proteins, the method achieved AUC values of 0.8405, 0.8183, and 0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites, respectively. The model trained on the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test, where the AUC values were 0.7761, 0.6652, and 0.5958 for S, T, and Y sites, respectively. The algorithm with the optimally trained model was implemented as a webserver. The webserver, trained model, and all datasets used in the current study are available at http://sysbio.unl.edu/PhosphoSVM.
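To make two of the listed properties and the reported metric concrete, the following is a minimal Python sketch, not the PhosphoSVM implementation. It assumes the entropies are computed per alignment column over the 20 standard amino acids, that the relative entropy is measured against a uniform background, and that AUC is taken as the pairwise ranking probability; the function names and these choices are illustrative assumptions, not details given in the abstract.

```python
import math
from collections import Counter

# Assumption: columns contain only the 20 standard residues (no gaps).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shannon_entropy(column):
    """Shannon entropy (in bits) of the residue frequencies in one column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def relative_entropy(column, background=None):
    """Kullback-Leibler divergence (in bits) of column frequencies from a background.

    Assumption: a uniform background is used when none is supplied.
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(column)
    total = sum(counts.values())
    return sum(
        (n / total) * math.log2((n / total) / background[aa])
        for aa, n in counts.items()
    )


def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    conserved = "SSSSSSSS"   # hypothetical fully conserved column
    variable = "STSTSTST"    # hypothetical column with two residues
    print(shannon_entropy(conserved), shannon_entropy(variable))
    print(relative_entropy(conserved))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

In a pipeline of this kind, such per-position values would typically be gathered over a window around each candidate S, T, or Y residue and concatenated with the other properties into the feature vector passed to the SVM; the window size and encoding are not stated in the abstract.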

Original language: English (US)
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Pages: 265-274
Number of pages: 10
Volume: 1484
DOI: https://doi.org/10.1007/978-1-4939-6406-2_18
State: Published - 2017

Publication series

Name: Methods in Molecular Biology
Volume: 1484
ISSN (Print): 1064-3745

Fingerprint

Phosphorylation
Proteins
Entropy
Area Under Curve
Secondary Protein Structure
Threonine
Hydrophobic and Hydrophilic Interactions
Serine
Tyrosine
Animal Models
Genome
Amino Acids
Datasets

Keywords

  • Non-kinase-specific tool
  • Phosphorylation site prediction
  • Support vector machine

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

Dou, Y., Yao, B., & Zhang, C. (2017). Prediction of protein phosphorylation sites by integrating secondary structure information and other one-dimensional structural properties. In Methods in Molecular Biology (Vol. 1484, pp. 265-274). Humana Press Inc. https://doi.org/10.1007/978-1-4939-6406-2_18

@inbook{835b96cab6964317a5b91ae5d57f5d6c,
title = "Prediction of protein phosphorylation sites by integrating secondary structure information and other one-dimensional structural properties",
abstract = "Studies on phosphorylation are important but challenging for both wet-bench experiments and computational studies, and accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation in a wide variety of species. Here, we describe a phosphorylation site prediction webserver, PhosphoSVM, that employs Support Vector Machine to combine protein secondary structure information and seven other one-dimensional structural properties, including Shannon entropy, relative entropy, predicted protein disorder information, predicted solvent accessible area, amino acid overlapping properties, averaged cumulative hydrophobicity, and subsequence k-nearest neighbor profiles. This method achieved AUC values of 0.8405/0.8183/0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites, respectively, in animals with a tenfold cross-validation. The model trained by the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test. The AUC values for the independent test data set were 0.7761/0.6652/0.5958 for S/T/Y phosphorylation sites, respectively. This algorithm with the optimally trained model was implemented as a webserver. The webserver, trained model, and all datasets used in the current study are available at http://sysbio.unl.edu/PhosphoSVM.",
keywords = "Non-kinase-specific tool, Phosphorylation site prediction, Support vector machine",
author = "Yongchao Dou and Bo Yao and Chi Zhang",
year = "2017",
doi = "10.1007/978-1-4939-6406-2_18",
language = "English (US)",
volume = "1484",
series = "Methods in Molecular Biology",
publisher = "Humana Press Inc.",
pages = "265--274",
booktitle = "Methods in Molecular Biology",

}

TY - CHAP

T1 - Prediction of protein phosphorylation sites by integrating secondary structure information and other one-dimensional structural properties

AU - Dou, Yongchao

AU - Yao, Bo

AU - Zhang, Chi

PY - 2017

Y1 - 2017

AB - Studies on phosphorylation are important but challenging for both wet-bench experiments and computational studies, and accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation in a wide variety of species. Here, we describe a phosphorylation site prediction webserver, PhosphoSVM, that employs Support Vector Machine to combine protein secondary structure information and seven other one-dimensional structural properties, including Shannon entropy, relative entropy, predicted protein disorder information, predicted solvent accessible area, amino acid overlapping properties, averaged cumulative hydrophobicity, and subsequence k-nearest neighbor profiles. This method achieved AUC values of 0.8405/0.8183/0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites, respectively, in animals with a tenfold cross-validation. The model trained by the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test. The AUC values for the independent test data set were 0.7761/0.6652/0.5958 for S/T/Y phosphorylation sites, respectively. This algorithm with the optimally trained model was implemented as a webserver. The webserver, trained model, and all datasets used in the current study are available at http://sysbio.unl.edu/PhosphoSVM.

KW - Non-kinase-specific tool

KW - Phosphorylation site prediction

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=84994291874&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994291874&partnerID=8YFLogxK

U2 - 10.1007/978-1-4939-6406-2_18

DO - 10.1007/978-1-4939-6406-2_18

M3 - Chapter

VL - 1484

T3 - Methods in Molecular Biology

SP - 265

EP - 274

BT - Methods in Molecular Biology

PB - Humana Press Inc.

ER -