Prediction of protein phosphorylation sites by integrating secondary structure information and other one-dimensional structural properties

Yongchao Dou, Bo Yao, Chi Zhang

Research output: Chapter in Book/Report/Conference proceeding › Chapter

1 Citation (Scopus)

Abstract

Studying phosphorylation is important but challenging for both wet-bench experiments and computational approaches, and accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation across a wide variety of species. Here, we describe a phosphorylation site prediction webserver, PhosphoSVM, that uses a support vector machine (SVM) to combine protein secondary structure information with seven other one-dimensional structural properties: Shannon entropy, relative entropy, predicted protein disorder information, predicted solvent accessible area, amino acid overlapping properties, averaged cumulative hydrophobicity, and subsequence k-nearest neighbor profiles. In tenfold cross-validation on animal phosphorylation sites, the method achieved AUC values of 0.8405, 0.8183, and 0.7383 for serine (S), threonine (T), and tyrosine (Y) sites, respectively. The model trained on the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test, yielding AUC values of 0.7761, 0.6652, and 0.5958 for S, T, and Y sites, respectively. The algorithm, with the optimally trained model, was implemented as a webserver. The webserver, trained model, and all datasets used in the current study are available at http://sysbio.unl.edu/PhosphoSVM.
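
Two of the listed properties, Shannon entropy and relative entropy, can be illustrated with a short sketch. The abstract does not give the formulas used in the chapter, so the code below uses the textbook definitions applied to a single alignment column. The function names, the uniform background distribution, and the toy columns are assumptions made for illustration and are not taken from PhosphoSVM.

```python
import math
from collections import Counter

# The 20 standard amino acids; gap or ambiguity symbols are not handled here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shannon_entropy(column):
    """Shannon entropy (in bits) of the residue distribution in one column."""
    counts = Counter(column)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over the residues observed in the column.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def relative_entropy(column, background):
    """Kullback-Leibler divergence (in bits) of the column from a background."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum(
        (n / total) * math.log2((n / total) / background[aa])
        for aa, n in counts.items()
    )


# Assumed background: every amino acid equally likely (hypothetical choice).
uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

conserved = "SSSSSSSS"  # toy column: fully conserved serine
mixed = "STSTSTST"      # toy column: serine and threonine in equal parts

for name, col in (("conserved", conserved), ("mixed", mixed)):
    print(name, shannon_entropy(col), relative_entropy(col, uniform))
```

A fully conserved column has zero Shannon entropy and the largest divergence from the uniform background, while a mixed column scores higher on entropy and lower on divergence. In practice a background estimated from a large sequence database would likely replace the uniform one.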

Original language: English (US)
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Pages: 265-274
Number of pages: 10
DOI: https://doi.org/10.1007/978-1-4939-6406-2_18
State: Published - Jan 1 2017

Publication series

Name: Methods in Molecular Biology
Volume: 1484
ISSN (Print): 1064-3745

Fingerprint

Phosphorylation
Proteins
Entropy
Area Under Curve
Secondary Protein Structure
Threonine
Hydrophobic and Hydrophilic Interactions
Serine
Tyrosine
Animal Models
Genome
Amino Acids
Datasets

Keywords

  • Non-kinase-specific tool
  • Phosphorylation site prediction
  • Support vector machine

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

Dou, Y., Yao, B., & Zhang, C. (2017). Prediction of protein phosphorylation sites by integrating secondary structure information and other one-dimensional structural properties. In Methods in Molecular Biology (pp. 265-274). (Methods in Molecular Biology; Vol. 1484). Humana Press Inc. https://doi.org/10.1007/978-1-4939-6406-2_18

@inbook{835b96cab6964317a5b91ae5d57f5d6c,
title = "Prediction of protein phosphorylation sites by integrating secondary structure information and other one-dimensional structural properties",
abstract = "Studies on phosphorylation are important but challenging for both wet-bench experiments and computational studies, and accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation in a wide variety of species. Here, we describe a phosphorylation site prediction webserver, PhosphoSVM, that employs Support Vector Machine to combine protein secondary structure information and seven other one-dimensional structural properties, including Shannon entropy, relative entropy, predicted protein disorder information, predicted solvent accessible area, amino acid overlapping properties, averaged cumulative hydrophobicity, and subsequence k-nearest neighbor profiles. This method achieved AUC values of 0.8405/0.8183/0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites, respectively, in animals with a tenfold cross-validation. The model trained by the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test. The AUC values for the independent test data set were 0.7761/0.6652/0.5958 for S/T/Y phosphorylation sites, respectively. This algorithm with the optimally trained model was implemented as a webserver. The webserver, trained model, and all datasets used in the current study are available at http://sysbio.unl.edu/PhosphoSVM.",
keywords = "Non-kinase-specific tool, Phosphorylation site prediction, Support vector machine",
author = "Yongchao Dou and Bo Yao and Chi Zhang",
year = "2017",
month = "1",
day = "1",
doi = "10.1007/978-1-4939-6406-2_18",
language = "English (US)",
series = "Methods in Molecular Biology",
publisher = "Humana Press Inc.",
pages = "265--274",
booktitle = "Methods in Molecular Biology",

}

TY - CHAP

T1 - Prediction of protein phosphorylation sites by integrating secondary structure information and other one-dimensional structural properties

AU - Dou, Yongchao

AU - Yao, Bo

AU - Zhang, Chi

PY - 2017/1/1

Y1 - 2017/1/1

AB - Studies on phosphorylation are important but challenging for both wet-bench experiments and computational studies, and accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation in a wide variety of species. Here, we describe a phosphorylation site prediction webserver, PhosphoSVM, that employs Support Vector Machine to combine protein secondary structure information and seven other one-dimensional structural properties, including Shannon entropy, relative entropy, predicted protein disorder information, predicted solvent accessible area, amino acid overlapping properties, averaged cumulative hydrophobicity, and subsequence k-nearest neighbor profiles. This method achieved AUC values of 0.8405/0.8183/0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites, respectively, in animals with a tenfold cross-validation. The model trained by the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test. The AUC values for the independent test data set were 0.7761/0.6652/0.5958 for S/T/Y phosphorylation sites, respectively. This algorithm with the optimally trained model was implemented as a webserver. The webserver, trained model, and all datasets used in the current study are available at http://sysbio.unl.edu/PhosphoSVM.

KW - Non-kinase-specific tool

KW - Phosphorylation site prediction

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=84994291874&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994291874&partnerID=8YFLogxK

U2 - 10.1007/978-1-4939-6406-2_18

DO - 10.1007/978-1-4939-6406-2_18

M3 - Chapter

C2 - 27787832

AN - SCOPUS:84994291874

T3 - Methods in Molecular Biology

SP - 265

EP - 274

BT - Methods in Molecular Biology

PB - Humana Press Inc.

ER -