Prediction of functional class of novel plant proteins by a statistical learning method

L. Y. Han, C. J. Zheng, H. H. Lin, J. Cui, H. Li, H. L. Zhang, Z. Q. Tang, Y. Z. Chen

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

  • In plant genomes, the function of a substantial percentage of the putative protein-coding open reading frames (ORFs) is unknown. These ORFs have no significant sequence similarity to known proteins, which complicates the task of functional study of these proteins. Efforts are being made to explore methods that are complementary to, or may be used in combination with, sequence alignment and clustering methods.
  • A web-based protein functional class prediction software, SVMProt, has shown some capability for predicting functional class of distantly related proteins. Here the usefulness of SVMProt for functional study of novel plant proteins is evaluated.
  • To test SVMProt, 49 plant proteins (without a sequence homolog in the Swiss-Prot protein database, not in the SVMProt training set, and with functional indications provided in the literature) were selected from a comprehensive search of MEDLINE abstracts and Swiss-Prot databases in 1999-2004. These represent unique proteins the function of which, at present, cannot be confidently predicted by sequence alignment and clustering methods.
  • The predicted functional class of 31 proteins was consistent, and that of four other proteins was weakly consistent, with published functions. Overall, the functional class of 71.4% of these proteins was consistent, or weakly consistent, with functional indications described in the literature. SVMProt shows a certain level of ability to provide useful hints about the functions of novel plant proteins with no similarity to known proteins.
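The overall figure in the abstract follows directly from the stated counts: 31 consistent plus 4 weakly consistent predictions out of 49 proteins. The short sketch below only re-derives that percentage; the constant and function names are illustrative, and the strict figure (consistent predictions alone) is a derived value that the abstract does not report.

```python
# Counts as stated in the abstract.
TOTAL_PROTEINS = 49
CONSISTENT = 31
WEAKLY_CONSISTENT = 4


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Consistent or weakly consistent, as reported in the abstract.
overall = percent(CONSISTENT + WEAKLY_CONSISTENT, TOTAL_PROTEINS)

# Consistent only (derived here, not a figure from the abstract).
strict = percent(CONSISTENT, TOTAL_PROTEINS)

print(f"consistent or weakly consistent: {overall}%")  # 71.4%
print(f"consistent only: {strict}%")                   # 63.3%
```

The remaining 14 of the 49 proteins are those whose predicted class did not match the published functional indications.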

Original language: English (US)
Pages (from-to): 109-121
Number of pages: 13
Journal: New Phytologist
Volume: 168
Issue number: 1
DOI: https://doi.org/10.1111/j.1469-8137.2005.01482.x
State: Published - Oct 1 2005

Fingerprint

Plant Proteins
Learning
prediction
Proteins
methodology
Sequence Alignment
Open Reading Frames
Cluster Analysis
Plant Genome
Protein Databases
Aptitude
Sequence Homology
MEDLINE
Software
Databases

Keywords

  • Novel plant protein
  • Open reading frames
  • Protein function prediction
  • Protein sequence
  • Support vector machines

ASJC Scopus subject areas

  • Physiology
  • Plant Science

Cite this

Prediction of functional class of novel plant proteins by a statistical learning method. / Han, L. Y.; Zheng, C. J.; Lin, H. H.; Cui, J.; Li, H.; Zhang, H. L.; Tang, Z. Q.; Chen, Y. Z.

In: New Phytologist, Vol. 168, No. 1, 01.10.2005, p. 109-121.

Han, LY, Zheng, CJ, Lin, HH, Cui, J, Li, H, Zhang, HL, Tang, ZQ & Chen, YZ 2005, 'Prediction of functional class of novel plant proteins by a statistical learning method', New Phytologist, vol. 168, no. 1, pp. 109-121. https://doi.org/10.1111/j.1469-8137.2005.01482.x
@article{8cc2e90685fa4ba089b4d80a580cfddc,
title = "Prediction of functional class of novel plant proteins by a statistical learning method",
keywords = "Novel plant protein, Open reading frames, Protein function prediction, Protein sequence, Support vector machines",
author = "Han, {L. Y.} and Zheng, {C. J.} and Lin, {H. H.} and Cui, J. and Li, H. and Zhang, {H. L.} and Tang, {Z. Q.} and Chen, {Y. Z.}",
year = "2005",
month = "10",
day = "1",
doi = "10.1111/j.1469-8137.2005.01482.x",
language = "English (US)",
volume = "168",
pages = "109--121",
journal = "New Phytologist",
issn = "0028-646X",
publisher = "Wiley-Blackwell",
number = "1",

}

TY - JOUR
T1 - Prediction of functional class of novel plant proteins by a statistical learning method
AU - Han, L. Y.
AU - Zheng, C. J.
AU - Lin, H. H.
AU - Cui, J.
AU - Li, H.
AU - Zhang, H. L.
AU - Tang, Z. Q.
AU - Chen, Y. Z.
PY - 2005/10/1
Y1 - 2005/10/1
KW - Novel plant protein
KW - Open reading frames
KW - Protein function prediction
KW - Protein sequence
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=32944472927&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=32944472927&partnerID=8YFLogxK
U2 - 10.1111/j.1469-8137.2005.01482.x
DO - 10.1111/j.1469-8137.2005.01482.x
M3 - Article
C2 - 16159326
AN - SCOPUS:32944472927
VL - 168
SP - 109
EP - 121
JO - New Phytologist
JF - New Phytologist
SN - 0028-646X
IS - 1
ER -