Prediction of functional class of novel bacterial proteins without the use of sequence similarity by a statistical learning method

J. Cui, L. Y. Han, C. Z. Cai, C. J. Zheng, Z. L. Ji, Y. Z. Chen

Research output: Contribution to journal (Article)

11 Citations (Scopus)

Abstract

A substantial percentage of the putative protein-encoding open reading frames (ORFs) in bacterial genomes have no homolog of known function, and their function cannot be confidently assigned on the basis of sequence similarity. Methods not based on sequence similarity are needed and are being developed. One method, SVMProt (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi), predicts protein functional family irrespective of sequence similarity (Nucleic Acids Res. 2003;31:3692-3697). While it has been tested on a large number of proteins, its capability for non-homologous proteins has so far been evaluated on only a relatively small number of proteins, and additional tests are needed to assess SVMProt more fully. In this work, 90 novel bacterial proteins (non-homologous to known proteins) are used to evaluate the capability of SVMProt. These proteins are such that none of their homologs are in the Swiss-Prot database, their functions are not clearly described in the literature, and neither they nor their homologs are included in the training sets of SVMProt. They represent proteins whose function cannot be confidently predicted by sequence similarity methods at present. The predicted functional class of 76.7% of these proteins shows various levels of consistency with the literature-described function, compared to the overall accuracy of 87% for the SVMProt functional class assignment of 34,582 proteins that have at least one homolog of known function. Our study suggests that SVMProt is capable of assigning functional class for novel bacterial proteins at a level not much lower than that of sequence alignment methods for homologous proteins.

Original language: English (US)
Pages (from-to): 86-100
Number of pages: 15
Journal: Journal of Molecular Microbiology and Biotechnology
Volume: 9
Issue number: 2
DOI: 10.1159/000088839
State: Published - Nov 1 2005

Fingerprint

Bacterial Proteins
Learning
Proteins
Bacterial Genomes
Sequence Alignment
Nucleic Acids
Open Reading Frames
Databases

Keywords

  • Open reading frames, protein
  • Proteins, non-homologous
  • SVMProt

ASJC Scopus subject areas

  • Biotechnology
  • Microbiology
  • Applied Microbiology and Biotechnology
  • Molecular Biology

Cite this

Prediction of functional class of novel bacterial proteins without the use of sequence similarity by a statistical learning method. / Cui, J.; Han, L. Y.; Cai, C. Z.; Zheng, C. J.; Ji, Z. L.; Chen, Y. Z.

In: Journal of Molecular Microbiology and Biotechnology, Vol. 9, No. 2, 01.11.2005, p. 86-100.

@article{832dae027cd84299b5e70c0a104c7c82,
title = "Prediction of functional class of novel bacterial proteins without the use of sequence similarity by a statistical learning method",
abstract = "A substantial percentage of the putative protein-encoding open reading frames (ORFs) in bacterial genomes have no homolog of known function, and their function cannot be confidently assigned on the basis of sequence similarity. Methods not based on sequence similarity are needed and being developed. One method, SVMProt (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi), predicts protein functional family irrespective of sequence similarity (Nucleic Acids Res. 2003;31:3692-3697). While it has been tested on a large number of proteins, its capability for non-homologous proteins has so far been evaluated for a relatively small number of proteins, and additional tests are needed to more fully assess SVMProt. In this work, 90 novel bacterial proteins (non-homologous to known proteins) are used to evaluate the capability of SVMProt. These proteins are such that none of their homologs are in the Swiss-Prot database, their functions not clearly described in the literature, and they themselves and their homologs are not included in the training sets of SVMProt. They represent proteins whose function cannot be confidently predicted by sequence similarity methods at present. The predicted functional class of 76.7{\%} of each of these proteins shows various levels of consistency with the literature-described function, compared to the overall accuracy of 87{\%} for the SVMProt functional class assignment of 34,582 proteins that have at least one homolog of known function. Our study suggests that SVMProt is capable of assigning functional class for novel bacterial proteins at a level not too much lower than that of sequence alignment methods for homologous proteins.",
keywords = "Open reading frames, protein, Proteins, non-homologous, SVMProt",
author = "Cui, J. and Han, {L. Y.} and Cai, {C. Z.} and Zheng, {C. J.} and Ji, {Z. L.} and Chen, {Y. Z.}",
year = "2005",
month = "11",
day = "1",
doi = "10.1159/000088839",
language = "English (US)",
volume = "9",
pages = "86--100",
journal = "Journal of Molecular Microbiology and Biotechnology",
issn = "1464-1801",
publisher = "S. Karger AG",
number = "2",

}

TY - JOUR
T1 - Prediction of functional class of novel bacterial proteins without the use of sequence similarity by a statistical learning method
AU - Cui, J.
AU - Han, L. Y.
AU - Cai, C. Z.
AU - Zheng, C. J.
AU - Ji, Z. L.
AU - Chen, Y. Z.
PY - 2005/11/1
Y1 - 2005/11/1
N2 - A substantial percentage of the putative protein-encoding open reading frames (ORFs) in bacterial genomes have no homolog of known function, and their function cannot be confidently assigned on the basis of sequence similarity. Methods not based on sequence similarity are needed and being developed. One method, SVMProt (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi), predicts protein functional family irrespective of sequence similarity (Nucleic Acids Res. 2003;31:3692-3697). While it has been tested on a large number of proteins, its capability for non-homologous proteins has so far been evaluated for a relatively small number of proteins, and additional tests are needed to more fully assess SVMProt. In this work, 90 novel bacterial proteins (non-homologous to known proteins) are used to evaluate the capability of SVMProt. These proteins are such that none of their homologs are in the Swiss-Prot database, their functions not clearly described in the literature, and they themselves and their homologs are not included in the training sets of SVMProt. They represent proteins whose function cannot be confidently predicted by sequence similarity methods at present. The predicted functional class of 76.7% of each of these proteins shows various levels of consistency with the literature-described function, compared to the overall accuracy of 87% for the SVMProt functional class assignment of 34,582 proteins that have at least one homolog of known function. Our study suggests that SVMProt is capable of assigning functional class for novel bacterial proteins at a level not too much lower than that of sequence alignment methods for homologous proteins.

KW - Open reading frames, protein
KW - Proteins, non-homologous
KW - SVMProt
UR - http://www.scopus.com/inward/record.url?scp=27944474288&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=27944474288&partnerID=8YFLogxK
U2 - 10.1159/000088839
DO - 10.1159/000088839
M3 - Article
C2 - 16319498
AN - SCOPUS:27944474288
VL - 9
SP - 86
EP - 100
JO - Journal of Molecular Microbiology and Biotechnology
JF - Journal of Molecular Microbiology and Biotechnology
SN - 1464-1801
IS - 2
ER -