Simple alignment-free methods for protein classification: A case study from G-protein-coupled receptors

Pooja K. Strope, Etsuko N. Moriyama

Research output: Contribution to journal, Article

Abstract

Computational methods of predicting protein functions rely on detecting similarities among proteins. However, sufficient sequence information is not always available for some protein families. For example, proteins of interest may be new members of a divergent protein family. The performance of protein classification methods could vary in such challenging situations. Using the G-protein-coupled receptor superfamily as an example, we investigated the performance of several protein classifiers. Alignment-free classifiers based on support vector machines using simple amino acid compositions were effective in remote-similarity detection even from short fragmented sequences. Although it is computationally expensive, a support vector machine classifier using local pairwise alignment scores showed very good balanced performance. More commonly used profile hidden Markov models were generally highly specific and well suited to classifying well-established protein family members. It is suggested that different types of protein classifiers should be applied to gain the optimal mining power.
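The "simple amino acid compositions" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `amino_acid_composition`, the constant `AMINO_ACIDS`, and the example fragment are assumptions made for illustration, and the abstract does not specify how the features were normalized or which support vector machine settings were used. The sketch only turns a sequence into a 20-dimensional vector of residue frequencies, the kind of alignment-free feature that could then be passed to a classifier.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids, in a fixed order so that every
# sequence maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Characters outside the 20 standard residues (for example X or gaps)
    are ignored. An empty or fully non-standard sequence yields zeros.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical short fragment, used only to show the call.
features = amino_acid_composition("MKTAYIAKQR")
```

Because the vector length does not depend on sequence length and no alignment is needed, the same function applies to full-length proteins and to short fragments alike.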

Original language: English (US)
Pages (from-to): 602-612
Number of pages: 11
Journal: Genomics
Volume: 89
Issue number: 5
DOI: 10.1016/j.ygeno.2007.01.008
State: Published - May 1 2007

Fingerprint

G-Protein-Coupled Receptors
Proteins
Amino Acids

Keywords

  • Amino acid composition
  • G-protein-coupled receptors
  • Profile hidden Markov models
  • Protein classification
  • Support vector machines

ASJC Scopus subject areas

  • Genetics

Cite this

Simple alignment-free methods for protein classification: A case study from G-protein-coupled receptors. / Strope, Pooja K.; Moriyama, Etsuko N.

In: Genomics, Vol. 89, No. 5, 01.05.2007, p. 602-612.

@article{fc9653b0db8f4647bb5e1852c8b5c03f,
title = "Simple alignment-free methods for protein classification: A case study from G-protein-coupled receptors",
abstract = "Computational methods of predicting protein functions rely on detecting similarities among proteins. However, sufficient sequence information is not always available for some protein families. For example, proteins of interest may be new members of a divergent protein family. The performance of protein classification methods could vary in such challenging situations. Using the G-protein-coupled receptor superfamily as an example, we investigated the performance of several protein classifiers. Alignment-free classifiers based on support vector machines using simple amino acid compositions were effective in remote-similarity detection even from short fragmented sequences. Although it is computationally expensive, a support vector machine classifier using local pairwise alignment scores showed very good balanced performance. More commonly used profile hidden Markov models were generally highly specific and well suited to classifying well-established protein family members. It is suggested that different types of protein classifiers should be applied to gain the optimal mining power.",
keywords = "Amino acid composition, G-protein-coupled receptors, Profile hidden Markov models, Protein classification, Support vector machines",
author = "Strope, {Pooja K.} and Moriyama, {Etsuko N.}",
year = "2007",
month = "5",
day = "1",
doi = "10.1016/j.ygeno.2007.01.008",
language = "English (US)",
volume = "89",
pages = "602--612",
journal = "Genomics",
issn = "0888-7543",
publisher = "Academic Press Inc.",
number = "5",
}

TY - JOUR

T1 - Simple alignment-free methods for protein classification

T2 - A case study from G-protein-coupled receptors

AU - Strope, Pooja K.

AU - Moriyama, Etsuko N.

PY - 2007/5/1

Y1 - 2007/5/1

N2 - Computational methods of predicting protein functions rely on detecting similarities among proteins. However, sufficient sequence information is not always available for some protein families. For example, proteins of interest may be new members of a divergent protein family. The performance of protein classification methods could vary in such challenging situations. Using the G-protein-coupled receptor superfamily as an example, we investigated the performance of several protein classifiers. Alignment-free classifiers based on support vector machines using simple amino acid compositions were effective in remote-similarity detection even from short fragmented sequences. Although it is computationally expensive, a support vector machine classifier using local pairwise alignment scores showed very good balanced performance. More commonly used profile hidden Markov models were generally highly specific and well suited to classifying well-established protein family members. It is suggested that different types of protein classifiers should be applied to gain the optimal mining power.

AB - Computational methods of predicting protein functions rely on detecting similarities among proteins. However, sufficient sequence information is not always available for some protein families. For example, proteins of interest may be new members of a divergent protein family. The performance of protein classification methods could vary in such challenging situations. Using the G-protein-coupled receptor superfamily as an example, we investigated the performance of several protein classifiers. Alignment-free classifiers based on support vector machines using simple amino acid compositions were effective in remote-similarity detection even from short fragmented sequences. Although it is computationally expensive, a support vector machine classifier using local pairwise alignment scores showed very good balanced performance. More commonly used profile hidden Markov models were generally highly specific and well suited to classifying well-established protein family members. It is suggested that different types of protein classifiers should be applied to gain the optimal mining power.

KW - Amino acid composition

KW - G-protein-coupled receptors

KW - Profile hidden Markov models

KW - Protein classification

KW - Support vector machines

UR - http://www.scopus.com/inward/record.url?scp=34247112742&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34247112742&partnerID=8YFLogxK

U2 - 10.1016/j.ygeno.2007.01.008

DO - 10.1016/j.ygeno.2007.01.008

M3 - Article

C2 - 17336495

AN - SCOPUS:34247112742

VL - 89

SP - 602

EP - 612

JO - Genomics

JF - Genomics

SN - 0888-7543

IS - 5

ER -