Computational approaches for automated classification of enzyme sequences

Akram Mohammed, Chittibabu Guda

Research output: Contribution to journal (Review article)

10 Citations (Scopus)

Abstract

Determining the functional role(s) of enzymes is essential for building the metabolic blueprint of an organism and for identifying the potential roles enzymes may play in metabolic and disease pathways. With exponential growth in gene and protein sequence data, it is not feasible to experimentally characterize the function(s) of all enzymes. Alternatively, computational methods can be used to annotate the enormous number of unannotated enzyme sequences. For function prediction and classification of enzymes, features based on amino acid composition, sequence and structural properties, domain composition and specific peptide information have been widely used by different computational approaches. Each feature space has its own merits and limitations with respect to overall prediction accuracy. Prediction accuracy improves when machine-learning methods are used to classify enzymes. Given the incomplete and unbalanced nature of annotations in biological databases, ensemble methods, or methods that rely on a combination of orthogonal features, are more desirable for achieving higher accuracy and coverage in enzyme classification. In this review article, we systematically describe all the features and methods used thus far for enzyme class prediction. To the authors' knowledge, this review represents the most exhaustive description of methods used for computational prediction of enzyme classes.
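
As an illustration of the simplest feature space named in the abstract, the sketch below computes an amino acid composition vector and feeds it to a nearest neighbor predictor. It is a minimal sketch under stated assumptions, not the method of any tool covered by the review: the function names, the Euclidean distance, and the toy training strings with their class labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def aa_composition(seq):
    """Return the 20-dimensional amino acid composition (fractions summing to 1)."""
    seq = seq.upper()
    # Ignore characters outside the 20 standard residues (e.g. X, gaps).
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def nearest_neighbor_label(query, labeled):
    """Assign the label of the training sequence closest in composition space."""
    q = aa_composition(query)
    # Euclidean distance is an assumption; other metrics could be substituted.
    best = min(labeled, key=lambda item: math.dist(q, aa_composition(item[0])))
    return best[1]


# Toy strings invented for illustration; NOT real enzymes or real annotations.
TOY_TRAINING = [
    ("AAAAGGGGLLLL", "class-1 (hypothetical)"),
    ("KKKKDDDDEEEE", "class-2 (hypothetical)"),
]

if __name__ == "__main__":
    print(nearest_neighbor_label("AAGGLL", TOY_TRAINING))
```

Composition vectors discard residue order, which is one reason the abstract favors combining them with orthogonal features such as domain composition or structural information.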

Original language: English (US)
Pages (from-to): 147-152
Number of pages: 6
Journal: Journal of Proteomics and Bioinformatics
Volume: 4
Issue number: 8
DOI: 10.4172/jpb.1000183
State: Published - Sep 9 2011

Fingerprint

Enzymes
Blueprints
Metabolic Diseases
Computational methods
Metabolic Networks and Pathways
Chemical analysis
Peptides
Learning systems
Amino acids
Structural properties
Amino Acid Sequence
Genes
Databases
Proteins
Amino Acids
Growth

Keywords

  • Amino acid composition
  • Domain composition
  • Ensemble method
  • Enzyme classification
  • Machine learning
  • Nearest neighbor predictor
  • Sequence similarity
  • Structural information
  • Support vector machine

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Cell Biology

Cite this

Computational approaches for automated classification of enzyme sequences. / Mohammed, Akram; Guda, Chittibabu.

In: Journal of Proteomics and Bioinformatics, Vol. 4, No. 8, 09.09.2011, p. 147-152.

@article{0a0ddd32d7ed4507947881e6f11042cd,
title = "Computational approaches for automated classification of enzyme sequences",
keywords = "Amino acid composition, Domain composition, Ensemble method, Enzyme classification, Machine learning, Nearest neighbor predictor, Sequence similarity, Structural information, Support vector machine",
author = "Akram Mohammed and Chittibabu Guda",
year = "2011",
month = "9",
day = "9",
doi = "10.4172/jpb.1000183",
language = "English (US)",
volume = "4",
pages = "147--152",
journal = "Journal of Proteomics and Bioinformatics",
issn = "0974-276X",
publisher = "Omics Publishing Group",
number = "8",

}

TY - JOUR

T1 - Computational approaches for automated classification of enzyme sequences

AU - Mohammed, Akram

AU - Guda, Chittibabu

PY - 2011/9/9

Y1 - 2011/9/9

KW - Amino acid composition

KW - Domain composition

KW - Ensemble method

KW - Enzyme classification

KW - Machine learning

KW - Nearest neighbor predictor

KW - Sequence similarity

KW - Structural information

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=80052409527&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052409527&partnerID=8YFLogxK

U2 - 10.4172/jpb.1000183

DO - 10.4172/jpb.1000183

M3 - Review article

C2 - 22114367

AN - SCOPUS:80052409527

VL - 4

SP - 147

EP - 152

JO - Journal of Proteomics and Bioinformatics

JF - Journal of Proteomics and Bioinformatics

SN - 0974-276X

IS - 8

ER -