ngLOC: Software and web server for predicting protein subcellular localization in prokaryotes and eukaryotes

Brian R. King, Suleyman Vural, Sanjit Pandey, Alex Barteau, Chittibabu Guda

Research output: Contribution to journal › Article

27 Citations (Scopus)

Abstract

Background: Understanding protein subcellular localization is a necessary component toward understanding the overall function of a protein. Numerous computational methods have been published over the past decade, with varying degrees of success. Despite the large number of published methods in this area, only a small fraction of them are available for researchers to use in their own studies. Of those that are available, many are limited by predicting only a small number of organelles in the cell. Additionally, the majority of methods predict only a single location for a sequence, even though it is known that a large fraction of the proteins in eukaryotic species shuttle between locations to carry out their function.

Findings: We present a software package and a web server for predicting the subcellular localization of protein sequences based on the ngLOC method. ngLOC is an n-gram-based Bayesian classifier that predicts subcellular localization of proteins both in prokaryotes and eukaryotes. The overall prediction accuracy varies from 89.8% to 91.4% across species. This program can predict 11 distinct locations each in plant and animal species. ngLOC also predicts 4 and 5 distinct locations on Gram-positive and Gram-negative bacterial datasets, respectively.

Conclusions: ngLOC is a generic method that can be trained by data from a variety of species or classes for predicting protein subcellular localization. The standalone software is freely available for academic use under GNU GPL, and the ngLOC web server is also accessible at.
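To make the phrase "n-gram-based Bayesian classifier" concrete, the following is a minimal sketch of a multinomial naive Bayes model over overlapping n-grams. It is not the ngLOC implementation: the class name `NGramNaiveBayes`, the n-gram length, the smoothing constant, the location labels, and the training strings are all assumptions made up for illustration, and the strings are not real protein sequences.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return all overlapping n-grams (substrings of length n) of a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NGramNaiveBayes:
    """Toy multinomial naive Bayes over n-gram counts (illustrative only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length (hypothetical default)
        self.alpha = alpha  # additive smoothing constant (assumption)

    def fit(self, sequences, labels):
        # Count training sequences per class and n-grams per class.
        self.class_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        for seq, label in zip(sequences, labels):
            self.gram_counts[label].update(ngrams(seq, self.n))
        self.vocab = {g for c in self.gram_counts.values() for g in c}
        return self

    def log_scores(self, seq):
        # Unnormalized log posterior per class: log prior + sum of log likelihoods.
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, n_seqs in self.class_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + self.alpha * v
            score = math.log(n_seqs / total)
            for g in ngrams(seq, self.n):
                if g in self.vocab:  # skip n-grams never seen in training
                    score += math.log((counts[g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Made-up strings standing in for protein sequences; not real data.
train_seqs = ["AAAAKAAA", "AAKAAAKA", "LLLLVLLL", "LVLLLLVL"]
train_labels = ["loc_A", "loc_A", "loc_B", "loc_B"]
model = NGramNaiveBayes(n=2).fit(train_seqs, train_labels)
print(model.predict("AAAKAA"))
print(model.predict("LLVLLL"))
```

Working in log space avoids numerical underflow on long sequences, and additive smoothing keeps an n-gram that is absent from one class from driving that class's probability to zero. A method that reports more than one location per sequence, as the abstract describes for ngLOC, would need an additional decision rule on top of the per-class scores; that step is not sketched here.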

Original language: English (US)
Article number: 351
Journal: BMC Research Notes
Volume: 5
DOIs: 10.1186/1756-0500-5-351
State: Published - Jul 12 2012

Fingerprint

Eukaryota
Servers
Software
Proteins
Computational methods
Software packages
Organelles
Animals
Classifiers
Research Personnel

Keywords

  • Bayesian method
  • Machine learning algorithm
  • N-gram-based approach
  • Protein sequence classification
  • Protein subcellular localization prediction
  • ngLOC

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)

Cite this

ngLOC: Software and web server for predicting protein subcellular localization in prokaryotes and eukaryotes. / King, Brian R.; Vural, Suleyman; Pandey, Sanjit; Barteau, Alex; Guda, Chittibabu.

In: BMC Research Notes, Vol. 5, 351, 12.07.2012.

@article{208086c1d94542b08598a567cff154b9,
title = "ngLOC: Software and web server for predicting protein subcellular localization in prokaryotes and eukaryotes",
keywords = "Bayesian method, Machine learning algorithm, N-gram-based approach, Protein sequence classification, Protein subcellular localization prediction, ngLOC",
author = "King, {Brian R.} and Suleyman Vural and Sanjit Pandey and Alex Barteau and Chittibabu Guda",
year = "2012",
month = "7",
day = "12",
doi = "10.1186/1756-0500-5-351",
language = "English (US)",
volume = "5",
journal = "BMC Research Notes",
issn = "1756-0500",
publisher = "BioMed Central",

}

TY - JOUR

T1 - ngLOC

T2 - Software and web server for predicting protein subcellular localization in prokaryotes and eukaryotes

AU - King, Brian R.

AU - Vural, Suleyman

AU - Pandey, Sanjit

AU - Barteau, Alex

AU - Guda, Chittibabu

PY - 2012/7/12

Y1 - 2012/7/12

KW - Bayesian method

KW - Machine learning algorithm

KW - N-gram-based approach

KW - Protein sequence classification

KW - Protein subcellular localization prediction

KW - ngLOC

UR - http://www.scopus.com/inward/record.url?scp=84863601391&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863601391&partnerID=8YFLogxK

U2 - 10.1186/1756-0500-5-351

DO - 10.1186/1756-0500-5-351

M3 - Article

C2 - 22780965

AN - SCOPUS:84863601391

VL - 5

JO - BMC Research Notes

JF - BMC Research Notes

SN - 1756-0500

M1 - 351

ER -