ngLOC: An n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes

Brian R. King, Chittibabu Guda

Research output: Contribution to journal (Article)

58 Citations (Scopus)

Abstract

We present a method called ngLOC, an n-gram-based Bayesian classifier that predicts the localization of a protein sequence over ten distinct subcellular organelles. A tenfold cross-validation result shows an accuracy of 89% for sequences localized to a single organelle, and 82% for those localized to multiple organelles. An enhanced version of ngLOC was developed to estimate the subcellular proteomes of eight eukaryotic organisms: yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse, and human.
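As a rough illustration of the general idea behind an n-gram-based Bayesian classifier, the following minimal sketch counts overlapping n-grams per class and scores a new sequence with a smoothed naive Bayes model. It is not the authors' implementation: the class name `NgramNaiveBayes`, the parameters `n` and `alpha`, the add-alpha smoothing, and the toy sequences and labels are all assumptions made for this example, and the multi-organelle prediction described in the abstract is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return all overlapping length-n substrings of a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical naive Bayes classifier over sequence n-grams."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                      # n-gram length (assumed value)
        self.alpha = alpha              # add-alpha smoothing constant (assumed)
        self.class_counts = Counter()   # number of training sequences per class
        self.gram_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()              # all n-grams seen in training

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            grams = ngrams(seq, self.n)
            self.class_counts[label] += 1
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_posteriors(self, seq):
        """Unnormalized log posterior for each class: log prior + log likelihood."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(count / total)
            for gram in ngrams(seq, self.n):
                score += math.log((counts[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_posteriors(seq)
        return max(scores, key=scores.get)


# Toy data: invented strings, not real protein sequences or annotations.
train_sequences = ["MKKLLAAKKLL", "KKLLKKAALLK", "GGSSGGTTSSG", "SSGGTTGGSSG"]
train_labels = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]

model = NgramNaiveBayes(n=2).fit(train_sequences, train_labels)
prediction = model.predict("KKLLAAKK")
```

Working in log space avoids numerical underflow when many n-gram probabilities are multiplied, and the smoothing term keeps n-grams unseen in a class from driving its score to negative infinity.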

Original language: English (US)
Article number: R68
Journal: Genome Biology
Volume: 8
Issue number: 5
DOI: 10.1186/gb-2007-8-5-r68
State: Published - May 1 2007

Fingerprint

Bayes Theorem
eukaryote
Proteome
Bayesian theory
proteome
Eukaryota
mosquito
Organelles
yeast
organelles
nematode
eukaryotic cells
protein
Zebrafish
fruit flies
Culicidae
Danio rerio
Chickens
amino acid sequences
Yeasts

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Cell Biology

Cite this

ngLOC: An n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes. / King, Brian R.; Guda, Chittibabu.

In: Genome biology, Vol. 8, No. 5, R68, 01.05.2007.


@article{99c7dd6f79e74eb6977dcf1dd98775e9,
title = "ngLOC: An n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes",
abstract = "We present a method called ngLOC, an n-gram-based Bayesian classifier that predicts the localization of a protein sequence over ten distinct subcellular organelles. A tenfold cross-validation result shows an accuracy of 89{\%} for sequences localized to a single organelle, and 82{\%} for those localized to multiple organelles. An enhanced version of ngLOC was developed to estimate the subcellular proteomes of eight eukaryotic organisms: yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse, and human.",
author = "King, {Brian R.} and Guda, Chittibabu",
year = "2007",
month = "5",
day = "1",
doi = "10.1186/gb-2007-8-5-r68",
language = "English (US)",
volume = "8",
journal = "Genome Biology",
issn = "1465-6906",
publisher = "BioMed Central",
number = "5",
pages = "R68",
}

TY - JOUR

T1 - ngLOC: An n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes

AU - King, Brian R.

AU - Guda, Chittibabu

PY - 2007/5/1

Y1 - 2007/5/1

AB - We present a method called ngLOC, an n-gram-based Bayesian classifier that predicts the localization of a protein sequence over ten distinct subcellular organelles. A tenfold cross-validation result shows an accuracy of 89% for sequences localized to a single organelle, and 82% for those localized to multiple organelles. An enhanced version of ngLOC was developed to estimate the subcellular proteomes of eight eukaryotic organisms: yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse, and human.

UR - http://www.scopus.com/inward/record.url?scp=34548832558&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548832558&partnerID=8YFLogxK

U2 - 10.1186/gb-2007-8-5-r68

DO - 10.1186/gb-2007-8-5-r68

M3 - Article

C2 - 17472741

AN - SCOPUS:34548832558

VL - 8

JO - Genome Biology

JF - Genome Biology

SN - 1465-6906

IS - 5

M1 - R68

ER -