A novel function prediction approach using protein overlap networks

Shide Liang, Dandan Zheng, Daron M. Standley, Huarong Guo, Chi Zhang

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Background: Construction of a reliable network remains the bottleneck for network-based protein function prediction. We built an artificial network model called protein overlap network (PON) for the entire genome of yeast, fly, worm, and human, respectively. Each node of the network represents a protein, and two proteins are connected if they share a domain according to InterPro database.

Results: The function of a protein can be predicted by counting the occurrence frequency of GO (gene ontology) terms associated with domains of direct neighbors. The average success rate and coverage were 34.3% and 43.9%, respectively, for the test genomes, and were increased to 37.9% and 51.3% when a composite PON of the four species was used for the prediction. As a comparison, the success rate was 7.0% in the random control procedure. We also made predictions with GO term annotations of the second layer nodes using the composite network and obtained an impressive success rate (>30%) and coverage (>30%), even for small genomes. Further improvement was achieved by statistical analysis of manually annotated GO terms for each neighboring protein.

Conclusions: The PONs are composed of dense modules accompanied by a few long distance connections. Based on the PONs, we developed multiple approaches effective for protein function prediction.
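
The network construction and the neighbor-counting step described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function names `build_pon` and `predict_go_terms`, the `top_n` cutoff, the alphabetical tie-breaking rule, and the toy protein, domain, and GO identifiers are all assumptions made for the example. The abstract does not say whether the query protein's own domains are excluded from the count, so this sketch simply counts every GO term attached to any domain of each direct neighbor.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_pon(protein_domains):
    """Build an undirected network: two proteins are linked if they share a domain."""
    # Invert the mapping so each domain lists the proteins that contain it.
    domain_members = defaultdict(set)
    for protein, domains in protein_domains.items():
        for domain in domains:
            domain_members[domain].add(protein)

    neighbors = {protein: set() for protein in protein_domains}
    for members in domain_members.values():
        # Every pair of proteins containing the same domain becomes an edge.
        for a, b in combinations(sorted(members), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def predict_go_terms(protein, neighbors, protein_domains, domain_go, top_n=1):
    """Rank GO terms by how often they occur among domains of direct neighbors."""
    counts = Counter()
    for neighbor in neighbors.get(protein, ()):
        for domain in protein_domains[neighbor]:
            counts.update(domain_go.get(domain, ()))
    # Highest count first; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Hypothetical toy data: identifiers are placeholders, not real InterPro or GO entries.
protein_domains = {
    "P1": {"DOM_A"},
    "P2": {"DOM_A", "DOM_B"},
    "P3": {"DOM_A", "DOM_C"},
    "P4": {"DOM_D"},
}
domain_go = {
    "DOM_B": {"GO:X"},
    "DOM_C": {"GO:X", "GO:Y"},
}

pon = build_pon(protein_domains)
print(sorted(pon["P1"]))
print(predict_go_terms("P1", pon, protein_domains, domain_go))
print(predict_go_terms("P4", pon, protein_domains, domain_go))
```

On the toy data, P1 is linked to P2 and P3 through the shared domain DOM_A, and the term attached to domains of both neighbors ranks first. P4 shares no domain with any other protein, so it has no neighbors and receives no prediction.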

Original language: English (US)
Article number: 61
Journal: BMC Systems Biology
Volume: 7
DOI: 10.1186/1752-0509-7-61
State: Published - Jul 17 2013

Keywords

  • Composite network
  • Functional genomics
  • Protein function prediction
  • Protein overlap network

ASJC Scopus subject areas

  • Structural Biology
  • Modeling and Simulation
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

A novel function prediction approach using protein overlap networks. / Liang, Shide; Zheng, Dandan; Standley, Daron M.; Guo, Huarong; Zhang, Chi.

In: BMC Systems Biology, Vol. 7, 61, 17.07.2013.

@article{3a5e2374e91b40489232f83e2a628bb1,
title = "A novel function prediction approach using protein overlap networks",
abstract = "Background: Construction of a reliable network remains the bottleneck for network-based protein function prediction. We built an artificial network model called protein overlap network (PON) for the entire genome of yeast, fly, worm, and human, respectively. Each node of the network represents a protein, and two proteins are connected if they share a domain according to InterPro database. Results: The function of a protein can be predicted by counting the occurrence frequency of GO (gene ontology) terms associated with domains of direct neighbors. The average success rate and coverage were 34.3{\%} and 43.9{\%}, respectively, for the test genomes, and were increased to 37.9{\%} and 51.3{\%} when a composite PON of the four species was used for the prediction. As a comparison, the success rate was 7.0{\%} in the random control procedure. We also made predictions with GO term annotations of the second layer nodes using the composite network and obtained an impressive success rate (>30{\%}) and coverage (>30{\%}), even for small genomes. Further improvement was achieved by statistical analysis of manually annotated GO terms for each neighboring protein. Conclusions: The PONs are composed of dense modules accompanied by a few long distance connections. Based on the PONs, we developed multiple approaches effective for protein function prediction.",
keywords = "Composite network, Functional genomics, Protein function prediction, Protein overlap network",
author = "Shide Liang and Dandan Zheng and Standley, {Daron M.} and Huarong Guo and Chi Zhang",
year = "2013",
month = "7",
day = "17",
doi = "10.1186/1752-0509-7-61",
language = "English (US)",
volume = "7",
journal = "BMC Systems Biology",
issn = "1752-0509",
publisher = "BioMed Central",
}

TY - JOUR

T1 - A novel function prediction approach using protein overlap networks

AU - Liang, Shide

AU - Zheng, Dandan

AU - Standley, Daron M.

AU - Guo, Huarong

AU - Zhang, Chi

PY - 2013/7/17

Y1 - 2013/7/17

AB - Background: Construction of a reliable network remains the bottleneck for network-based protein function prediction. We built an artificial network model called protein overlap network (PON) for the entire genome of yeast, fly, worm, and human, respectively. Each node of the network represents a protein, and two proteins are connected if they share a domain according to InterPro database. Results: The function of a protein can be predicted by counting the occurrence frequency of GO (gene ontology) terms associated with domains of direct neighbors. The average success rate and coverage were 34.3% and 43.9%, respectively, for the test genomes, and were increased to 37.9% and 51.3% when a composite PON of the four species was used for the prediction. As a comparison, the success rate was 7.0% in the random control procedure. We also made predictions with GO term annotations of the second layer nodes using the composite network and obtained an impressive success rate (>30%) and coverage (>30%), even for small genomes. Further improvement was achieved by statistical analysis of manually annotated GO terms for each neighboring protein. Conclusions: The PONs are composed of dense modules accompanied by a few long distance connections. Based on the PONs, we developed multiple approaches effective for protein function prediction.

KW - Composite network

KW - Functional genomics

KW - Protein function prediction

KW - Protein overlap network

UR - http://www.scopus.com/inward/record.url?scp=84880182330&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880182330&partnerID=8YFLogxK

U2 - 10.1186/1752-0509-7-61

DO - 10.1186/1752-0509-7-61

M3 - Article

C2 - 23866986

AN - SCOPUS:84880182330

VL - 7

JO - BMC Systems Biology

JF - BMC Systems Biology

SN - 1752-0509

M1 - 61

ER -