Ontology-based literature mining of E. coli vaccine-associated gene interaction networks

Junguk Hur, Arzucan Özgür, Yongqun He

Research output: Contribution to journalArticle

2 Citations (Scopus)

Abstract

Background: Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, with extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To more rational development of effective and safe E. coli vaccine, it is important to better understand E. coli vaccine-associated gene interaction networks. Methods: In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Results: Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interactions types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of these gene interaction networks identified top ranked E. coli genes and 6 INO interaction types (e.g., regulation and gene expression). Conclusions: Vaccine-related E. coli gene-gene interaction network was constructed using ontology-based literature mining strategy, which identified important E. coli vaccine genes and their interactions with other genes through specific interaction types.

Original languageEnglish (US)
Article number12
JournalJournal of Biomedical Semantics
Volume8
Issue number1
DOIs
StatePublished - Mar 14 2017

Fingerprint

Escherichia coli Vaccines
Vaccines
Gene Regulatory Networks
Escherichia coli
Ontology
Genes
Escherichia coli Infections
Tocopherols
PubMed
Names

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Health Informatics
  • Computer Networks and Communications

Cite this

Ontology-based literature mining of E. coli vaccine-associated gene interaction networks. / Hur, Junguk; Özgür, Arzucan; He, Yongqun.

In: Journal of Biomedical Semantics, Vol. 8, No. 1, 12, 14.03.2017.

Research output: Contribution to journalArticle

@article{b8fdccb78e2a43b1b90dfffcf13a58cc,
title = "Ontology-based literature mining of E. coli vaccine-associated gene interaction networks",
abstract = "Background: Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, with extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To more rational development of effective and safe E. coli vaccine, it is important to better understand E. coli vaccine-associated gene interaction networks. Methods: In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Results: Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interactions types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of these gene interaction networks identified top ranked E. coli genes and 6 INO interaction types (e.g., regulation and gene expression). Conclusions: Vaccine-related E. coli gene-gene interaction network was constructed using ontology-based literature mining strategy, which identified important E. coli vaccine genes and their interactions with other genes through specific interaction types.",
author = "Junguk Hur and Arzucan {\"O}zg{\"u}r and Yongqun He",
year = "2017",
month = "3",
day = "14",
doi = "10.1186/s13326-017-0122-4",
language = "English (US)",
volume = "8",
journal = "Journal of Biomedical Semantics",
issn = "2041-1480",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Ontology-based literature mining of E. coli vaccine-associated gene interaction networks

AU - Hur, Junguk

AU - Özgür, Arzucan

AU - He, Yongqun

PY - 2017/3/14

Y1 - 2017/3/14

N2 - Background: Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, with extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To more rational development of effective and safe E. coli vaccine, it is important to better understand E. coli vaccine-associated gene interaction networks. Methods: In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Results: Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interactions types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of these gene interaction networks identified top ranked E. coli genes and 6 INO interaction types (e.g., regulation and gene expression). Conclusions: Vaccine-related E. coli gene-gene interaction network was constructed using ontology-based literature mining strategy, which identified important E. coli vaccine genes and their interactions with other genes through specific interaction types.

AB - Background: Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, with extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To more rational development of effective and safe E. coli vaccine, it is important to better understand E. coli vaccine-associated gene interaction networks. Methods: In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Results: Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interactions types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of these gene interaction networks identified top ranked E. coli genes and 6 INO interaction types (e.g., regulation and gene expression). Conclusions: Vaccine-related E. coli gene-gene interaction network was constructed using ontology-based literature mining strategy, which identified important E. coli vaccine genes and their interactions with other genes through specific interaction types.

UR - http://www.scopus.com/inward/record.url?scp=85014957878&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014957878&partnerID=8YFLogxK

U2 - 10.1186/s13326-017-0122-4

DO - 10.1186/s13326-017-0122-4

M3 - Article

C2 - 28288685

AN - SCOPUS:85014957878

VL - 8

JO - Journal of Biomedical Semantics

JF - Journal of Biomedical Semantics

SN - 2041-1480

IS - 1

M1 - 12

ER -