SciMiner: Web-based literature mining tool for target identification and functional enrichment analysis

Junguk Hur, Adam D. Schuyler, David J. States, Eva L. Feldman

Research output: Contribution to journal › Article

58 Citations (Scopus)

Abstract

SciMiner is a web-based literature mining and functional analysis tool that identifies genes and proteins using a context-specific analysis of MEDLINE abstracts and full texts. SciMiner accepts a free-text query (PubMed Entrez search) or a list of PubMed identifiers as input. SciMiner uses both regular expression patterns and dictionaries of gene symbols and names compiled from multiple sources. Ambiguous acronyms are resolved by a scoring scheme based on the co-occurrence of acronyms and corresponding description terms, which incorporates optional user-defined filters. Functional enrichment analyses are used to identify highly relevant targets (genes and proteins), GO (Gene Ontology) terms, MeSH (Medical Subject Headings) terms, pathways and protein-protein interaction networks by comparing identified targets from one search result with those from other searches or with the full HGNC [HUGO (Human Genome Organization) Gene Nomenclature Committee] gene set. The performance of gene-protein name identification was evaluated using the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) version 2 (Year 2006) Gene Normalization Task as a gold standard. SciMiner achieved 87.1% recall, 71.3% precision and 75.8% F-measure. SciMiner's literature mining performance, coupled with functional enrichment analyses, provides an efficient platform for retrieving and summarizing rich biological information from corpora of interest to users.
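The identification step described above (regular expression patterns plus a dictionary, with ambiguous acronyms resolved by co-occurrence with description terms) might be sketched roughly as follows. This is only an illustration under stated assumptions, not SciMiner's actual implementation: the dictionary entries, the symbol pattern, the `identify_genes` function, and the way `exclude_terms` stands in for a user-defined filter are all hypothetical.

```python
import re

# Hypothetical mini-dictionary: gene symbol -> description terms (lowercase)
# whose presence in the text supports reading the symbol as a gene.
GENE_DICTIONARY = {
    "INS": {"insulin"},
    "CAT": {"catalase"},
    "TP53": {"tumor protein"},
}

# Assumed set of symbols that double as common words or acronyms.
AMBIGUOUS = {"INS", "CAT"}

# Illustrative pattern for symbol-like tokens: an uppercase letter followed
# by one to nine uppercase letters or digits.
SYMBOL_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]{1,9}\b")


def count_term(term, lowered_text):
    """Count whole-word occurrences of a lowercase term in lowercase text."""
    return len(re.findall(r"\b" + re.escape(term) + r"\b", lowered_text))


def identify_genes(text, exclude_terms=(), min_score=1):
    """Return {symbol: score} for dictionary symbols found in the text.

    score = (occurrences of the symbol's description terms)
            - (occurrences of user-supplied exclude_terms).
    Ambiguous symbols are kept only if score >= min_score.
    """
    lowered = text.lower()
    penalty = sum(count_term(t.lower(), lowered) for t in exclude_terms)
    hits = {}
    for match in SYMBOL_PATTERN.finditer(text):
        symbol = match.group()
        if symbol not in GENE_DICTIONARY:
            continue  # symbol-like token that is not a known gene symbol
        support = sum(count_term(t, lowered) for t in GENE_DICTIONARY[symbol])
        score = support - penalty
        if symbol in AMBIGUOUS and score < min_score:
            continue  # acronym lacks supporting context, so drop it
        hits[symbol] = score
    return hits


if __name__ == "__main__":
    print(identify_genes("Catalase (CAT) activity fell while TP53 rose."))
    print(identify_genes("A CAT scan of the abdomen was performed."))
```

In this sketch, "CAT" is accepted only when "catalase" also appears in the text, so a sentence about a CAT scan yields no hit, while an unambiguous symbol such as "TP53" is accepted on a dictionary match alone.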

Original language: English (US)
Pages (from-to): 838-840
Number of pages: 3
Journal: Bioinformatics
Volume: 25
Issue number: 6
DOI: 10.1093/bioinformatics/btp049
State: Published - Mar 1 2009

Fingerprint

Target Identification
Web-based
Mining
Genes
Gene
PubMed
Names
Proteins
Acronym
Medical Subject Headings
Protein Interaction Maps
Gene Ontology
Systems Biology
Protein
Information Storage and Retrieval
Functional analysis
Terminology
Information Systems
MEDLINE
Term

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

SciMiner: Web-based literature mining tool for target identification and functional enrichment analysis. / Hur, Junguk; Schuyler, Adam D.; States, David J.; Feldman, Eva L.

In: Bioinformatics, Vol. 25, No. 6, 01.03.2009, p. 838-840.

Hur, Junguk; Schuyler, Adam D.; States, David J.; Feldman, Eva L. / SciMiner: Web-based literature mining tool for target identification and functional enrichment analysis. In: Bioinformatics. 2009; Vol. 25, No. 6. pp. 838-840.
@article{c7b53b5e2d4f42028956a24c4a6ea9ce,
title = "SciMiner: Web-based literature mining tool for target identification and functional enrichment analysis",
abstract = "SciMiner is a web-based literature mining and functional analysis tool that identifies genes and proteins using a context specific analysis of MEDLINE abstracts and full texts. SciMiner accepts a free text query (PubMed Entrez search) or a list of PubMed identifiers as input. SciMiner uses both regular expression patterns and dictionaries of gene symbols and names compiled from multiple sources. Ambiguous acronyms are resolved by a scoring scheme based on the co-occurrence of acronyms and corresponding description terms, which incorporates optional user-defined filters. Functional enrichment analyses are used to identify highly relevant targets (genes and proteins), GO (Gene Ontology) terms, MeSH (Medical Subject Headings) terms, pathways and protein-protein interaction networks by comparing identified targets from one search result with those from other searches or to the full HGNC [HUGO (Human Genome Organization) Gene Nomenclature Committee] gene set. The performance of gene-protein name identification was evaluated using the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) version 2 (Year 2006) Gene Normalization Task as a gold standard. SciMiner achieved 87.1{\%} recall, 71.3{\%} precision and 75.8{\%} F-measure. SciMiner's literature mining performance coupled with functional enrichment analyses provides an efficient platform for retrieval and summary of rich biological information from corpora of users' interests.",
author = "Hur, Junguk and Schuyler, {Adam D.} and States, {David J.} and Feldman, {Eva L.}",
year = "2009",
month = "3",
day = "1",
doi = "10.1093/bioinformatics/btp049",
language = "English (US)",
volume = "25",
pages = "838--840",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "6",

}

TY - JOUR

T1 - SciMiner

T2 - Web-based literature mining tool for target identification and functional enrichment analysis

AU - Hur, Junguk

AU - Schuyler, Adam D.

AU - States, David J.

AU - Feldman, Eva L.

PY - 2009/3/1

Y1 - 2009/3/1

AB - SciMiner is a web-based literature mining and functional analysis tool that identifies genes and proteins using a context specific analysis of MEDLINE abstracts and full texts. SciMiner accepts a free text query (PubMed Entrez search) or a list of PubMed identifiers as input. SciMiner uses both regular expression patterns and dictionaries of gene symbols and names compiled from multiple sources. Ambiguous acronyms are resolved by a scoring scheme based on the co-occurrence of acronyms and corresponding description terms, which incorporates optional user-defined filters. Functional enrichment analyses are used to identify highly relevant targets (genes and proteins), GO (Gene Ontology) terms, MeSH (Medical Subject Headings) terms, pathways and protein-protein interaction networks by comparing identified targets from one search result with those from other searches or to the full HGNC [HUGO (Human Genome Organization) Gene Nomenclature Committee] gene set. The performance of gene-protein name identification was evaluated using the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) version 2 (Year 2006) Gene Normalization Task as a gold standard. SciMiner achieved 87.1% recall, 71.3% precision and 75.8% F-measure. SciMiner's literature mining performance coupled with functional enrichment analyses provides an efficient platform for retrieval and summary of rich biological information from corpora of users' interests.

UR - http://www.scopus.com/inward/record.url?scp=62549104880&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=62549104880&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btp049

DO - 10.1093/bioinformatics/btp049

M3 - Article

C2 - 19188191

AN - SCOPUS:62549104880

VL - 25

SP - 838

EP - 840

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 6

ER -