Exploring database keyword search for association studies between genetic variants and diseases

Dhawal Verma, Hesham H Ali, Zhengxin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Keyword search plays a critical role in helping bioinformatics researchers retrieve structured, semi-structured, and unstructured data. In addition, data mining has drawn increasing attention from researchers seeking to fully exploit the rich repository of biological databases. An interesting issue is the possible relationship between database keyword search (DB KWS) and in-depth database exploration (or data mining) in the context of bioinformatics, and in particular the potential contribution of DB KWS to data mining. However, so far there is no known systematic investigation of this relationship. In this paper, we provide a preliminary discussion of how DB KWS can be used for in-depth exploration of biological databases, and describe a case study on the association between genetic variants and diseases. The case study is motivated by the fact that the advent of high-throughput sequencing technologies has facilitated the generation of a huge amount of genomic data. A wealth of genomic information in publicly available databases is underutilized as a potential resource for uncovering functionally relevant markers underlying complex human traits. The discovery of genetic associations is an important factor in understanding human illness, helping to derive disease pathways and a plethora of other information, such as disease-gene associations and the variants associated with diseases. A database of genome-wide association studies was curated, and an algorithm inspired by DBXplorer was implemented in Java to perform keyword search over the database. The case study further proposes ways to include association rule mining, a data mining technique useful for discovering interesting relationships hidden in large data sets, to further investigate the results of keyword searches performed with different yet sensible combinations of diseases and genes.
We believe that such an integrated study, exploring how bioinformatics can take advantage of both techniques in a single application, is of both theoretical and practical importance.
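
The two techniques named in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class name, the single-table schema (disease, gene, variant), and the sample rows are hypothetical placeholders, and the search covers only the keyword-to-location lookup idea associated with DBXplorer's symbol table, not the enumeration of join trees across several tables. The rule measure treats each row as one transaction, which is a simplification of association rule mining.

```java
import java.util.*;

/** Illustrative sketch only; names, schema, and data are assumptions. */
class GwasKeywordSearchSketch {

    /** One hypothetical row of a curated GWAS table. */
    static final class Association {
        final String disease, gene, variant;
        Association(String disease, String gene, String variant) {
            this.disease = disease; this.gene = gene; this.variant = variant;
        }
    }

    /** Made-up placeholder rows, not real GWAS results. */
    static List<Association> sampleRows() {
        return Arrays.asList(
            new Association("disease alpha", "GENE1", "var1"),
            new Association("disease alpha", "GENE2", "var2"),
            new Association("disease beta", "GENE1", "var3"),
            new Association("disease alpha", "GENE1", "var4"));
    }

    /** Symbol table: keyword -> (column name -> ids of rows containing it). */
    static Map<String, Map<String, Set<Integer>>> buildSymbolTable(List<Association> rows) {
        Map<String, Map<String, Set<Integer>>> table = new HashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            Association r = rows.get(i);
            index(table, "disease", r.disease, i);
            index(table, "gene", r.gene, i);
            index(table, "variant", r.variant, i);
        }
        return table;
    }

    /** Splits a cell value into lower-case tokens and records their location. */
    private static void index(Map<String, Map<String, Set<Integer>>> table,
                              String column, String value, int rowId) {
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) continue;
            table.computeIfAbsent(token, k -> new HashMap<>())
                 .computeIfAbsent(column, k -> new TreeSet<>())
                 .add(rowId);
        }
    }

    /** Conjunctive search: ids of rows that contain every keyword in any column. */
    static Set<Integer> search(Map<String, Map<String, Set<Integer>>> table,
                               String... keywords) {
        Set<Integer> result = null;
        for (String kw : keywords) {
            Set<Integer> hits = new TreeSet<>();
            Map<String, Set<Integer>> byColumn = table.getOrDefault(
                kw.toLowerCase(Locale.ROOT), Collections.<String, Set<Integer>>emptyMap());
            for (Set<Integer> ids : byColumn.values()) hits.addAll(ids);
            if (result == null) result = hits; else result.retainAll(hits);
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    /** Returns {support, confidence} of the rule "disease => gene" over the rows. */
    static double[] supportAndConfidence(List<Association> rows, String disease, String gene) {
        int both = 0, antecedent = 0;
        for (Association r : rows) {
            boolean hasDisease = r.disease.equalsIgnoreCase(disease);
            if (hasDisease) antecedent++;
            if (hasDisease && r.gene.equalsIgnoreCase(gene)) both++;
        }
        double support = rows.isEmpty() ? 0.0 : (double) both / rows.size();
        double confidence = antecedent == 0 ? 0.0 : (double) both / antecedent;
        return new double[] {support, confidence};
    }

    public static void main(String[] args) {
        List<Association> rows = sampleRows();
        Set<Integer> hits = search(buildSymbolTable(rows), "alpha", "GENE1");
        System.out.println("Matching row ids: " + hits);
        double[] sc = supportAndConfidence(rows, "disease alpha", "GENE1");
        System.out.println("support=" + sc[0] + ", confidence=" + sc[1]);
    }
}
```

In this sketch the keyword search narrows the table to rows matching a disease and gene combination, and the support and confidence values give one possible way to rank such combinations afterwards.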

Original language: English (US)
Title of host publication: Procedia Computer Science
Pages: 206-213
Number of pages: 8
Volume: 17
DOIs: https://doi.org/10.1016/j.procs.2013.05.028
State: Published - 2013
Event: 1st International Conference on Information Technology and Quantitative Management, ITQM 2013 - Suzhou, China
Duration: May 16 2013 to May 18 2013

Other

Other: 1st International Conference on Information Technology and Quantitative Management, ITQM 2013
Country: China
City: Suzhou
Period: 5/16/13 to 5/18/13

Fingerprint

Bioinformatics
Data mining
Genes
Bioelectric potentials
Association rules
Throughput

Keywords

  • Data Mining
  • Genome Wide Association Studies
  • Keyword Search

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Exploring database keyword search for association studies between genetic variants and diseases. / Verma, Dhawal; Ali, Hesham H; Chen, Zhengxin.

Procedia Computer Science. Vol. 17 2013. p. 206-213.

Verma, D, Ali, HH & Chen, Z 2013, Exploring database keyword search for association studies between genetic variants and diseases. in Procedia Computer Science. vol. 17, pp. 206-213, 1st International Conference on Information Technology and Quantitative Management, ITQM 2013, Suzhou, China, 5/16/13. https://doi.org/10.1016/j.procs.2013.05.028
Verma, Dhawal ; Ali, Hesham H ; Chen, Zhengxin. / Exploring database keyword search for association studies between genetic variants and diseases. Procedia Computer Science. Vol. 17 2013. pp. 206-213
@inproceedings{39763f79d36b4ff5ae38a24b1c65decf,
title = "Exploring database keyword search for association studies between genetic variants and diseases",
keywords = "Data Mining, Genome Wide Association Studies, Keyword Search",
author = "Dhawal Verma and Ali, {Hesham H} and Zhengxin Chen",
year = "2013",
doi = "10.1016/j.procs.2013.05.028",
language = "English (US)",
volume = "17",
pages = "206--213",
booktitle = "Procedia Computer Science",

}

TY - GEN

T1 - Exploring database keyword search for association studies between genetic variants and diseases

AU - Verma, Dhawal

AU - Ali, Hesham H

AU - Chen, Zhengxin

PY - 2013

Y1 - 2013

KW - Data Mining

KW - Genome Wide Association Studies

KW - Keyword Search

UR - http://www.scopus.com/inward/record.url?scp=84898724017&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898724017&partnerID=8YFLogxK

U2 - 10.1016/j.procs.2013.05.028

DO - 10.1016/j.procs.2013.05.028

M3 - Conference contribution

AN - SCOPUS:84898724017

VL - 17

SP - 206

EP - 213

BT - Procedia Computer Science

ER -