Identification of Mycobacterium species using curated custom databases

Dan Kuyper, Hesham H Ali, Amr M. Mohamed, Steven Heye Hinrichs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Advances in molecular biology have resulted in the development of diagnostic tests for infectious diseases based on genetic profiles. While probe-based assays dominate the field today, sequence-based assays hold great promise for the future. However, the variable quality of the sequence information currently present in public databases limits the potential growth and use of sequence-based analysis. To address this problem, a standardized method for DNA sequence validation and building of custom databases was developed using Mycobacterium as a development model. With this model, a computational approach to identification of infectious diseases was developed and evaluated. The web-based application, termed BioDatabase, accomplished genetic sequence identification via the creation of curated databases containing a relatively small set of genetic data specific to a species or group. The process for creating the custom database included multiple steps, beginning with identification of highly conserved start and end sequences and of validation parameters for the intervening sequence. The process eliminated the need for multiple sequence alignment with GenBank sequences, whose information is valuable yet difficult to utilize properly because of its size and quality. The custom database approach maximized application performance with minimal impact on analysis response time, allowing investigation of optimal sequences for identification of all Mycobacterium to the species level. In comparison to the 16S and ITS genetic regions, a curated ITS-based approach proved most effective for identification of Mycobacterium isolates.
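The curation steps named in the abstract (conserved start and end sequences, validation of the intervening sequence, lookup against a small custom database) can be illustrated with a minimal Python sketch. This is not the BioDatabase implementation, which the abstract does not describe in detail. The anchor sequences, the length limits, the rejection of ambiguous bases, the function names and the use of difflib's similarity ratio for matching are all assumptions made for illustration, and the sequences are toy data rather than real Mycobacterium regions.

```python
from difflib import SequenceMatcher

VALID_BASES = set("ACGT")  # assumption: ambiguous bases such as N are rejected


def extract_region(seq, start_anchor, end_anchor):
    """Return the bases between the first start anchor and the next end anchor, or None."""
    seq = seq.upper()
    i = seq.find(start_anchor)
    if i < 0:
        return None
    begin = i + len(start_anchor)
    j = seq.find(end_anchor, begin)
    if j < 0:
        return None
    return seq[begin:j]


def is_valid(region, min_len, max_len):
    """Hypothetical validation parameters: a length window and unambiguous bases only."""
    return (
        region is not None
        and min_len <= len(region) <= max_len
        and set(region) <= VALID_BASES
    )


def build_database(records, start_anchor, end_anchor, min_len, max_len):
    """Keep only validated regions, grouped by species label."""
    db = {}
    for species, raw in records:
        region = extract_region(raw, start_anchor, end_anchor)
        if is_valid(region, min_len, max_len):
            db.setdefault(species, []).append(region)
    return db


def identify(query, db, start_anchor, end_anchor):
    """Return (species, similarity) for the closest curated region, or None."""
    region = extract_region(query, start_anchor, end_anchor)
    if region is None:
        return None
    best = None
    for species, regions in db.items():
        for ref in regions:
            # Similarity ratio in [0, 1]; a stand-in for a real alignment score.
            score = SequenceMatcher(None, region, ref, autojunk=False).ratio()
            if best is None or score > best[1]:
                best = (species, score)
    return best


# Toy anchors and records, purely illustrative.
START, END = "GGATCC", "AAGCTT"
RECORDS = [
    ("species_A", "TTGGATCCACGTACGTACAAGCTTGG"),
    ("species_B", "GGATCCTTTTGGGGCCAAGCTT"),
    ("species_C", "GGATCCACGNNCGTACAAGCTT"),  # rejected: contains N
    ("species_D", "ACGTACGT"),                # rejected: anchors missing
]

if __name__ == "__main__":
    db = build_database(RECORDS, START, END, min_len=8, max_len=12)
    print(db)
    print(identify("GGATCCACGTACGTTCAAGCTT", db, START, END))
```

Because only the validated region between the anchors is stored, each query is compared with a handful of short strings instead of a full public database, which is the performance argument the abstract makes for the custom database approach.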

Original language: English (US)
Title of host publication: Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)
Pages: 2679-2685
Number of pages: 7
State: Published - Dec 1 2004
Event: Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM) - Santa Fe, NM, United States
Duration: Apr 26 2004 to Apr 30 2004

Publication series

Name: Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)
Volume: 18

Conference

Conference: Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)
Country: United States
City: Santa Fe, NM
Period: 4/26/04 to 4/30/04

Fingerprint

Assays
Molecular biology
DNA sequences
Identification (control systems)

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Kuyper, D., Ali, H. H., Mohamed, A. M., & Hinrichs, S. H. (2004). Identification of Mycobacterium species using curated custom databases. In Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM) (pp. 2679-2685). (Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM); Vol. 18).

@inproceedings{3e5cfcd7c1634f73a2d12ac83674c624,
title = "Identification of Mycobacterium species using curated custom databases",
author = "Dan Kuyper and Ali, {Hesham H} and Mohamed, {Amr M.} and Hinrichs, {Steven Heye}",
year = "2004",
month = "12",
day = "1",
language = "English (US)",
isbn = "0769521320",
series = "Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)",
pages = "2679--2685",
booktitle = "Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)",

}

TY - GEN

T1 - Identification of Mycobacterium species using curated custom databases

AU - Kuyper, Dan

AU - Ali, Hesham H

AU - Mohamed, Amr M.

AU - Hinrichs, Steven Heye

PY - 2004/12/1

Y1 - 2004/12/1

UR - http://www.scopus.com/inward/record.url?scp=12444297543&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=12444297543&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:12444297543

SN - 0769521320

SN - 9780769521329

T3 - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)

SP - 2679

EP - 2685

BT - Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)

ER -