On the integration of assembly and non-assembly approaches for comparing biological sequences

Vi Dam, Hesham H Ali

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Abstract

As Next Generation Sequencing (NGS) technologies continue to expand rapidly, NGS data, available in the form of short genomic reads, remain the primary source of biological data in many bioinformatics applications, and with them the need to assemble and manipulate such data. As a result, many assemblers have been developed to assemble NGS short reads into long genomic sequences, or contigs, ready for advanced analysis such as Genome-Wide Association Studies (GWAS). However, the lack of high levels of robustness and reproducibility continues to limit the impact of bioinformatics research, and many biomedical researchers remain skeptical of results obtained from bioinformatics applications. In this study, we compare several widely used assemblers and evaluate their performance on NGS datasets from various organisms. We highlight the advantages and disadvantages of each assembler and explore the factors that affect the performance of each approach. In addition, we survey recently developed assembly-free, compression-based approaches for processing NGS short reads and analyze their performance in comparing genomic sequences represented by sets of short reads. We use phylogenetic trees obtained from simulated and real datasets to evaluate the accuracy of each assembly-free approach. We test the hypothesis that non-assembly approaches could overcome the limitations and inaccuracies of assembly approaches in comparing sequences, especially for large read sizes. Moreover, we propose a hybrid approach that integrates assembly and non-assembly approaches for classifying genomic sequences. The proposed approach uses the results of partially assembling short reads as input to assembly-free methods, which complete the NGS manipulation process. Preliminary results suggest that the hybrid approach has potential for comparing genomic sequences.
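
To make the idea of an assembly-free, compression-based comparison concrete, the following is a minimal sketch, not the authors' implementation. It assumes the normalized compression distance (NCD) as the similarity measure and zlib as the compressor; the abstract names neither, and the function names `compressed_size`, `reads_to_bytes`, and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compression level 9)."""
    return len(zlib.compress(data, 9))


def reads_to_bytes(reads) -> bytes:
    """Serialize a set of reads (or contigs) into one byte string.

    Sorting is an assumption of this sketch: it makes the result
    independent of the order in which the reads are supplied.
    """
    return "\n".join(sorted(reads)).encode("ascii")


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Smaller values indicate
    more shared content between x and y.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A pairwise matrix of such distances over all samples could then be passed to a standard tree-building method to obtain a phylogeny. Under the hybrid idea described above, `reads_to_bytes` would presumably receive partially assembled contigs rather than raw reads; that pairing is an assumption of this sketch. Note that zlib only finds repeats within a 32 KB window, so a realistic pipeline would likely need a compressor suited to genome-scale input.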

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Editors: Illhoi Yoo, Jane Huiru Zheng, Yang Gong, Xiaohua Tony Hu, Chi-Ren Shyu, Yana Bromberg, Jean Gao, Dmitry Korkin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2232-2234
Number of pages: 3
ISBN (Electronic): 9781509030491
DOI: https://doi.org/10.1109/BIBM.2017.8218007
State: Published - Dec 15 2017
Event: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 - Kansas City, United States
Duration: Nov 13 2017 - Nov 16 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Volume: 2017-January

Other

Other: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Country: United States
City: Kansas City
Period: 11/13/17 - 11/16/17

Fingerprint

Computational Biology
Bioinformatics
Information Storage and Retrieval
Genome-Wide Association Study
Phylogeny
Biomedical Research
Research Personnel
Genome
Technology
Genes
Datasets

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Cite this

Dam, V., & Ali, H. H. (2017). On the integration of assembly and non-assembly approaches for comparing biological sequences. In I. Yoo, J. H. Zheng, Y. Gong, X. T. Hu, C-R. Shyu, Y. Bromberg, J. Gao, ... D. Korkin (Eds.), Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 (pp. 2232-2234). (Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017; Vol. 2017-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIBM.2017.8218007

@inproceedings{af87dd855a434ecfae9efa9c913a701e,
title = "On the integration of assembly and non-assembly approaches for comparing biological sequences",
author = "Vi Dam and Ali, {Hesham H}",
year = "2017",
month = "12",
day = "15",
doi = "10.1109/BIBM.2017.8218007",
language = "English (US)",
series = "Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "2232--2234",
editor = "Illhoi Yoo and Zheng, {Jane Huiru} and Yang Gong and Hu, {Xiaohua Tony} and Chi-Ren Shyu and Yana Bromberg and Jean Gao and Dmitry Korkin",
booktitle = "Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017",

}

TY - GEN

T1 - On the integration of assembly and non-assembly approaches for comparing biological sequences

AU - Dam, Vi

AU - Ali, Hesham H

PY - 2017/12/15

Y1 - 2017/12/15

UR - http://www.scopus.com/inward/record.url?scp=85045961520&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045961520&partnerID=8YFLogxK

U2 - 10.1109/BIBM.2017.8218007

DO - 10.1109/BIBM.2017.8218007

M3 - Conference contribution

AN - SCOPUS:85045961520

T3 - Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017

SP - 2232

EP - 2234

BT - Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017

A2 - Yoo, Illhoi

A2 - Zheng, Jane Huiru

A2 - Gong, Yang

A2 - Hu, Xiaohua Tony

A2 - Shyu, Chi-Ren

A2 - Bromberg, Yana

A2 - Gao, Jean

A2 - Korkin, Dmitry

PB - Institute of Electrical and Electronics Engineers Inc.

ER -