An energy-aware bioinformatics application for assembling short reads in high performance computing systems

Julia Warnke, Sachin Pawaskar, Hesham H Ali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Current biomedical technologies are producing massive amounts of data on an unprecedented scale. The increasing complexity and growth rate of biological data have made bioinformatics data processing and analysis a key and computationally intensive task. High performance computing (HPC) has been successfully applied to major bioinformatics applications to reduce their computational burden. However, a naïve approach to developing parallel bioinformatics applications may achieve a high degree of parallelism while needlessly expending computational resources and consuming large amounts of energy. As the volume of biological data and the associated computational burden continue to increase, there is a growing need for energy-efficient computational approaches in the bioinformatics domain. To address this issue, we have developed an energy-aware scheduling (EAS) model that takes both deadline requirements and energy factors into consideration when running computationally intensive applications. One computationally demanding process that would benefit from our scheduling model is the assembly of short sequencing reads produced by next-generation sequencing technologies. Next-generation sequencing produces a very large number of short DNA reads from a biological sample. Multiple overlapping fragments must be aligned and merged into long stretches of contiguous sequence before any useful information can be gathered. The assembly problem is extremely difficult due to the complex structure of the underlying genome and the inherent errors present in current sequencing technologies. We apply our EAS model to a newly proposed assembly algorithm called Merge and Traverse, which allows us to generate speedup profiles. Our EAS model was also able to dynamically adjust the number of nodes needed to meet given deadlines for different sets of reads.
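The abstract does not give the EAS model's equations, so the following is only a minimal sketch under stated assumptions, not the authors' model. It shows how a deadline- and energy-aware scheduler could pick a node count from a measured speedup profile. The function name `choose_nodes`, the runtime model T(n) = T1 / S(n), and the constant per-node power draw `node_power_w` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Plan:
    nodes: int
    runtime_s: float
    energy_j: float


def choose_nodes(
    serial_runtime_s: float,
    speedup_profile: Dict[int, float],
    deadline_s: float,
    node_power_w: float,
) -> Optional[Plan]:
    """Pick the profiled node count that meets the deadline with least energy.

    Hypothetical assumptions (not from the paper): runtime on n nodes is
    serial_runtime_s / speedup(n), and every allocated node draws a constant
    node_power_w for the whole run, so energy = n * node_power_w * runtime.
    """
    best: Optional[Plan] = None
    for nodes, speedup in sorted(speedup_profile.items()):
        runtime = serial_runtime_s / speedup
        if runtime > deadline_s:
            continue  # this allocation misses the deadline
        energy = nodes * node_power_w * runtime
        # Candidates are visited in increasing node order, so the strict '<'
        # keeps the smaller node count when two energies tie.
        if best is None or energy < best.energy_j:
            best = Plan(nodes, runtime, energy)
    return best  # None means no profiled allocation meets the deadline
```

With a hypothetical profile `{1: 1.0, 2: 1.9, 4: 3.5, 8: 6.0, 16: 9.0}`, a 3600 s serial runtime, a 1200 s deadline and 200 W per node, the sketch selects 4 nodes (about 1029 s and 823 kJ); tightening the deadline to 500 s forces 16 nodes. Because parallel efficiency in such profiles falls as nodes are added, energy grows with node count, which is why the cheapest feasible allocation is usually the smallest one that still meets the deadline.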

Original language: English (US)
Title of host publication: Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012
Pages: 154-160
Number of pages: 7
DOI: https://doi.org/10.1109/HPCSim.2012.6266905
State: Published, Oct 8 2012
Event: 2012 10th Annual International Conference on High Performance Computing and Simulation, HPCS 2012, Madrid, Spain
Duration: Jul 2 2012 to Jul 6 2012

Publication series

Name: Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012

Conference

Conference: 2012 10th Annual International Conference on High Performance Computing and Simulation, HPCS 2012
Country: Spain
City: Madrid
Period: 7/2/12 to 7/6/12

Keywords

  • Energy aware scheduling
  • genome assembly
  • high performance computing
  • next generation sequencing

ASJC Scopus subject areas

  • Modeling and Simulation

Cite this

Warnke, J., Pawaskar, S., & Ali, H. H. (2012). An energy-aware bioinformatics application for assembling short reads in high performance computing systems. In Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012 (pp. 154-160). [6266905] (Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012). https://doi.org/10.1109/HPCSim.2012.6266905

An energy-aware bioinformatics application for assembling short reads in high performance computing systems. / Warnke, Julia; Pawaskar, Sachin; Ali, Hesham H.

Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012. 2012. p. 154-160, 6266905 (Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012).

Warnke, J, Pawaskar, S & Ali, HH 2012, An energy-aware bioinformatics application for assembling short reads in high performance computing systems. in Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012., 6266905, Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012, pp. 154-160, 2012 10th Annual International Conference on High Performance Computing and Simulation, HPCS 2012, Madrid, Spain, 7/2/12. https://doi.org/10.1109/HPCSim.2012.6266905
Warnke J, Pawaskar S, Ali HH. An energy-aware bioinformatics application for assembling short reads in high performance computing systems. In Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012. 2012. p. 154-160. 6266905. (Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012). https://doi.org/10.1109/HPCSim.2012.6266905
Warnke, Julia ; Pawaskar, Sachin ; Ali, Hesham H. / An energy-aware bioinformatics application for assembling short reads in high performance computing systems. Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012. 2012. pp. 154-160 (Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012).
@inproceedings{58558628d6ff415480b60e194235f56c,
title = "An energy-aware bioinformatics application for assembling short reads in high performance computing systems",
keywords = "Energy aware scheduling, genome assembly, high performance computing, next generation sequencing",
author = "Julia Warnke and Sachin Pawaskar and Ali, {Hesham H}",
year = "2012",
month = "10",
day = "8",
doi = "10.1109/HPCSim.2012.6266905",
language = "English (US)",
isbn = "9781467323598",
series = "Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012",
pages = "154--160",
booktitle = "Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012",

}

TY - GEN

T1 - An energy-aware bioinformatics application for assembling short reads in high performance computing systems

AU - Warnke, Julia

AU - Pawaskar, Sachin

AU - Ali, Hesham H

PY - 2012/10/8

Y1 - 2012/10/8

KW - Energy aware scheduling

KW - genome assembly

KW - high performance computing

KW - next generation sequencing

UR - http://www.scopus.com/inward/record.url?scp=84867018924&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867018924&partnerID=8YFLogxK

U2 - 10.1109/HPCSim.2012.6266905

DO - 10.1109/HPCSim.2012.6266905

M3 - Conference contribution

AN - SCOPUS:84867018924

SN - 9781467323598

T3 - Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012

SP - 154

EP - 160

BT - Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012

ER -