A comparison of a campus cluster and Open Science Grid platforms for protein-guided assembly using Pegasus Workflow Management System

Natasha Pavlovikj, Kevin Begcy, Sairam Behera, Malachy Campbell, Harkamal Walia, Jitender S. Deogun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Scientific workflows are a useful tool for managing large and complex computational tasks. Because of their intensive resource requirements, scientific workflows are often executed on distributed platforms, including campus clusters, grids, and clouds. In this paper, we build a scientific workflow for blast2cap3, the protein-guided assembly approach, using the Pegasus Workflow Management System (Pegasus WMS). The modularity of blast2cap3 allows us to decompose the existing serial approach into multiple tasks, some of which can run in parallel. This workflow is then deployed on two distributed execution platforms: Sandhills, the University of Nebraska campus cluster, and the Open Science Grid (OSG). We compare and evaluate the performance of the workflow on both platforms. We also investigate how the number of clusters of transcripts in the blast2cap3 workflow affects the total running time. The experiments show that the Pegasus WMS implementation of blast2cap3 reduces the running time by more than 95% compared to the current serial implementation. Although OSG provides more computational resources than Sandhills, our experimental workflow runs have better running times on Sandhills. Moreover, selecting 300 clusters of transcripts gives the best performance with the resources allocated from Sandhills.
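
The decomposition described in the abstract (split the serial job into independent tasks, run them in parallel, then merge) can be illustrated with a minimal sketch. This is not the authors' Pegasus WMS workflow: a local thread pool stands in for the workflow manager, and every name here (split_into_chunks, assemble_chunk, run_workflow, speedup_from_reduction) is hypothetical. The last helper only restates the arithmetic behind the reported figure: a running-time reduction of more than 95% corresponds to a speedup of more than 20x.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous lists whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def assemble_chunk(chunk):
    """Hypothetical stand-in for one independent task (not the real blast2cap3 step)."""
    return [f"assembled:{name}" for name in chunk]


def run_workflow(transcripts, n_chunks, max_workers=4):
    """Fan out over the chunks in parallel, then merge results in the original order."""
    chunks = split_into_chunks(transcripts, n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the input chunks
        partial_results = list(pool.map(assemble_chunk, chunks))
    return [item for part in partial_results for item in part]


def speedup_from_reduction(reduction):
    """Convert a fractional running-time reduction into the speedup T_serial / T_parallel."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    names = [f"transcript_{i}" for i in range(10)]
    print(run_workflow(names, n_chunks=3))
    print(speedup_from_reduction(0.95))
```

In this sketch the number of chunks plays a role loosely analogous to the number of clusters of transcripts studied in the paper: too few chunks limits parallelism, while too many adds per-task overhead, which is why an intermediate value can give the best running time.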

Original language: English (US)
Title of host publication: Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014
Publisher: IEEE Computer Society
Pages: 546-555
Number of pages: 10
ISBN (Electronic): 9780769552088
DOIs: https://doi.org/10.1109/IPDPSW.2014.66
State: Published - Nov 27 2014
Event: 28th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014 - Phoenix, United States
Duration: May 19 2014 to May 23 2014

Publication series

Name: Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS
ISSN (Print): 1530-2075
ISSN (Electronic): 2332-1237

Conference

Conference: 28th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014
Country: United States
City: Phoenix
Period: 5/19/14 to 5/23/14

Fingerprint

Proteins
Experiments

Keywords

  • Blast2cap3
  • Campus cluster
  • Open science grid
  • Pegasus workflow management system
  • Protein-guided assembly
  • Scientific workflow
  • Transcriptome assembly

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Cite this

Pavlovikj, N., Begcy, K., Behera, S., Campbell, M., Walia, H., & Deogun, J. S. (2014). A comparison of a campus cluster and Open Science Grid platforms for protein-guided assembly using Pegasus Workflow Management System. In Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014 (pp. 546-555). [6969434] (Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS). IEEE Computer Society. https://doi.org/10.1109/IPDPSW.2014.66
