Evaluating distributed platforms for protein-guided scientific workflow

Natasha Pavlovikj, Kevin Begcy, Sairam Behera, Malachy Campbell, Harkamal Walia, Jitender S Deogun

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Complex and large-scale applications in different scientific disciplines are often represented as a set of independent tasks, known as workflows. Many scientific workflows have intensive resource requirements, so different distributed platforms, including campus clusters, grids, and clouds, are used to execute them efficiently. In this paper we examine the performance and the cost of running the Pegasus Workflow Management System (Pegasus WMS) implementation of blast2cap3, the protein-guided assembly approach, on three execution platforms: Sandhills, the University of Nebraska Campus Cluster; the Open Science Grid (OSG), an academic grid; and Amazon EC2, a commercial cloud. We also tested the behavior of the blast2cap3 workflow with different numbers of tasks. For each workflow configuration and execution platform, we performed multiple runs to compare the total workflow running time and the resource availability over time. Additionally, for the most interesting runs, we analyzed the number of running versus idle jobs over time on each platform. The experiments show that running the Pegasus WMS implementation of blast2cap3 with more than 100 tasks significantly reduces the running time on all execution platforms. In general, for our workflow, Amazon EC2 gave better performance and resource usage than the other platforms. However, given the cost of Amazon EC2, the academic distributed systems can sometimes be a good alternative and can deliver excellent performance, especially when plenty of resources are available.
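The running-versus-idle analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each job's submit, start, and end times (in seconds since the workflow began) have already been extracted from the workflow's job logs; the `JobRecord` layout and the helper names are illustrative only and are not part of Pegasus WMS or the authors' tooling. A job is counted as idle while it waits in the queue and as running while it executes.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical per-job timestamps, in seconds since workflow start."""
    submit: float  # job enters the queue (idle from here)
    start: float   # job begins executing (running from here)
    end: float     # job finishes


def running_vs_idle(jobs, t):
    """Return (running, idle) job counts at time t.

    Idle covers the interval [submit, start); running covers [start, end).
    """
    running = sum(1 for j in jobs if j.start <= t < j.end)
    idle = sum(1 for j in jobs if j.submit <= t < j.start)
    return running, idle


def timeline(jobs, step):
    """Sample (time, running, idle) every `step` seconds until the last job ends."""
    horizon = max(j.end for j in jobs)
    # Use an integer sample index so the sample times do not drift.
    return [(k * step, *running_vs_idle(jobs, k * step))
            for k in range(int(horizon // step) + 1)]


if __name__ == "__main__":
    jobs = [JobRecord(0, 0, 10), JobRecord(0, 5, 15), JobRecord(2, 12, 20)]
    for t, running, idle in timeline(jobs, 5):
        print(f"t={t:>3}s running={running} idle={idle}")
```

Plotting the two counts against time gives a per-platform picture of how quickly queued jobs obtain resources; a long stretch with many idle and few running jobs points to limited resource availability rather than slow tasks.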

Original language: English (US)
Title of host publication: Proceedings of the XSEDE 2014 Conference
Subtitle of host publication: Engaging Communities
Publisher: Association for Computing Machinery
ISBN (Print): 9781450328937
DOIs: https://doi.org/10.1145/2616498.2616551
State: Published - Jan 1 2014
Event: 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2014 - Atlanta, GA, United States
Duration: Jul 13 2014 to Jul 18 2014

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2014
Country: United States
City: Atlanta, GA
Period: 7/13/14 to 7/18/14

Keywords

  • Amazon EC2
  • Blast2cap3
  • Campus cluster
  • Open Science Grid
  • Pegasus Workflow Management System
  • Scientific workflow

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Pavlovikj, N., Begcy, K., Behera, S., Campbell, M., Walia, H., & Deogun, J. S. (2014). Evaluating distributed platforms for protein-guided scientific workflow. In Proceedings of the XSEDE 2014 Conference: Engaging Communities [38] (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/2616498.2616551

@inproceedings{04ec9970b92b4e7cb7ffe91bbf662217,
title = "Evaluating distributed platforms for protein-guided scientific workflow",
abstract = "Complex and large-scale applications in different scientific disciplines are often represented as a set of independent tasks, known as workflows. Many scientific workflows have intensive resource requirements, so different distributed platforms, including campus clusters, grids, and clouds, are used to execute them efficiently. In this paper we examine the performance and the cost of running the Pegasus Workflow Management System (Pegasus WMS) implementation of blast2cap3, the protein-guided assembly approach, on three execution platforms: Sandhills, the University of Nebraska Campus Cluster; the Open Science Grid (OSG), an academic grid; and Amazon EC2, a commercial cloud. We also tested the behavior of the blast2cap3 workflow with different numbers of tasks. For each workflow configuration and execution platform, we performed multiple runs to compare the total workflow running time and the resource availability over time. Additionally, for the most interesting runs, we analyzed the number of running versus idle jobs over time on each platform. The experiments show that running the Pegasus WMS implementation of blast2cap3 with more than 100 tasks significantly reduces the running time on all execution platforms. In general, for our workflow, Amazon EC2 gave better performance and resource usage than the other platforms. However, given the cost of Amazon EC2, the academic distributed systems can sometimes be a good alternative and can deliver excellent performance, especially when plenty of resources are available.",
keywords = "Amazon EC2, Blast2cap3, Campus cluster, Open science grid, Pegasus workflow management system, Scientific workflow",
author = "Natasha Pavlovikj and Kevin Begcy and Sairam Behera and Malachy Campbell and Harkamal Walia and Deogun, {Jitender S}",
year = "2014",
month = "1",
day = "1",
doi = "10.1145/2616498.2616551",
language = "English (US)",
isbn = "9781450328937",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
booktitle = "Proceedings of the XSEDE 2014 Conference",

}

TY - GEN

T1 - Evaluating distributed platforms for protein-guided scientific workflow

AU - Pavlovikj, Natasha

AU - Begcy, Kevin

AU - Behera, Sairam

AU - Campbell, Malachy

AU - Walia, Harkamal

AU - Deogun, Jitender S

PY - 2014/1/1

Y1 - 2014/1/1

N2 - Complex and large-scale applications in different scientific disciplines are often represented as a set of independent tasks, known as workflows. Many scientific workflows have intensive resource requirements, so different distributed platforms, including campus clusters, grids, and clouds, are used to execute them efficiently. In this paper we examine the performance and the cost of running the Pegasus Workflow Management System (Pegasus WMS) implementation of blast2cap3, the protein-guided assembly approach, on three execution platforms: Sandhills, the University of Nebraska Campus Cluster; the Open Science Grid (OSG), an academic grid; and Amazon EC2, a commercial cloud. We also tested the behavior of the blast2cap3 workflow with different numbers of tasks. For each workflow configuration and execution platform, we performed multiple runs to compare the total workflow running time and the resource availability over time. Additionally, for the most interesting runs, we analyzed the number of running versus idle jobs over time on each platform. The experiments show that running the Pegasus WMS implementation of blast2cap3 with more than 100 tasks significantly reduces the running time on all execution platforms. In general, for our workflow, Amazon EC2 gave better performance and resource usage than the other platforms. However, given the cost of Amazon EC2, the academic distributed systems can sometimes be a good alternative and can deliver excellent performance, especially when plenty of resources are available.

AB - Complex and large-scale applications in different scientific disciplines are often represented as a set of independent tasks, known as workflows. Many scientific workflows have intensive resource requirements, so different distributed platforms, including campus clusters, grids, and clouds, are used to execute them efficiently. In this paper we examine the performance and the cost of running the Pegasus Workflow Management System (Pegasus WMS) implementation of blast2cap3, the protein-guided assembly approach, on three execution platforms: Sandhills, the University of Nebraska Campus Cluster; the Open Science Grid (OSG), an academic grid; and Amazon EC2, a commercial cloud. We also tested the behavior of the blast2cap3 workflow with different numbers of tasks. For each workflow configuration and execution platform, we performed multiple runs to compare the total workflow running time and the resource availability over time. Additionally, for the most interesting runs, we analyzed the number of running versus idle jobs over time on each platform. The experiments show that running the Pegasus WMS implementation of blast2cap3 with more than 100 tasks significantly reduces the running time on all execution platforms. In general, for our workflow, Amazon EC2 gave better performance and resource usage than the other platforms. However, given the cost of Amazon EC2, the academic distributed systems can sometimes be a good alternative and can deliver excellent performance, especially when plenty of resources are available.

KW - Amazon EC2

KW - Blast2cap3

KW - Campus cluster

KW - Open science grid

KW - Pegasus workflow management system

KW - Scientific workflow

UR - http://www.scopus.com/inward/record.url?scp=84905482237&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84905482237&partnerID=8YFLogxK

U2 - 10.1145/2616498.2616551

DO - 10.1145/2616498.2616551

M3 - Conference contribution

SN - 9781450328937

T3 - ACM International Conference Proceeding Series

BT - Proceedings of the XSEDE 2014 Conference

PB - Association for Computing Machinery

ER -