Evaluating distributed platforms for protein-guided scientific workflow

Natasha Pavlovikj, Kevin Begcy, Sairam Behera, Malachy Campbell, Harkamal Walia, Jitender S. Deogun

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Complex and large-scale applications in different scientific disciplines are often represented as a set of independent tasks, known as workflows. Many scientific workflows have intensive resource requirements. Therefore, different distributed platforms, including campus clusters, grids, and clouds, are used for efficient execution of these workflows. In this paper, we examine the performance and the cost of running the Pegasus Workflow Management System (Pegasus WMS) implementation of blast2cap3, the protein-guided assembly approach, on three different execution platforms: Sandhills, the University of Nebraska Campus Cluster; the academic grid Open Science Grid (OSG); and the commercial cloud Amazon EC2. Furthermore, the behavior of the blast2cap3 workflow was tested with different numbers of tasks. For the used workflows and execution platforms, we perform multiple runs in order to compare the total workflow running time, as well as the different resource availability over time. Additionally, for the most interesting runs, the number of running versus the number of idle jobs over time was analyzed for each platform. The performed experiments show that using the Pegasus WMS implementation of blast2cap3 with more than 100 tasks significantly reduces the running time for all execution platforms. In general, for our workflow, better performance and resource usage were achieved when Amazon EC2 was used as an execution platform. However, due to the Amazon EC2 cost, the academic distributed systems can sometimes be a good alternative and have excellent performance, especially when there are plenty of resources available.

Original language: English (US)
Title of host publication: Proceedings of the XSEDE 2014 Conference
Subtitle of host publication: Engaging Communities
Publisher: Association for Computing Machinery
ISBN (Print): 9781450328937
DOIs: https://doi.org/10.1145/2616498.2616551
State: Published - Jan 1 2014
Event: 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2014 - Atlanta, GA, United States
Duration: Jul 13 2014 to Jul 18 2014

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2014
Country: United States
City: Atlanta, GA
Period: 7/13/14 to 7/18/14

Keywords

  • Amazon EC2
  • Blast2cap3
  • Campus cluster
  • Open science grid
  • Pegasus workflow management system
  • Scientific workflow

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Pavlovikj, N., Begcy, K., Behera, S., Campbell, M., Walia, H., & Deogun, J. S. (2014). Evaluating distributed platforms for protein-guided scientific workflow. In Proceedings of the XSEDE 2014 Conference: Engaging Communities [38] (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/2616498.2616551