Discovering job preemptions in the Open Science Grid

Zhe Zhang, Derek Weitzel, Brian Bockelman, David Swanson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Open Science Grid (OSG) [9] is a worldwide computing system that facilitates distributed computing for scientific research. It can distribute a computationally intensive job to geographically distributed clusters and process the job's tasks in parallel. For compute clusters on the OSG, physical resources may be shared between OSG jobs and jobs submitted by the cluster's local users, with local jobs preempting OSG-based ones. As a result, job preemptions occur frequently in the OSG, sometimes significantly delaying job completion. We collected job data from the OSG over a period of more than 80 days. We present an analysis of the data, characterizing the preemption patterns and the different types of jobs. Based on these observations, we group OSG jobs into five categories and analyze the runtime statistics for each category. We further choose different statistical distributions to estimate the probability density function of job runtime for each category.

Original language: English (US)
Title of host publication: Practice and Experience in Advanced Research Computing 2018
Subtitle of host publication: Seamless Creativity, PEARC 2018
Publisher: Association for Computing Machinery
ISBN (Print): 9781450364461
DOIs: 10.1145/3219104.3229282
State: Published - Jul 22 2018
Event: 2018 Practice and Experience in Advanced Research Computing Conference: Seamless Creativity, PEARC 2018 - Pittsburgh, United States
Duration: Jul 22 2018 - Jul 26 2018

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 2018 Practice and Experience in Advanced Research Computing Conference: Seamless Creativity, PEARC 2018
Country: United States
City: Pittsburgh
Period: 7/22/18 - 7/26/18

Fingerprint

Distributed computer systems
Probability density function
Statistics

Keywords

  • Distribution
  • Estimation
  • Failure pattern
  • Job failure
  • Job runtime
  • OSG
  • Pilot job
  • Preemption
  • Probability Density Function
  • Spatial locality
  • Temporal locality

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Zhang, Z., Weitzel, D., Bockelman, B., & Swanson, D. (2018). Discovering job preemptions in the Open Science Grid. In Practice and Experience in Advanced Research Computing 2018: Seamless Creativity, PEARC 2018 [a59] (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3219104.3229282

@inproceedings{57e119f2dd49405386259c90d40f13a1,
title = "Discovering job preemptions in the {Open Science Grid}",
abstract = "The Open Science Grid (OSG) [9] is a worldwide computing system that facilitates distributed computing for scientific research. It can distribute a computationally intensive job to geographically distributed clusters and process the job's tasks in parallel. For compute clusters on the OSG, physical resources may be shared between OSG jobs and jobs submitted by the cluster's local users, with local jobs preempting OSG-based ones. As a result, job preemptions occur frequently in the OSG, sometimes significantly delaying job completion. We collected job data from the OSG over a period of more than 80 days. We present an analysis of the data, characterizing the preemption patterns and the different types of jobs. Based on these observations, we group OSG jobs into five categories and analyze the runtime statistics for each category. We further choose different statistical distributions to estimate the probability density function of job runtime for each category.",
keywords = "Distribution, Estimation, Failure pattern, Job failure, Job runtime, OSG, Pilot job, Preemption, Probability Density Function, Spatial locality, Temporal locality",
author = "Zhe Zhang and Derek Weitzel and Brian Bockelman and David Swanson",
year = "2018",
month = "7",
day = "22",
doi = "10.1145/3219104.3229282",
language = "English (US)",
isbn = "9781450364461",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
booktitle = "Practice and Experience in Advanced Research Computing 2018",

}

TY - GEN

T1 - Discovering job preemptions in the Open Science Grid

AU - Zhang, Zhe

AU - Weitzel, Derek

AU - Bockelman, Brian

AU - Swanson, David

PY - 2018/7/22

Y1 - 2018/7/22

AB - The Open Science Grid (OSG) [9] is a worldwide computing system that facilitates distributed computing for scientific research. It can distribute a computationally intensive job to geographically distributed clusters and process the job's tasks in parallel. For compute clusters on the OSG, physical resources may be shared between OSG jobs and jobs submitted by the cluster's local users, with local jobs preempting OSG-based ones. As a result, job preemptions occur frequently in the OSG, sometimes significantly delaying job completion. We collected job data from the OSG over a period of more than 80 days. We present an analysis of the data, characterizing the preemption patterns and the different types of jobs. Based on these observations, we group OSG jobs into five categories and analyze the runtime statistics for each category. We further choose different statistical distributions to estimate the probability density function of job runtime for each category.

KW - Distribution

KW - Estimation

KW - Failure pattern

KW - Job failure

KW - Job runtime

KW - OSG

KW - Pilot job

KW - Preemption

KW - Probability Density Function

KW - Spatial locality

KW - Temporal locality

UR - http://www.scopus.com/inward/record.url?scp=85051444525&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051444525&partnerID=8YFLogxK

U2 - 10.1145/3219104.3229282

DO - 10.1145/3219104.3229282

M3 - Conference contribution

AN - SCOPUS:85051444525

SN - 9781450364461

T3 - ACM International Conference Proceeding Series

BT - Practice and Experience in Advanced Research Computing 2018

PB - Association for Computing Machinery

ER -