Combining in-situ and in-transit processing to enable extreme-scale scientific analysis

Janine C. Bennett, Hasan Abbasi, Peer Timo Bremer, Ray Grout, Attila Gyulassy, Tong Jin, Scott Klasky, Hemanth Kolla, Manish Parashar, Valerio Pascucci, Philippe Pebay, David Thompson, Hongfeng Yu, Fan Zhang, Jacqueline Chen

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

98 Citations (Scopus)

Abstract

With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process centric to a concurrent approach based on either in-situ or in-transit processing. In this context computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.
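As a rough illustration of the in-situ / in-transit split described in the abstract, the sketch below reduces each data block to a small statistical summary (standing in for the in-situ step on the primary compute resources) and merges those summaries on a separate worker thread fed by a queue (standing in for in-transit processing on secondary resources with asynchronous transfers). This is not the paper's implementation and uses neither DataSpaces nor ADIOS. The names `Stats`, `partial_stats`, `merge_stats`, and `run_pipeline` are hypothetical, and the merge step is the standard pairwise formula for combining counts, means, and sums of squared deviations.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Stats:
    """Compact summary of a block: count, mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @property
    def variance(self):
        # Population variance; 0.0 for an empty summary.
        return self.m2 / self.n if self.n else 0.0


def partial_stats(values):
    """In-situ stand-in: reduce one local block to a small summary."""
    n = len(values)
    if n == 0:
        return Stats()
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return Stats(n, mean, m2)


def merge_stats(a, b):
    """Combine two summaries without revisiting the raw data."""
    n = a.n + b.n
    if n == 0:
        return Stats()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Stats(n, mean, m2)


def run_pipeline(blocks):
    """Hypothetical driver: summarize blocks, merge them on a worker thread."""
    transfers = queue.Queue()  # stands in for asynchronous data transfer
    result = [Stats()]

    def in_transit_worker():
        # In-transit stand-in: consume summaries as they arrive.
        while True:
            item = transfers.get()
            if item is None:  # sentinel: no more data
                break
            result[0] = merge_stats(result[0], item)

    worker = threading.Thread(target=in_transit_worker)
    worker.start()
    for block in blocks:
        # Only the small summary is shipped, not the raw block.
        transfers.put(partial_stats(block))
    transfers.put(None)
    worker.join()
    return result[0]


if __name__ == "__main__":
    total = run_pipeline([[1.0, 2.0], [3.0, 4.0, 5.0]])
    print(total.n, total.mean, total.variance)
```

The point of the design is that only the fixed-size summary crosses the queue, so the cost of moving data no longer grows with the size of the raw block.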

Original language: English (US)
Title of host publication: 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
DOI: https://doi.org/10.1109/SC.2012.31
State: Published - Dec 1 2012
Event: 2012 24th International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012 - Salt Lake City, UT, United States
Duration: Nov 10 2012 to Nov 16 2012

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Other

Other: 2012 24th International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
Country: United States
City: Salt Lake City, UT
Period: 11/10/12 to 11/16/12

Fingerprint

Processing
Data transfer
Visualization
Pipelines
Scheduling
Statistics
Costs

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Bennett, J. C., Abbasi, H., Bremer, P. T., Grout, R., Gyulassy, A., Jin, T., ... Chen, J. (2012). Combining in-situ and in-transit processing to enable extreme-scale scientific analysis. In 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012 [6468528] (International Conference for High Performance Computing, Networking, Storage and Analysis, SC). https://doi.org/10.1109/SC.2012.31

@inproceedings{25702cfc2bd54f80a7f951c458052719,
title = "Combining in-situ and in-transit processing to enable extreme-scale scientific analysis",
abstract = "With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process centric to a concurrent approach based on either in-situ or in-transit processing. In this context computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.",
author = "Bennett, {Janine C.} and Hasan Abbasi and Bremer, {Peer Timo} and Ray Grout and Attila Gyulassy and Tong Jin and Scott Klasky and Hemanth Kolla and Manish Parashar and Valerio Pascucci and Philippe Pebay and David Thompson and Hongfeng Yu and Fan Zhang and Jacqueline Chen",
year = "2012",
month = "12",
day = "1",
doi = "10.1109/SC.2012.31",
language = "English (US)",
isbn = "9781467308069",
series = "International Conference for High Performance Computing, Networking, Storage and Analysis, SC",
booktitle = "2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012",

}

TY - GEN

T1 - Combining in-situ and in-transit processing to enable extreme-scale scientific analysis

AU - Bennett, Janine C.

AU - Abbasi, Hasan

AU - Bremer, Peer Timo

AU - Grout, Ray

AU - Gyulassy, Attila

AU - Jin, Tong

AU - Klasky, Scott

AU - Kolla, Hemanth

AU - Parashar, Manish

AU - Pascucci, Valerio

AU - Pebay, Philippe

AU - Thompson, David

AU - Yu, Hongfeng

AU - Zhang, Fan

AU - Chen, Jacqueline

PY - 2012/12/1

Y1 - 2012/12/1

AB - With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process centric to a concurrent approach based on either in-situ or in-transit processing. In this context computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.

UR - http://www.scopus.com/inward/record.url?scp=84877687430&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877687430&partnerID=8YFLogxK

U2 - 10.1109/SC.2012.31

DO - 10.1109/SC.2012.31

M3 - Conference contribution

SN - 9781467308069

T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC

BT - 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012

ER -