FlexIO: I/O middleware for location-flexible scientific data analytics

Fang Zheng, Hongbo Zou, Greg Eisenhauer, Karsten Schwan, Matthew Wolf, Jai Dayal, Tuan Anh Nguyen, Jianting Cao, Hasan Abbasi, Scott Klasky, Norbert Podhorszki, Hongfeng Yu

Research output: Contribution to conference (Paper)

43 Citations (Scopus)

Abstract

Increasingly severe I/O bottlenecks on High-End Computing machines are prompting scientists to process simulation output data online while simulations are running and before storing data on disk. There are several options to place data analytics along the I/O path: on compute nodes, on separate nodes dedicated to analytics, or after data is stored on persistent storage. Since different placements have different impact on performance and cost, there is a consequent need for flexibility in the location of data analytics. The FlexIO middleware described in this paper makes it easy for scientists to obtain such flexibility, by offering simple abstractions and diverse data movement methods to couple simulation with analytics. Various placement policies can be built on top of FlexIO to exploit the trade-offs in performing analytics at different levels of the I/O hierarchy. Experimental results demonstrate that FlexIO can support a variety of simulation and analytics workloads at large scale through flexible placement options, efficient data movement, and dynamic deployment of data manipulation functionalities.

Original language: English (US)
Pages: 320-331
Number of pages: 12
DOI: https://doi.org/10.1109/IPDPS.2013.46
State: Published - Oct 7 2013
Event: 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013 - Boston, MA, United States
Duration: May 20 2013 to May 24 2013

Other

Other: 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013
Country: United States
City: Boston, MA
Period: 5/20/13 to 5/24/13

Fingerprint

Middleware
Costs

Keywords

  • Flexibility
  • I/O
  • In Situ Data Analytics
  • Placement

ASJC Scopus subject areas

  • Software

Cite this

Zheng, F., Zou, H., Eisenhauer, G., Schwan, K., Wolf, M., Dayal, J., ... Yu, H. (2013). FlexIO: I/O middleware for location-flexible scientific data analytics. 320-331. Paper presented at 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013, Boston, MA, United States. https://doi.org/10.1109/IPDPS.2013.46

@conference{06b1d8be0c9b4025adf47282b31cea1a,
title = "FlexIO: I/O middleware for location-flexible scientific data analytics",
abstract = "Increasingly severe I/O bottlenecks on High-End Computing machines are prompting scientists to process simulation output data online while simulations are running and before storing data on disk. There are several options to place data analytics along the I/O path: on compute nodes, on separate nodes dedicated to analytics, or after data is stored on persistent storage. Since different placements have different impact on performance and cost, there is a consequent need for flexibility in the location of data analytics. The FlexIO middleware described in this paper makes it easy for scientists to obtain such flexibility, by offering simple abstractions and diverse data movement methods to couple simulation with analytics. Various placement policies can be built on top of FlexIO to exploit the trade-offs in performing analytics at different levels of the I/O hierarchy. Experimental results demonstrate that FlexIO can support a variety of simulation and analytics workloads at large scale through flexible placement options, efficient data movement, and dynamic deployment of data manipulation functionalities.",
keywords = "Flexibility, I/O, In Situ Data Analytics, Placement",
author = "Fang Zheng and Hongbo Zou and Greg Eisenhauer and Karsten Schwan and Matthew Wolf and Jai Dayal and Nguyen, {Tuan Anh} and Jianting Cao and Hasan Abbasi and Scott Klasky and Norbert Podhorszki and Hongfeng Yu",
year = "2013",
month = "10",
day = "7",
doi = "10.1109/IPDPS.2013.46",
language = "English (US)",
pages = "320--331",
note = "27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013 ; Conference date: 20-05-2013 Through 24-05-2013",

}

TY - CONF

T1 - FlexIO

T2 - I/O middleware for location-flexible scientific data analytics

AU - Zheng, Fang

AU - Zou, Hongbo

AU - Eisenhauer, Greg

AU - Schwan, Karsten

AU - Wolf, Matthew

AU - Dayal, Jai

AU - Nguyen, Tuan Anh

AU - Cao, Jianting

AU - Abbasi, Hasan

AU - Klasky, Scott

AU - Podhorszki, Norbert

AU - Yu, Hongfeng

PY - 2013/10/7

Y1 - 2013/10/7

AB - Increasingly severe I/O bottlenecks on High-End Computing machines are prompting scientists to process simulation output data online while simulations are running and before storing data on disk. There are several options to place data analytics along the I/O path: on compute nodes, on separate nodes dedicated to analytics, or after data is stored on persistent storage. Since different placements have different impact on performance and cost, there is a consequent need for flexibility in the location of data analytics. The FlexIO middleware described in this paper makes it easy for scientists to obtain such flexibility, by offering simple abstractions and diverse data movement methods to couple simulation with analytics. Various placement policies can be built on top of FlexIO to exploit the trade-offs in performing analytics at different levels of the I/O hierarchy. Experimental results demonstrate that FlexIO can support a variety of simulation and analytics workloads at large scale through flexible placement options, efficient data movement, and dynamic deployment of data manipulation functionalities.

KW - Flexibility

KW - I/O

KW - In Situ Data Analytics

KW - Placement

UR - http://www.scopus.com/inward/record.url?scp=84884844639&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84884844639&partnerID=8YFLogxK

U2 - 10.1109/IPDPS.2013.46

DO - 10.1109/IPDPS.2013.46

M3 - Paper

AN - SCOPUS:84884844639

SP - 320

EP - 331

ER -