A case study of parallel I/O for biological sequence search on Linux clusters

Yifeng Zhu, Hong Jiang, Xiao Qin, David Swanson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

In this paper we analyze the I/O access patterns of a widely used biological sequence search tool and implement two variations that employ parallel I/O for data access, based on PVFS (Parallel Virtual File System) and CEFT-PVFS (Cost-Effective Fault-Tolerant PVFS). Experiments show that the two variations outperform the original tool even when they use the same number of storage devices or fewer. Although the performance of the two variations improves consistently as the number of servers is initially increased, the gain from parallel I/O becomes insignificant as more servers are added. Using this tool as a benchmark, we examine the effectiveness of two read performance optimization techniques in CEFT-PVFS. Performance results indicate that (1) doubling the degree of parallelism boosts read performance to approach that of PVFS, and (2) skipping hotspots can substantially improve I/O performance when the load on data servers is highly imbalanced. I/O resource contention caused by multiple applications sharing server nodes in a cluster is shown to degrade the performance of the original tool and of the PVFS-based variation by factors of up to 10 and 21, respectively, whereas the CEFT-PVFS-based variation suffers only a two-fold degradation.
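
The two read optimizations named in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation and does not use any PVFS or CEFT-PVFS API; it only assumes a mirrored layout in which stripe unit i is stored on both primary[i % n] and mirror[i % n]. The names Server, plan_reads, and hot_threshold, and the load values, are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    load: float  # hypothetical utilization in [0, 1]; not a real CEFT-PVFS metric


def plan_reads(num_units, primary, mirror, hot_threshold=0.8):
    """Choose one replica server for each stripe unit of a mirrored, striped file.

    Assumed layout: unit i is stored on primary[i % n] and on mirror[i % n].
    """
    if not primary or len(primary) != len(mirror):
        raise ValueError("server groups must be non-empty and equal in size")
    n = len(primary)
    plan = []
    for unit in range(num_units):
        p, m = primary[unit % n], mirror[unit % n]
        # Doubled parallelism: alternate stripe rows between the two groups,
        # so both groups serve reads instead of the mirror sitting idle.
        preferred, fallback = (p, m) if (unit // n) % 2 == 0 else (m, p)
        # Hotspot skipping: if the preferred server is overloaded and its
        # replica is less loaded, read this unit from the replica instead.
        if preferred.load > hot_threshold and fallback.load < preferred.load:
            preferred = fallback
        plan.append((unit, preferred.name))
    return plan


if __name__ == "__main__":
    primary = [Server("p0", 0.10), Server("p1", 0.95)]  # p1 is a hotspot
    mirror = [Server("m0", 0.20), Server("m1", 0.10)]
    for unit, server in plan_reads(8, primary, mirror):
        print(unit, server)
```

In this toy run the overloaded server p1 receives no read requests, while the remaining reads are spread over both groups. How CEFT-PVFS actually measures load and schedules reads is described in the paper itself, not here.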

Original language: English (US)
Title of host publication: Proceedings - IEEE International Conference on Cluster Computing, CLUSTER 2003
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 308-315
Number of pages: 8
ISBN (Electronic): 0769520669
DOI: https://doi.org/10.1109/CLUSTR.2003.1253329
State: Published - Jan 1 2003
Event: IEEE International Conference on Cluster Computing, CLUSTER 2003 - Hong Kong, China
Duration: Dec 1 2003 to Dec 4 2003

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2003-January
ISSN (Print): 1552-5244

Other

Other: IEEE International Conference on Cluster Computing, CLUSTER 2003
Country: China
City: Hong Kong
Period: 12/1/03 to 12/4/03

Fingerprint

Servers
Costs
Degradation
Linux
Experiments

Keywords

  • BLAST
  • CEFT-PVFS
  • PVFS
  • Parallel I/O

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing

Cite this

Zhu, Y., Jiang, H., Qin, X., & Swanson, D. (2003). A case study of parallel I/O for biological sequence search on Linux clusters. In Proceedings - IEEE International Conference on Cluster Computing, CLUSTER 2003 (pp. 308-315). [1253329] (Proceedings - IEEE International Conference on Cluster Computing, ICCC; Vol. 2003-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CLUSTR.2003.1253329

@inproceedings{4e6dc4fcfbe94219896917a88710eaf2,
title = "A case study of parallel I/O for biological sequence search on Linux clusters",
abstract = "In this paper we analyze the I/O access patterns of a widely-used biological sequence search tool and implement two variations that employ parallel-I/O for data access based on PVFS (Parallel Virtual File System) and CEFT-PVFS (Cost-Effective Fault-Tolerant PVFS). Experiments show that the two variations outperform the original tool when equal or even fewer storage devices are used in the former. It is also found that although the performance of the two variations improves consistently when initially increasing the number of servers, this performance gain from parallel I/O becomes insignificant with further increase in server number. We examine the effectiveness of two read performance optimization techniques in CEFT-PVFS by using this tool as a benchmark. Performance results indicate: (1) Doubling the degree of parallelism boosts the read performance to approach that of PVFS; (2) Skipping hotspots can substantially improve the I/O performance when the load on data servers is highly imbalanced. The I/O resource contention due to the sharing of server nodes by multiple applications in a cluster has been shown to degrade the performance of the original tool and the variation based on PVFS by up to 10 and 21 folds, respectively; whereas, the variation based on CEFT-PVFS only suffered a two-fold performance degradation.",
keywords = "BLAST, CEFT-PVFS, PVFS, Parallel I/O",
author = "Yifeng Zhu and Hong Jiang and Xiao Qin and David Swanson",
year = "2003",
month = "1",
day = "1",
doi = "10.1109/CLUSTR.2003.1253329",
language = "English (US)",
series = "Proceedings - IEEE International Conference on Cluster Computing, ICCC",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "308--315",
booktitle = "Proceedings - IEEE International Conference on Cluster Computing, CLUSTER 2003",

}

TY - GEN
T1 - A case study of parallel I/O for biological sequence search on Linux clusters
AU - Zhu, Yifeng
AU - Jiang, Hong
AU - Qin, Xiao
AU - Swanson, David
PY - 2003/1/1
Y1 - 2003/1/1
AB - In this paper we analyze the I/O access patterns of a widely-used biological sequence search tool and implement two variations that employ parallel-I/O for data access based on PVFS (Parallel Virtual File System) and CEFT-PVFS (Cost-Effective Fault-Tolerant PVFS). Experiments show that the two variations outperform the original tool when equal or even fewer storage devices are used in the former. It is also found that although the performance of the two variations improves consistently when initially increasing the number of servers, this performance gain from parallel I/O becomes insignificant with further increase in server number. We examine the effectiveness of two read performance optimization techniques in CEFT-PVFS by using this tool as a benchmark. Performance results indicate: (1) Doubling the degree of parallelism boosts the read performance to approach that of PVFS; (2) Skipping hotspots can substantially improve the I/O performance when the load on data servers is highly imbalanced. The I/O resource contention due to the sharing of server nodes by multiple applications in a cluster has been shown to degrade the performance of the original tool and the variation based on PVFS by up to 10 and 21 folds, respectively; whereas, the variation based on CEFT-PVFS only suffered a two-fold performance degradation.
KW - BLAST
KW - CEFT-PVFS
KW - PVFS
KW - Parallel I/O
UR - http://www.scopus.com/inward/record.url?scp=84906484524&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906484524&partnerID=8YFLogxK
U2 - 10.1109/CLUSTR.2003.1253329
DO - 10.1109/CLUSTR.2003.1253329
M3 - Conference contribution
AN - SCOPUS:84906484524
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 308
EP - 315
BT - Proceedings - IEEE International Conference on Cluster Computing, CLUSTER 2003
PB - Institute of Electrical and Electronics Engineers Inc.
ER -