Exploiting redundancy to boost performance in a RAID-10 style cluster-based file system

Yifeng Zhu, Hong Jiang, Xiao Qin, Dan Feng, David R. Swanson

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

While aggregating the throughput of existing disks on cluster nodes is a cost-effective way to alleviate the I/O bottleneck in cluster computing, this approach can suffer performance degradation due to contention for shared resources on the same node between storage data processing and user task computation. This paper proposes to judiciously utilize the storage redundancy, in the form of mirroring, that exists in a RAID-10 style file system to alleviate this performance degradation. More specifically, a heuristic scheduling algorithm, motivated by observations of a simple cluster configuration, is developed to spatially schedule write operations on the less loaded node of each mirroring pair. The duplication of modified data to the mirroring nodes is performed asynchronously in the background. Read performance is improved by two techniques: doubling the degree of parallelism and hot-spot skipping. A synthetic benchmark is used to evaluate these algorithms in a real cluster environment, and the proposed algorithms are shown to be very effective in enhancing performance.
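The two scheduling ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `Node` type, the single scalar `load` metric, and the `hot_threshold` value are all assumptions made for illustration, and the asynchronous background duplication is not modeled. Writes go to the less loaded node of each mirroring pair; reads use both mirrors of a pair for parallelism but skip any mirror that counts as a hot spot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    load: float  # hypothetical load metric in [0, 1], e.g. CPU or disk utilization


def pick_write_targets(pairs):
    """For each mirroring pair, choose the less loaded node as the write target."""
    return [min(pair, key=lambda n: n.load) for pair in pairs]


def plan_read(pairs, hot_threshold=0.8):
    """For each pair, read from both mirrors unless one is a hot spot.

    hot_threshold is an assumed cutoff, not a value taken from the paper.
    """
    plan = []
    for primary, mirror in pairs:
        cool = [n for n in (primary, mirror) if n.load < hot_threshold]
        # If both mirrors are hot, fall back to the less loaded one so the
        # data held by this pair can still be read.
        plan.append(cool or [min((primary, mirror), key=lambda n: n.load)])
    return plan


# Three mirroring pairs with made-up load values.
pairs = [
    (Node("a0", 0.2), Node("b0", 0.9)),
    (Node("a1", 0.5), Node("b1", 0.4)),
    (Node("a2", 0.95), Node("b2", 0.85)),
]
write_targets = [n.name for n in pick_write_targets(pairs)]
read_plan = [[n.name for n in group] for group in plan_read(pairs)]
print(write_targets)
print(read_plan)
```

In this sketch the second pair is read from both mirrors, which is where the doubled parallelism comes from, while the first pair skips its heavily loaded mirror.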

Original language: English (US)
Pages (from-to): 433-447
Number of pages: 15
Journal: Cluster Computing
Volume: 9
Issue number: 4
DOI: 10.1007/s10586-006-0011-6
State: Published - Oct 1 2006

Fingerprint

Redundancy
Cluster computing
Degradation
Heuristic algorithms
Scheduling algorithms
Throughput
Data storage equipment
Costs

Keywords

  • CEFT
  • Cluster computing
  • Cluster file systems
  • Data storage
  • PVFS
  • RAID
  • Redundancy

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications

Cite this

Exploiting redundancy to boost performance in a RAID-10 style cluster-based file system. / Zhu, Yifeng; Jiang, Hong; Qin, Xiao; Feng, Dan; Swanson, David R.

In: Cluster Computing, Vol. 9, No. 4, 01.10.2006, p. 433-447.

Zhu, Yifeng ; Jiang, Hong ; Qin, Xiao ; Feng, Dan ; Swanson, David R. / Exploiting redundancy to boost performance in a RAID-10 style cluster-based file system. In: Cluster Computing. 2006 ; Vol. 9, No. 4. pp. 433-447.
@article{51026a0bdf954f0cb2ec2f7309d6d5a5,
title = "Exploiting redundancy to boost performance in a RAID-10 style cluster-based file system",
keywords = "CEFT, Cluster computing, Cluster file systems, Data storage, PVFS, RAID, Redundancy",
author = "Yifeng Zhu and Hong Jiang and Xiao Qin and Dan Feng and Swanson, {David R.}",
year = "2006",
month = "10",
day = "1",
doi = "10.1007/s10586-006-0011-6",
language = "English (US)",
volume = "9",
pages = "433--447",
journal = "Cluster Computing",
issn = "1386-7857",
publisher = "Kluwer Academic Publishers",
number = "4",

}

TY - JOUR
T1 - Exploiting redundancy to boost performance in a RAID-10 style cluster-based file system
AU - Zhu, Yifeng
AU - Jiang, Hong
AU - Qin, Xiao
AU - Feng, Dan
AU - Swanson, David R.
PY - 2006/10/1
Y1 - 2006/10/1
KW - CEFT
KW - Cluster computing
KW - Cluster file systems
KW - Data storage
KW - PVFS
KW - RAID
KW - Redundancy
UR - http://www.scopus.com/inward/record.url?scp=33750118626&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33750118626&partnerID=8YFLogxK
U2 - 10.1007/s10586-006-0011-6
DO - 10.1007/s10586-006-0011-6
M3 - Article
AN - SCOPUS:33750118626
VL - 9
SP - 433
EP - 447
JO - Cluster Computing
JF - Cluster Computing
SN - 1386-7857
IS - 4
ER -