A case study of parallel I/O for biological sequence search on Linux clusters

Yifeng Zhu, Hong Jiang, Xiao Qin, David Swanson

Research output: Contribution to journal, Article

4 Citations (Scopus)

Abstract

In this work, we investigate parallel I/O efficiencies in parallelised BLAST, the most popular tool for searching similarity in biological databases, and implement two variations by incorporating the PVFS and CEFT-PVFS parallel I/O facilities. Our goal is to study the performance gain from parallel I/O under the constraints of different numbers of commodity storage devices in a Linux cluster. We also evaluate two read performance-optimisation techniques employed in CEFT-PVFS: (1) doubling the degree of parallelism is shown to have comparable read performance with respect to PVFS when both systems have the same number of servers; (2) skipping hot-spot nodes can reduce the performance penalty when I/O workloads are highly imbalanced. The I/O resource contention between multiple applications running in the same cluster can degrade the performance of the original parallel BLAST and the PVFS version up to 10- and 21-fold, respectively, whereas the one based on CEFT-PVFS, which has the ability to skip hot-spot nodes, suffered only a two-fold performance degradation.

Original language: English (US)
Pages (from-to): 214-222
Number of pages: 9
Journal: International Journal of High Performance Computing and Networking
Volume: 1
Issue number: 4
DOI: 10.1504/IJHPCN.2004.008350
State: Published - Jan 1 2004

Fingerprint

Servers
Degradation
Linux

Keywords

  • BLAST
  • CEFT-PVFS
  • PVFS
  • bioinformatics
  • cluster computing
  • parallel I/O
  • sequence comparison

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

A case study of parallel I/O for biological sequence search on Linux clusters. / Zhu, Yifeng; Jiang, Hong; Qin, Xiao; Swanson, David.

In: International Journal of High Performance Computing and Networking, Vol. 1, No. 4, 01.01.2004, p. 214-222.


@article{29f77426328d4c4d9031989b4b33f7fe,
title = "A case study of parallel I/O for biological sequence search on Linux clusters",
abstract = "In this work, we investigate parallel I/O efficiencies in parallelised BLAST, the most popular tool for searching similarity in biological databases, and implement two variations by incorporating the PVFS and CEFT-PVFS parallel I/O facilities. Our goal is to study the performance gain from parallel I/O under the constraints of different numbers of commodity storage devices in a Linux cluster. We also evaluate two read performance-optimisation techniques employed in CEFT-PVFS: (1) doubling the degree of parallelism is shown to have comparable read performance with respect to PVFS when both systems have the same number of servers; (2) skipping hot-spot nodes can reduce the performance penalty when I/O workloads are highly imbalanced. The I/O resource contention between multiple applications running in the same cluster can degrade the performance of the original parallel BLAST and the PVFS version up to 10- and 21-fold, respectively, whereas the one based on CEFT-PVFS, which has the ability to skip hot-spot nodes, suffered only a two-fold performance degradation.",
keywords = "BLAST, CEFT-PVFS, PVFS, bioinformatics, cluster computing, parallel I/O, sequence comparison",
author = "Yifeng Zhu and Hong Jiang and Xiao Qin and David Swanson",
year = "2004",
month = "1",
day = "1",
doi = "10.1504/IJHPCN.2004.008350",
language = "English (US)",
volume = "1",
pages = "214--222",
journal = "International Journal of High Performance Computing and Networking",
issn = "1740-0562",
publisher = "Inderscience Enterprises Ltd",
number = "4",

}

TY - JOUR

T1 - A case study of parallel I/O for biological sequence search on Linux clusters

AU - Zhu, Yifeng

AU - Jiang, Hong

AU - Qin, Xiao

AU - Swanson, David

PY - 2004/1/1

Y1 - 2004/1/1

N2 - In this work, we investigate parallel I/O efficiencies in parallelised BLAST, the most popular tool for searching similarity in biological databases, and implement two variations by incorporating the PVFS and CEFT-PVFS parallel I/O facilities. Our goal is to study the performance gain from parallel I/O under the constraints of different numbers of commodity storage devices in a Linux cluster. We also evaluate two read performance-optimisation techniques employed in CEFT-PVFS: (1) doubling the degree of parallelism is shown to have comparable read performance with respect to PVFS when both systems have the same number of servers; (2) skipping hot-spot nodes can reduce the performance penalty when I/O workloads are highly imbalanced. The I/O resource contention between multiple applications running in the same cluster can degrade the performance of the original parallel BLAST and the PVFS version up to 10- and 21-fold, respectively, whereas the one based on CEFT-PVFS, which has the ability to skip hot-spot nodes, suffered only a two-fold performance degradation.

AB - In this work, we investigate parallel I/O efficiencies in parallelised BLAST, the most popular tool for searching similarity in biological databases, and implement two variations by incorporating the PVFS and CEFT-PVFS parallel I/O facilities. Our goal is to study the performance gain from parallel I/O under the constraints of different numbers of commodity storage devices in a Linux cluster. We also evaluate two read performance-optimisation techniques employed in CEFT-PVFS: (1) doubling the degree of parallelism is shown to have comparable read performance with respect to PVFS when both systems have the same number of servers; (2) skipping hot-spot nodes can reduce the performance penalty when I/O workloads are highly imbalanced. The I/O resource contention between multiple applications running in the same cluster can degrade the performance of the original parallel BLAST and the PVFS version up to 10- and 21-fold, respectively, whereas the one based on CEFT-PVFS, which has the ability to skip hot-spot nodes, suffered only a two-fold performance degradation.

KW - BLAST

KW - CEFT-PVFS

KW - PVFS

KW - bioinformatics

KW - cluster computing

KW - parallel I/O

KW - sequence comparison

UR - http://www.scopus.com/inward/record.url?scp=30644459480&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=30644459480&partnerID=8YFLogxK

U2 - 10.1504/IJHPCN.2004.008350

DO - 10.1504/IJHPCN.2004.008350

M3 - Article

AN - SCOPUS:30644459480

VL - 1

SP - 214

EP - 222

JO - International Journal of High Performance Computing and Networking

JF - International Journal of High Performance Computing and Networking

SN - 1740-0562

IS - 4

ER -