Performance of communication-induced checkpointing algorithms

D. Manivannan, Chi Zhang

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In this paper, we analyze the performance of four communication-induced checkpointing algorithms. Our study shows that even though the performance of some of the communication-induced checkpointing algorithms is suspect in terms of scalability and checkpointing overhead, some have the same performance as coordinated checkpointing algorithms but without any explicit synchronization overhead. Traditionally, pessimistic and optimistic message logging techniques are used to handle the various types of messages that arise during rollback recovery. Under these two message logging techniques, all the messages sent by all processes are logged by either the sender or the receiver. However, our study shows that selective message logging together with a carefully designed communication-induced checkpointing algorithm can give good performance in terms of checkpointing overhead and message logging overhead.

Original language: English (US)
Pages (from-to): 129-136
Number of pages: 8
Journal: Computer Systems Science and Engineering
Volume: 18
Issue number: 3
State: Published - May 1 2003

Fingerprint

Checkpointing
Communication
Rollback Recovery
Scalability
Synchronization
Recovery
Receiver

Keywords

  • Communication-induced checkpointing
  • Consistent global snapshot
  • Distributed checkpointing
  • Fault-tolerance
  • Performance evaluation
  • Quasi-synchronous checkpointing

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Performance of communication-induced checkpointing algorithms. / Manivannan, D.; Zhang, Chi.

In: Computer Systems Science and Engineering, Vol. 18, No. 3, 01.05.2003, p. 129-136.

@article{2556f7fce3a44d0180b2e94ebb1ef2ac,
title = "Performance of communication-induced checkpointing algorithms",
abstract = "In this paper, we analyze the performance of four communication-induced checkpointing algorithms. Our study shows that even though the performance of some of the communication-induced checkpointing algorithms is suspect in terms of scalability and checkpointing overhead, some have the same performance as coordinated checkpointing algorithms but without any explicit synchronization overhead. Traditionally, pessimistic and optimistic message logging techniques are used to handle the various types of messages that arise during rollback recovery. Under these two message logging techniques, all the messages sent by all processes are logged by either the sender or the receiver. However, our study shows that selective message logging together with a carefully designed communication-induced checkpointing algorithm can give good performance in terms of checkpointing overhead and message logging overhead.",
keywords = "Communication-induced checkpointing, Consistent global snapshot, Distributed checkpointing, Fault-tolerance, Performance evaluation, Quasi-synchronous checkpointing",
author = "D. Manivannan and Chi Zhang",
year = "2003",
month = "5",
day = "1",
language = "English (US)",
volume = "18",
pages = "129--136",
journal = "Computer Systems Science and Engineering",
issn = "0267-6192",
publisher = "CRL Publishing",
number = "3",

}

TY - JOUR

T1 - Performance of communication-induced checkpointing algorithms

AU - Manivannan, D.

AU - Zhang, Chi

PY - 2003/5/1

Y1 - 2003/5/1

N2 - In this paper, we analyze the performance of four communication-induced checkpointing algorithms. Our study shows that even though the performance of some of the communication-induced checkpointing algorithms is suspect in terms of scalability and checkpointing overhead, some have the same performance as coordinated checkpointing algorithms but without any explicit synchronization overhead. Traditionally, pessimistic and optimistic message logging techniques are used to handle the various types of messages that arise during rollback recovery. Under these two message logging techniques, all the messages sent by all processes are logged by either the sender or the receiver. However, our study shows that selective message logging together with a carefully designed communication-induced checkpointing algorithm can give good performance in terms of checkpointing overhead and message logging overhead.

AB - In this paper, we analyze the performance of four communication-induced checkpointing algorithms. Our study shows that even though the performance of some of the communication-induced checkpointing algorithms is suspect in terms of scalability and checkpointing overhead, some have the same performance as coordinated checkpointing algorithms but without any explicit synchronization overhead. Traditionally, pessimistic and optimistic message logging techniques are used to handle the various types of messages that arise during rollback recovery. Under these two message logging techniques, all the messages sent by all processes are logged by either the sender or the receiver. However, our study shows that selective message logging together with a carefully designed communication-induced checkpointing algorithm can give good performance in terms of checkpointing overhead and message logging overhead.

KW - Communication-induced checkpointing

KW - Consistent global snapshot

KW - Distributed checkpointing

KW - Fault-tolerance

KW - Performance evaluation

KW - Quasi-synchronous checkpointing

UR - http://www.scopus.com/inward/record.url?scp=0141518509&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0141518509&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0141518509

VL - 18

SP - 129

EP - 136

JO - Computer Systems Science and Engineering

JF - Computer Systems Science and Engineering

SN - 0267-6192

IS - 3

ER -