18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 11   p. 211a
A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/IPDPS.2004.1303242

Abstract
Code coupling applications can be divided into communicating modules that may be executed on different clusters in a cluster federation. Because a cluster federation comprises a large number of nodes, the probability of a node failure is high. We propose a hierarchical checkpointing protocol that combines a synchronized checkpointing technique inside clusters with a communication-induced technique between clusters. This protocol fits the characteristics of a cluster federation (a large number of nodes, and high-latency, low-bandwidth networking technologies between clusters). A preliminary performance evaluation using a discrete event simulator shows that the protocol is suitable for code coupling applications.
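
The two-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' protocol: it assumes a generic index-based communication-induced rule (a receiver takes a forced checkpoint when a message carries a higher checkpoint sequence number than its own), and it reduces the intra-cluster synchronized checkpoint to a single method call. The `Cluster` class, its fields, and the message layout are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One cluster in a federation (hypothetical model, not from the paper)."""
    name: str
    sn: int = 0                              # sequence number of the latest checkpoint
    log: list = field(default_factory=list)  # (sn, reason) for each checkpoint taken

    def coordinated_checkpoint(self, reason, sn=None):
        # Stand-in for the intra-cluster step: a synchronized checkpoint in
        # which all nodes of the cluster save their state together.
        self.sn = self.sn + 1 if sn is None else sn
        self.log.append((self.sn, reason))

    def send(self, payload):
        # Inter-cluster messages piggyback the sender's sequence number.
        return {"src": self.name, "sn": self.sn, "payload": payload}

    def receive(self, msg):
        # Assumed communication-induced rule: if the sender is ahead, force a
        # cluster checkpoint before the message is delivered.
        if msg["sn"] > self.sn:
            self.coordinated_checkpoint(f"forced by {msg['src']}", sn=msg["sn"])
        return msg["payload"]


a = Cluster("A")
b = Cluster("B")
a.coordinated_checkpoint("timer")          # A checkpoints on its own schedule
b.receive(a.send("boundary data"))         # B is behind, so it is forced to checkpoint
a.receive(b.send("reply"))                 # A is not behind, so no checkpoint is taken
print(a.log, b.log)
```

Under this assumed rule, clusters never exchange coordination messages across the slow inter-cluster links; only the piggybacked sequence number travels with application messages, which is the kind of property the abstract attributes to the hierarchical design.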
Additional Information
Index Terms- Cluster Federation, Checkpointing and Recovery, Fault-tolerance, Parallel Application, Code Coupling

Citation:  Sebastien Monnet, Christine Morin, Ramamurthy Badrinath, "A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations," ipdps, p. 211a,  18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 11,  2004
