Proceedings of the Third IEEE/ACM International Symposium on Cluster Computing and the Grid

Abstract

This paper describes issues in the design and implementation of checkpointing and recovery modules for the Kerrighed DSM cluster system. Our design targets a DSM that supports the sequential consistency model. The mechanisms are general enough to be used in a number of different checkpointing and recovery protocols. They are designed to support common performance optimizations suggested in the literature while remaining lightweight during fault-free execution. We also present preliminary performance results for the current implementation.