Proceedings 2001 Pacific Rim International Symposium on Dependable Computing

Abstract

With the rapid progress of distributed object computing, more and more large systems are built using this technology, which makes fault tolerance for distributed object computing a significant research domain. The Object Management Group (OMG) recently published the "Fault Tolerant CORBA Specification V1.0". This specification defines how to achieve fault tolerance for distributed object computing using object groups, and failure detection is one of the key elements of fault management. However, the specification says little about failure detection and leaves many specific details to vendors. In this paper, we propose a simple mechanism for failure detection in distributed object computing. The mechanism is designed to be general rather than application-specific, efficient, and free of any single point of failure. Because the failure detectors themselves may crash during operation, we also propose a method to handle this condition and preserve the "no single point of failure" property. The proposed mechanism has been implemented using CORBA to demonstrate that it works well.