Proceedings 2001 Pacific Rim International Symposium on Dependable Computing

Abstract

In this paper, we study the availability of a clustered computing system with one cluster manager and "N+M" processing nodes, where M processing nodes serve as spares for the N active processing nodes. The functionality of each processing node is decomposed into application software, management software, OS, and hardware, and the dependencies among these entities are taken into account. Stochastic Petri net models are constructed to investigate the cluster availability. To handle very large clusters, a solution based on state aggregation and fixed-point iteration is proposed, and the existence and uniqueness of the fixed point are proved. The impacts of the cluster manager, the switchover time, and the coverage ratio are studied quantitatively. From the numerical results for a simple cluster with "2+1" processing nodes, we find that (1) the availability of the cluster manager does not have a significant impact on the system availability, and (2) system availability increases with the coverage ratio and decreases with the switchover time. Mechanisms to improve the system availability are discussed.
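
The qualitative effect of coverage ratio and switchover time can be illustrated with a toy model that is much simpler than the stochastic Petri net models of the paper. The sketch below is an assumption made purely for illustration, not the authors' model: a three-state Markov chain in which a failure is either covered (a short switchover to a spare) or uncovered (a full repair). The function name `availability`, its parameters, and all rates are hypothetical.

```python
def availability(failure_rate, repair_rate, coverage, switchover_time):
    """Steady-state probability of the 'up' state in a 3-state toy chain.

    Illustrative assumption, not the paper's model:
      up -> switchover  at rate failure_rate * coverage
      up -> down        at rate failure_rate * (1 - coverage)
      switchover -> up  at rate 1 / switchover_time
      down -> up        at rate repair_rate
    """
    # Balance equations give each state's probability relative to 'up'.
    switchover_weight = failure_rate * coverage * switchover_time
    down_weight = failure_rate * (1.0 - coverage) / repair_rate
    # Normalise so the three probabilities sum to one.
    return 1.0 / (1.0 + switchover_weight + down_weight)


if __name__ == "__main__":
    # Hypothetical rates per hour: one failure per 1000 h, 4 h mean repair.
    for coverage in (0.90, 0.95, 0.99):
        for switchover_time in (0.01, 0.1):
            a = availability(1e-3, 0.25, coverage, switchover_time)
            print(f"coverage={coverage} switchover={switchover_time} h availability={a}")
```

In this toy chain, availability rises with coverage as long as a switchover is shorter than a full repair, and it falls as the switchover time grows, which is consistent with finding (2) above.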