Simulation Symposium, Annual

Abstract

DERT (Distributed Error Recovery Testbed) is a testbed for simulation and performance evaluation of several classes of application-transparent distributed error recovery schemes. DERT is built on top of an event-driven, message-passing, object-oriented, multithreaded simulation kernel. Actual compiled distributed applications are instrumented for data collection and executed on the simulated multicomputer. Checkpointing is implemented in full detail, including associated overhead per message, additional messages, and changes to the memory system. DERT allows easy modification of a wide variety of system parameters, thus offering a level of flexibility not easily achieved by experimentation on a particular real machine. This paper describes the design, functionality, and performance of DERT. The main problems encountered in DERT's development are discussed, as well as examples of its use in evaluating recovery schemes.
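To illustrate the kind of experiment the abstract describes, the following is a minimal sketch of an event-driven, message-passing simulation in which a per-message overhead can be varied as a system parameter. It is not DERT's code or API: the names `Simulator`, `ping_pong`, `latency`, and `overhead_per_msg` are hypothetical, and the single additive delay is a deliberately crude stand-in for the detailed checkpointing costs the paper models.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Simulator:
    """Tiny event-driven kernel (hypothetical): a clock plus a time-ordered queue."""
    now: float = 0.0
    _queue: list = field(default_factory=list)
    _seq: int = 0  # tie-breaker so equal-time events run in insertion order

    def schedule(self, delay, action):
        """Register `action` to run `delay` time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        """Process events in timestamp order until none remain."""
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action()
        return self.now


def ping_pong(rounds, latency, overhead_per_msg):
    """Exchange `rounds` messages one after another.

    Returns (finish_time, messages_delivered). `overhead_per_msg` is an
    assumed stand-in for the extra cost a recovery scheme adds per message.
    """
    sim = Simulator()
    delivered = []

    def send(remaining):
        if remaining == 0:
            return
        # Each message pays the network latency plus the assumed overhead.
        sim.schedule(latency + overhead_per_msg, lambda: receive(remaining))

    def receive(remaining):
        delivered.append(sim.now)
        send(remaining - 1)

    send(rounds)
    sim.run()
    return sim.now, len(delivered)
```

Calling `ping_pong(10, 1.0, 0.0)` and `ping_pong(10, 1.0, 0.25)` gives finish times of 10.0 and 12.5 simulated time units, showing how changing one parameter exposes the cost of a scheme without touching the simulated application.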