Abstract
Rollback-Dependency Trackability (RDT) is a property that states that all rollback dependencies between local checkpoints are on-line trackable by using a transitive dependency vector. In this paper, we address three fundamental issues in the design of communication-induced checkpointing protocols that ensure RDT. First, we prove that the following intuition commonly assumed in the literature is in fact false: If a protocol forces a checkpoint only at a stronger condition, then it must take, at most, as many forced checkpoints as a protocol based on a weaker condition. This result implies that the common approach of sharpening the checkpoint-inducing condition by piggybacking more control information on each message may not always yield a more efficient protocol. Next, we prove that there is no optimal on-line RDT protocol that takes fewer forced checkpoints than any other RDT protocol for all possible communication patterns. Finally, since comparing checkpoint-inducing conditions is not sufficient for comparing protocol performance, we present some formal techniques for comparing the performance of several existing RDT protocols.
Original language | English |
---|---|
Pages (from-to) | 963-971 |
Number of pages | 9 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 9 |
Issue number | 10 |
DOIs | |
State | Published - 1998 |
Externally published | Yes |
Keywords
- Checkpointing
- Communication-induced protocols
- On-line algorithms
- Rollback recovery
- Rollback-dependency trackability