Abstract
This paper investigates rollback recovery in distributed shared memory (DSM) systems. We propose a new log-based recovery approach that can tolerate multiple node failures. The approach combines an independent checkpointing technique with a new logging scheme. The independent checkpointing technique periodically interrupts the execution of a node to save that node's state, and the new logging scheme exploits properties unique to DSM to reduce the logging overhead. With the proposed approach, the pre-failure state of a faulty node can be deterministically reconstructed without involving any fault-free node. However, some consistency information may be lost when a node fails; to reconstruct this lost information, we also present an efficient consistency reconstruction method. Finally, extensive trace-driven simulations demonstrate the effectiveness of the new logging scheme.
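The abstract does not describe the logging scheme in detail. As a rough illustration of the general log-based idea it builds on, the following minimal Python sketch combines independent checkpoints with a log of values a node receives from other nodes. After a crash, the node restores its last checkpoint and replays the logged values, so re-execution is deterministic and no fault-free node is contacted. This is not the paper's scheme. The class `DSMNode`, the operation format, the checkpoint interval, the known failure point, and the assumption that the checkpoint and log survive on stable storage are all hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical operation format:
#   ("fetch", page)   -> copy a page from its remote owner (nondeterministic input)
#   ("add", page, k)  -> deterministic local update pages[page] += k
Op = tuple


@dataclass
class DSMNode:
    checkpoint_interval: int = 4
    pages: dict = field(default_factory=dict)  # volatile local page copies
    step: int = 0                              # operations executed so far
    # Assumed to live on stable storage: (step, pages snapshot) and the log.
    saved: tuple = field(default_factory=lambda: (0, {}))
    log: list = field(default_factory=list)    # values fetched since the last checkpoint

    def checkpoint(self) -> None:
        # Independent checkpoint: saves only this node's state, no coordination.
        self.saved = (self.step, copy.deepcopy(self.pages))
        self.log.clear()  # entries older than the checkpoint are never replayed

    def apply(self, op: Op, fetch: Callable[[str], int]) -> None:
        if op[0] == "fetch":
            self.pages[op[1]] = fetch(op[1])
        else:
            _, page, k = op
            self.pages[page] = self.pages.get(page, 0) + k
        self.step += 1

    def run(self, program: list, remote: dict) -> None:
        def fetch_and_log(page: str) -> int:
            value = remote[page]
            self.log.append(value)  # log every value received from another node
            return value

        for op in program[self.step:]:
            self.apply(op, fetch_and_log)
            if self.step % self.checkpoint_interval == 0:
                self.checkpoint()

    def crash(self) -> None:
        # Volatile state is lost; `saved` and `log` are assumed to survive.
        self.pages, self.step = {}, 0

    def recover(self, program: list, failed_at: int) -> None:
        # Roll back to the last checkpoint, then replay up to the (assumed known)
        # failure point, taking fetched values from the log, not from other nodes.
        self.step, self.pages = self.saved[0], copy.deepcopy(self.saved[1])
        replay = iter(self.log)
        for op in program[self.step:failed_at]:
            self.apply(op, lambda page: next(replay))


program = [("fetch", "x"), ("add", "x", 1), ("fetch", "y"), ("add", "y", 10),
           ("add", "x", 5), ("fetch", "z"), ("add", "z", 2)]
node = DSMNode()
node.run(program, remote={"x": 100, "y": 7, "z": 40})
before, failed_at = dict(node.pages), node.step
node.crash()
node.recover(program, failed_at)
print(node.pages == before)  # True
```

Here the last checkpoint is taken after four operations, so recovery replays the remaining three, including one fetch whose value comes from the log. Logging received values is what makes the replay deterministic; the paper's scheme additionally exploits DSM-specific properties to reduce how much must be logged, which this sketch does not attempt.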
| Original language | English |
|---|---|
| Pages (from-to) | 271-290 |
| Number of pages | 20 |
| Journal | Journal of Information Science and Engineering |
| Volume | 16 |
| Issue number | 2 |
| State | Published - 03 2000 |
| Externally published | Yes |