Efficient message logging for uncoordinated checkpointing protocols

  • Session 8 Replication and Distribution
  • Conference paper
Dependable Computing — EDCC-2 (EDCC 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1150)

Abstract

A message is in transit with respect to a global state if its sending is recorded in that global state but its receipt is not. Checkpointing algorithms must log such in-transit messages so that the state of the channels can be restored when a computation is resumed from a consistent global state after a failure. Coordinated checkpointing algorithms log exactly those in-transit messages on stable storage. Uncoordinated checkpointing algorithms, because they lack synchronization, conservatively log more messages.
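The definition of an in-transit message can be made concrete with a small sketch. The representation below (a `LocalState` per process holding sets of message identifiers) is an assumption made for illustration only; it is not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class LocalState:
    """Hypothetical record of what one process's checkpoint has captured."""
    sent: set = field(default_factory=set)      # ids of messages whose sending is recorded
    received: set = field(default_factory=set)  # ids of messages whose receipt is recorded


def in_transit(global_state):
    """Messages whose sending is recorded in the global state but whose receipt is not."""
    sent = set().union(*(s.sent for s in global_state.values()))
    received = set().union(*(s.received for s in global_state.values()))
    return sent - received


def orphans(global_state):
    """Messages received but never recorded as sent; any such message makes the state inconsistent."""
    sent = set().union(*(s.sent for s in global_state.values()))
    received = set().union(*(s.received for s in global_state.values()))
    return received - sent


# Illustrative global state: p1 has sent m1 and m2, p2 has received only m1.
example_state = {
    "p1": LocalState(sent={"m1", "m2"}),
    "p2": LocalState(received={"m1"}),
}
example_in_transit = in_transit(example_state)
example_orphans = orphans(example_state)
```

In this example `m2` is the only in-transit message, so it is the one that would have to be logged to restore the channel from `p1` to `p2`. The state has no orphan messages and is therefore consistent.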

This paper presents an uncoordinated checkpointing protocol that logs all in-transit messages and the smallest possible number of messages that are not in transit. As a consequence, the protocol saves stable storage space and enables quicker recoveries. The core of the protocol is an appropriate tracking of the causal dependencies between messages.
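The abstract does not say how causal dependencies are tracked. As a generic illustration only, and not the paper's mechanism, vector clocks are one well-known way to decide whether one event causally precedes another. All function names below are hypothetical.

```python
def tick(clock, i):
    """Return a copy of `clock` advanced for a local or send event on process i."""
    updated = list(clock)
    updated[i] += 1
    return updated


def merge(local, incoming, i):
    """Return the clock of process i after receiving a message stamped `incoming`."""
    updated = [max(a, b) for a, b in zip(local, incoming)]
    updated[i] += 1
    return updated


def happened_before(a, b):
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return all(x <= y for x, y in zip(a, b)) and a != b


# Two processes: p0 sends a message, p1 receives it.
send_stamp = tick([0, 0], 0)                # send event on p0
recv_stamp = merge([0, 0], send_stamp, 1)   # matching receive event on p1

# An unrelated local event on p1 that never saw p0's message.
independent_stamp = tick([0, 0], 1)
```

Here the send causally precedes the receive, while the send and the independent event are concurrent because neither stamp dominates the other.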

Editor information

Andrzej Hlawiczka, João Gabriel Silva, Luca Simoncini

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

Cite this paper

Mostefaoui, A., Raynal, M. (1996). Efficient message logging for uncoordinated checkpointing protocols. In: Hlawiczka, A., Silva, J.G., Simoncini, L. (eds) Dependable Computing — EDCC-2. EDCC 1996. Lecture Notes in Computer Science, vol 1150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61772-8_48

  • DOI: https://doi.org/10.1007/3-540-61772-8_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61772-3

  • Online ISBN: 978-3-540-70677-9

  • eBook Packages: Springer Book Archive
