Abstract
Predictable performance in the event of failuresis of paramount importance in most safety critical real-timesystems. Various hardware as well as software fault-toleranttechniques are employed towards this goal among which checkpointingis a relatively cost-effective scheme. Since checkpointing schemesdepend on time redundancy, they could affect the correctnessof the system by causing deadlines to be missed. This paper providesexact schedulability tests for fault tolerant task sets undera specified failure hypothesis and employing checkpointing toassist in fault recovery. The effects of checkpointing strategieson task response time are analysed and some insights for optimalcheckpointing are provided. The emphasis here is on utilizingthis analysis as an off-line design support tool.
Similar content being viewed by others
References
Audsley, N. C., Burns, A., Richardson, M. F., Tindell, K., and Wellings, A. 1993. Applying new scheduling theory to static priority pre-emptive scheduling. Software Engineering Journal 8(5): 284-292.
Audsley, N. C., Burns, A., Richardson, M. F., and Wellings, A. 1991. Hard real-time scheduling: The deadline monotonic approach. IFAC/IFIP Workshop. Atlanta, Georgia, pp. 127-132.
Bannister, J. A., and Trivedi, K. S. 1983. Task allocation in fault-tolerant distributed systems. Acta Informatica 20: 261-281.
Bowen, N. S., and Pradhan, D. K. 1993. Processor and memory based checkpoint and rollback recovery. IEEE Computer 26(2): 22-29.
Burns, A. 1994. Preemptive priority based scheduling: An appropriate engineering approach. In: Advances in Real-Time Systems, Son, S., ed. Prentice Hall, pp. 225-248.
Burns, A., Davis, R. I., and Punnekkat, S. 1996. Feasibility analysis of fault-tolerant real-time task sets. Euromicro Real-Time Systems Workshop, pp. 29-33.
Campbell, A., McDonald, P., and Ray, K. 1992. Single event upset rates in space. IEEE Transactions on Nuclear Science 39(6): 1828-1835.
Castillo, X., McConnel, S., and Siewiorek, D. 1982. Derivation and calibration of a transient error reliability model. IEEE Transactions on Computers 31(7): 658-671.
Chandy, K. M., Browne, J. C., Dissly, C. W., and Uhrig, W. R. 1975. Analytic models for rollback and recovery strategies in data base systems. IEEE Transactions on Software Engineering SE-1(1): 100-110.
Gelenbe, E. 1979. On the optimum checkpoint interval. Journal of the ACM 26(2): 259-270.
Ghosh, S., Melhem, R., and Mosse, D. 1995. Enhancing real-time schedules to tolerate transient faults. Proceedings Real-Time Systems Symposium.
Ghosh, S., Melhem, R., Mosse, D., and Sansarma, J. 1998. Fault tolerant rate monotonic scheduling. Real-Time Systems Journal 15:(2).
Joseph, M., and Pandya, P. 1986. Finding response times in a real-time system. The Computer Journal-British Computer Society 29(5): 390-395.
Katcher, D., Arakawa, H., and Strosnider, J. 1993. Engineering and analysis of fixed priority schedulers. IEEE Transactions on Software Engineering 19(9): 920-934.
Krishna, C., and Shin, K. 1986. On scheduling tasks with a quick recovery from failure. IEEE Transactions on Computers 35(5): 448-455.
Lehoczky, J., Sha, L., and Ding, Y. 1989. The rate monotonic scheduling algorithm-Exact characterization and average case behaviour. Proceedings of IEEE Real-Time Systems Symposium, pp. 166,171.
Liestman, A. L., and Campbell, R. H. 1986. A fault-tolerant scheduling problem. IEEE Transactions on Software Engineering 12(11): 1089-1095.
Liu, C. L., and Layland, J.W. 1973. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM 20(1): 40-61.
Miremadi, G., and Torin, J. 1995. Evaluating processor-behaviour and three error-detection mechanisms using physical fault-injection. IEEE Transactions on Reliability 44(3): 441-453.
Nicola, V. F. 1995. Checkpointing and the modeling of program execution time. In Software Fault Tolerance, Lyu, M., ed. John Wiley & Sons Ltd, pp. 167-188.
Oh, Y., and Son, S. H. 1994. Enhancing fault-tolerance in rate-monotonic scheduling. Real-Time Systems 7(3): 315-329.
Pandya, M., and Malek, M. 1998. Minimum achievable utilization for fault-tolerant processing of periodic tasks. IEEE Transactions on Computers 47(10): 1102-1112.
Punnekkat, S., Davis, R., and Burns, A. 1997. Sensitivity analysis of real-time task sets. Asian Computing Science Conference, Kathmandu, Nepal, LNCS-1345, Springer-Verlag, pp. 72-82.
Ramos-Thuel, S., and Strosnider, J. 1994. Scheduling fault recovery operations for time-critical applications. In Proceedings of fourth IFIP working conference on Dependable Computing for Critical Applications.
Randell, B. 1975. System structure for software fault tolerance. IEEE Transactions on Software Engineering SE-1(2): 220-232.
Serlin, O. 1972. Scheduling of time critical processes. Proceedings AFIPS Spring Computing Conference, pp. 925-932.
Sha, L., Rajkumar, R., and Lehoczky, J. 1990. Priority inheritance protocols: An approach to real-time synchronization. IEEE Transactions on Computers 39(9): 1175-1185.
Shin, K. G., Lin, T., and Lee, Y. 1987. Optimal checkpointing of real-time tasks. IEEE Transactions on Computers C-36(11): 1328-1341.
Tantawi, A. N., and Ruschitzka, M. 1984. Performance qnalysis of checkpointing wtrategies. ACM Transactions on Computer Systems 2(2): 123-143.
Tindell, K. W., Burns, A., and Wellings, A. 1994. An extendible approach for analysing fixed priority hard real-time tasks. Journal of Real-Time Systems 6(2): 133-151.
Tsuchiya, T., Kakuda, Y., and Kikuno, T. 1995. Fault-tolerant scheduling algorithm for distributed real-time systems. In Procceding of 3rd Workshop on Parallel & Distributed Real-time Systems, pp. 99-103.
Vaidya, N. H. 1997. Impact of checkpoint latency on overhead ratio of a checkpointing scheme. IEEE Transactions on Computers 46(8): 942-947.
Young, J.W. 1974. A first order approximation to the optimum checkpoint interval. Communications of the ACM 17(9): 530-531.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Punnekkat, S., Burns, A. & Davis, R. Analysis of Checkpointing for Real-Time Systems. Real-Time Systems 20, 83–102 (2001). https://doi.org/10.1023/A:1026589200419
Issue Date:
DOI: https://doi.org/10.1023/A:1026589200419