Abstract
Software rejuvenation is a preventive and proactive fault management technique that is particularly useful for counteracting software aging: it cleans up the internal state of the system to prevent future failures. The increasing interest in combining software rejuvenation with cluster systems has given rise to prolific research activity in recent years. However, there have so far been few reports on the dependency between nodes in cluster systems when software rejuvenation is applied. This paper investigates the software rejuvenation policy for cluster computing systems with dependency between nodes, and reconstructs a stochastic reward net model of software rejuvenation in such cluster systems. Simulation experiments reveal that the software rejuvenation strategy can decrease the failure rate and increase the availability of the cluster system. They also show that the dependency between nodes affects the software rejuvenation policy. Based on the theoretical analysis of the software rejuvenation model, a prototype is implemented on the Smart Platform cluster computing system. Performance measurements carried out on this prototype reveal that software rejuvenation can effectively prevent systems from entering disabled states, thereby improving software fault tolerance and the availability of cluster computing systems.
Additional information
This work was supported in part by the National Natural Science Foundation of China under grants No. 60872044 and 71133006, the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China.
About this article
Cite this article
Yang, M., Min, G., Yang, W. et al. Software rejuvenation in cluster computing systems with dependency between nodes. Computing 96, 503–526 (2014). https://doi.org/10.1007/s00607-014-0385-x
DOI: https://doi.org/10.1007/s00607-014-0385-x