Abstract
Availability is one of the most important requirements in a production system. Maintaining a consistently high level of availability in Infrastructure-as-a-Service (IaaS) cloud computing is a challenge because of the complexity of service provisioning. By definition, availability can be maintained by coupling it with fault tolerance approaches. Many fault tolerance methods have been developed recently, but few of them adequately consider fault detection, which is critical for issuing the appropriate recovery actions in time. In this paper, based on a rigorous analysis of the nature of failures, we introduce a method for the early identification of faults occurring in an IaaS system. By combining a fuzzy logic algorithm with a prediction technique, the proposed approach can provide better performance in terms of accuracy and reaction rate, which in turn enhances system reliability.
Acknowledgements
This work is supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) NRF-2014R1A2A2A01003914 and the Industrial Core Technology Development Program (10049079, Development of mining core technology exploiting personal big data) funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea). This work is also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2011-0030079). The corresponding author is Prof. Sungyoung Lee.
Ethics declarations
Conflict of Interest
The authors declare that they have no potential conflict of interest.
About this article
Cite this article
Bui, DM., Huynh-The, T. & Lee, S. Early fault detection in IaaS cloud computing based on fuzzy logic and prediction technique. J Supercomput 74, 5730–5745 (2018). https://doi.org/10.1007/s11227-017-2053-3
DOI: https://doi.org/10.1007/s11227-017-2053-3