Adapting Failure Detectors to Communication Network Load Fluctuations Using SNMP and Artificial Neural Nets

Lima, Fábio; Macêdo, Raimundo

doi:10.1007/11572329_16

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3747)

Included in the following conference series:

Latin-American Symposium on Dependable Computing

Abstract

A failure detector is an important building block for fault-tolerant distributed computing: mechanisms such as distributed consensus and group communication rely on the information provided by failure detectors in order to make progress and terminate. Erroneous information from the failure detector (or the absence of information) may therefore delay decision-making or lead the upper-layer fault-tolerant mechanism to take incorrect decisions (e.g., the exclusion of a correct process from a group membership). On the other hand, how precisely a failure detector can identify failures is limited by the actual behaviour of the system, especially in settings where message transmission delays and system loads vary over time. In this paper we explore the use of artificial neural networks to implement failure detectors that adapt dynamically to the current communication load conditions. The training patterns fed to the neural network were obtained by using Simple Network Management Protocol (SNMP) agents to read Management Information Base (MIB) variables. The output of the neural network is an estimate of the arrival time of the next heartbeat message that the failure detector expects from a remote process. The suggested approach was fully implemented and tested on a set of networked GNU/Linux workstations. To analyze the efficiency of our approach, we ran a series of experiments in which network loads were varied randomly, and we measured several QoS parameters, comparing our detector against known implementations. The performance data collected indicate that neural networks and MIB variables can indeed be combined to improve the QoS of failure detectors.
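
The idea in the abstract, a heartbeat failure detector whose timeout follows a learned estimate of the next arrival time, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a single linear neuron trained with the LMS rule stands in for the neural network (whose architecture the abstract does not give), `load_features` is a hypothetical normalized load value that would in practice be derived from MIB counters read over SNMP (no SNMP query is performed), and the fixed `safety_margin` is a common device in adaptive detectors rather than something the abstract describes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class LinearNeuron:
    """One linear neuron trained online with the LMS (delta) rule.

    Assumption: a stand-in for the paper's neural network.
    """
    n_inputs: int
    learning_rate: float = 0.05
    weights: List[float] = field(default_factory=list)
    bias: float = 0.0

    def __post_init__(self) -> None:
        if not self.weights:
            self.weights = [0.0] * self.n_inputs

    def predict(self, x: Sequence[float]) -> float:
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def train(self, x: Sequence[float], target: float) -> float:
        """Move the weights one LMS step towards the observed target."""
        error = target - self.predict(x)
        self.bias += self.learning_rate * error
        self.weights = [w + self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        return error


class AdaptiveFailureDetector:
    """Suspects a process when its heartbeat misses a predicted deadline."""

    def __init__(self, model: LinearNeuron, safety_margin: float) -> None:
        self.model = model
        self.safety_margin = safety_margin  # hypothetical slack, in seconds
        self.last_arrival: Optional[float] = None
        self.last_features: Optional[List[float]] = None
        self.deadline: Optional[float] = None

    def on_heartbeat(self, now: float, load_features: Sequence[float]) -> None:
        # Learn from the delay just observed, paired with the load
        # features that were sampled when the previous heartbeat arrived.
        if self.last_arrival is not None and self.last_features is not None:
            self.model.train(self.last_features, now - self.last_arrival)
        self.last_arrival = now
        self.last_features = list(load_features)
        # Next deadline = now + predicted inter-arrival delay + margin.
        self.deadline = now + self.model.predict(load_features) + self.safety_margin

    def suspects(self, now: float) -> bool:
        return self.deadline is not None and now > self.deadline


def demo(heartbeats: int = 2000) -> Tuple[AdaptiveFailureDetector, float]:
    """Synthetic run: the delay grows linearly with an alternating load."""
    detector = AdaptiveFailureDetector(LinearNeuron(n_inputs=1), safety_margin=0.2)
    now = 0.0
    for i in range(heartbeats):
        load = float(i % 2)              # hypothetical normalized load, 0 or 1
        detector.on_heartbeat(now, [load])
        now += 1.0 + 0.5 * load          # made-up delay model, for the demo only
    return detector, now                 # `now` is when the next heartbeat is due
```

Calling `demo()` returns the trained detector and the time at which the next heartbeat would be due. The detector should not suspect the process at that time, and should suspect it once the deadline (predicted delay plus margin) has passed without a heartbeat. A real deployment would replace the synthetic load with values computed from actual MIB variables and would likely need a richer model.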

Author information

Authors and Affiliations

Distributed Systems Laboratory – LaSiD, Computing Science Department, Federal University of Bahia, Campus de Ondina, CEP: 40170-110, Salvador, BA, Brazil
Fábio Lima & Raimundo Macêdo

Editor information

Editors and Affiliations

Pontifícia Universidade Católica do Paraná (PUCPR), PPGIA, Paraná, Brazil
Carlos Alberto Maziero
Dept. Engenharia Informática/CISUC Universidade de Coimbra - Polo II, 3030-397, Coimbra, Portugal
João Gabriel Silva
Universidade Federal da Bahia (UFBA), Prédio do CPD, Campus de Ondina, Av. Ademar de Barros, CEP 40.170-110, Salvador, BA, Brazil
Aline Maria Santos Andrade & Flávio Morais de Assis Silva

About this paper

Cite this paper

Lima, F., Macêdo, R. (2005). Adapting Failure Detectors to Communication Network Load Fluctuations Using SNMP and Artificial Neural Nets. In: Maziero, C.A., Gabriel Silva, J., Andrade, A.M.S., de Assis Silva, F.M. (eds) Dependable Computing. LADC 2005. Lecture Notes in Computer Science, vol 3747. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11572329_16

DOI: https://doi.org/10.1007/11572329_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29572-3
Online ISBN: 978-3-540-32092-0
eBook Packages: Computer Science (R0)
