Adapting Failure Detectors to Communication Network Load Fluctuations Using SNMP and Artificial Neural Nets

  • Conference paper
Dependable Computing (LADC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3747)

Abstract

A failure detector is an important building block for fault-tolerant distributed computing: mechanisms such as distributed consensus and group communication rely on the information provided by failure detectors to make progress and terminate. Erroneous information from the failure detector (or the absence of information) may therefore delay decision-making or lead the upper-layer fault-tolerant mechanism to take incorrect decisions (e.g., the exclusion of a correct process from a group membership). On the other hand, how precisely a failure detector can identify failures is limited by the actual behaviour of the system, especially in settings where message transmission delays and system loads vary over time. In this paper we explore the use of artificial neural networks to implement failure detectors that adapt dynamically to the current communication load conditions. The training patterns fed to the neural network were obtained by using Simple Network Management Protocol (SNMP) agents to read Management Information Base (MIB) variables. The output of the neural network is an estimate of the arrival time of the next heartbeat message from a remote process. The suggested approach was fully implemented and tested on a set of networked GNU/Linux workstations. To analyze the efficiency of our approach, we ran a series of experiments in which network loads were varied randomly, and we measured several QoS parameters, comparing our detector against known implementations. The performance data collected indicate that neural networks and MIB variables can indeed be combined to improve the QoS of failure detectors.
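
The idea of a heartbeat failure detector whose timeout follows a predicted arrival time can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the network architecture, the MIB variables used, or the timeout rule, so every name below (`AdaptiveFailureDetector`, `toy_predictor`, `interval`, `safety_margin`) is hypothetical. The predictor is a placeholder linear function standing in for the trained neural network, and the load features are passed in as plain numbers rather than read through SNMP.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical predictor type: maps load features (in the paper these would be
# derived from MIB variables read via SNMP) to an estimated delay in seconds.
Predictor = Callable[[Sequence[float]], float]


@dataclass
class AdaptiveFailureDetector:
    predictor: Predictor          # stand-in for the trained neural network
    interval: float               # heartbeat sending period in seconds (assumed)
    safety_margin: float = 0.05   # slack added to the prediction (assumed)
    deadline: float = float("inf")  # no suspicion before the first heartbeat

    def on_heartbeat(self, now: float, load_features: Sequence[float]) -> None:
        """Record a heartbeat and recompute the deadline for the next one."""
        predicted_delay = max(0.0, self.predictor(load_features))
        self.deadline = now + self.interval + predicted_delay + self.safety_margin

    def suspects(self, now: float) -> bool:
        """Return True if the next heartbeat is overdue at time `now`."""
        return now > self.deadline


def toy_predictor(features: Sequence[float]) -> float:
    # Placeholder linear model, NOT the paper's network: the estimated delay
    # grows with a normalised load figure in features[0].
    return 0.01 + 0.2 * features[0]


if __name__ == "__main__":
    fd = AdaptiveFailureDetector(toy_predictor, interval=1.0)
    fd.on_heartbeat(now=10.0, load_features=[0.0])   # idle network
    print(fd.suspects(11.1))
    fd.on_heartbeat(now=10.0, load_features=[1.0])   # loaded network
    print(fd.suspects(11.1))
```

Under these assumptions the same silence of 1.1 seconds leads to a suspicion when the reported load is low but not when it is high, which is the kind of adaptation the abstract describes: a longer expected delay widens the timeout and should reduce false suspicions, at the cost of slower detection.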

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lima, F., Macêdo, R. (2005). Adapting Failure Detectors to Communication Network Load Fluctuations Using SNMP and Artificial Neural Nets. In: Maziero, C.A., Gabriel Silva, J., Andrade, A.M.S., de Assis Silva, F.M. (eds) Dependable Computing. LADC 2005. Lecture Notes in Computer Science, vol 3747. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11572329_16

  • DOI: https://doi.org/10.1007/11572329_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29572-3

  • Online ISBN: 978-3-540-32092-0

  • eBook Packages: Computer Science, Computer Science (R0)
