Abstract
Microservices architecture has become a popular approach to building modern applications because of its flexibility, scalability, and agility. However, because microservices depend on one another in complex ways, an anomaly in a single service can propagate along service dependencies and affect multiple services. Accurate and efficient root cause localization is therefore a significant challenge in operating and maintaining microservice systems. To address this challenge, we propose MicroEGRCL, a root cause localization approach that leverages the dynamically constructed service call graph and is based on graph neural networks with an attention mechanism that includes edge feature enhancement. We evaluated MicroEGRCL by injecting various types of service anomalies into two microservice benchmarks running in a Kubernetes cluster. The experimental results show that MicroEGRCL achieves an average top-1 localization accuracy of 87%, exceeding state-of-the-art baseline approaches.
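To make the reported metric concrete, the following is a minimal sketch of how top-k localization accuracy is commonly computed: a case counts as a hit when the true root cause service appears among the k highest-scored candidates. The service names, scores, and function names here are hypothetical and are not taken from the paper's evaluation.

```python
from typing import Dict, List, Sequence, Tuple

# One case = (suspicion score per service, name of the true root cause).
Case = Tuple[Dict[str, float], str]


def rank_services(scores: Dict[str, float]) -> List[str]:
    """Return service names sorted from most to least suspicious."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def top_k_accuracy(cases: Sequence[Case], k: int = 1) -> float:
    """Fraction of cases whose true root cause is among the k top-ranked services."""
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(1 for scores, truth in cases if truth in rank_services(scores)[:k])
    return hits / len(cases)


# Hypothetical scores for three injected anomalies (all values are made up).
cases: List[Case] = [
    ({"cart": 0.9, "orders": 0.4, "payment": 0.1}, "cart"),
    ({"cart": 0.2, "orders": 0.7, "payment": 0.6}, "payment"),
    ({"cart": 0.1, "orders": 0.3, "payment": 0.8}, "payment"),
]

top1 = top_k_accuracy(cases, k=1)  # second case misses: "orders" outranks "payment"
top2 = top_k_accuracy(cases, k=2)  # all three true causes fall within the top two
```

Under this definition, an average top-1 accuracy of 87% would mean that the true root cause was ranked first in 87% of the anomaly cases, averaged over the evaluated settings.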
Notes
1. Kubernetes - https://kubernetes.io.
2. Prometheus - https://prometheus.io.
3. Chaos-mesh - https://chaos-mesh.org.
Acknowledgment
This paper was supported by the National Key R&D Program of China (Funding No. 2021ZD0110601) and the State Key Laboratory of Software Development Environment (Funding No. SKLSDE-2020ZX-01).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, R., Ren, J., Wang, L., Pu, Y., Yang, K., Wu, W. (2022). MicroEGRCL: An Edge-Attention-Based Graph Neural Network Approach for Root Cause Localization in Microservice Systems. In: Troya, J., Medjahed, B., Piattini, M., Yao, L., Fernández, P., Ruiz-Cortés, A. (eds) Service-Oriented Computing. ICSOC 2022. Lecture Notes in Computer Science, vol 13740. Springer, Cham. https://doi.org/10.1007/978-3-031-20984-0_18
DOI: https://doi.org/10.1007/978-3-031-20984-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20983-3
Online ISBN: 978-3-031-20984-0
eBook Packages: Computer Science, Computer Science (R0)