Optimizing Software Release Management with GPT-Enabled Log Anomaly Detection

  • Conference paper
Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15302)

Abstract

In complex software systems, understanding and maintaining stability and performance is of utmost importance. Finding anomalies in log data has become increasingly difficult as these systems grow more complex. Motivated by the need to improve software release management and ensure system reliability, this study exploits the word embedding and tokenizer functionalities of Generative Pretrained Transformer (GPT)-3 to convert log data into representations suited to identifying atypical patterns and anomalies. The approach is organized in a two-layered structure: an offline layer and an online layer. In the offline layer, historical log data is processed through the GPT model and divided into sentence and word embeddings. Sentence embeddings are clustered to generate labels and taggers for subsequent stages, while word embeddings directly create taggers for the online layer’s sequence labeling. The online layer collects real-time data, processes it through GPT to generate embeddings, and subjects these embeddings to a sequence labeling process. This process yields templates and variables, which expedite the formation of train-test data splits for a classifier that detects anomalies. Three classifiers are evaluated: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost). In experiments on four real-world datasets, namely Apache, BlueGene/L (BGL), Hadoop Distributed File System (HDFS), and Thunderbird, CatBoost achieved accuracy rates of 99.75%, 99.00%, 98.75%, and 99.33%, respectively. The study also demonstrates that GPT-based embeddings provide a more effective anomaly detection solution than Bidirectional Encoder Representations from Transformers (BERT)-based embeddings. The proposed methodology is designed to be integrated into software release management processes, where automatic anomaly detection augments quality control measures and thereby expedites timely intervention.
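
To make the "templates and variables" output of the sequence labeling step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hand-written rule (a token is a variable if it contains a digit or starts with a slash) in place of the GPT-embedding-based taggers the abstract describes. The function names, the `<*>` placeholder, and the sample log lines are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumption: a simple rule stands in for the paper's learned taggers.
# A token is treated as a variable if it contains a digit or starts with "/".
VARIABLE_PATTERN = re.compile(r"\d|^/")


def tag_tokens(line):
    """Label each whitespace-separated token as 'VAR' (variable) or 'TPL' (template)."""
    return [
        (tok, "VAR" if VARIABLE_PATTERN.search(tok) else "TPL")
        for tok in line.split()
    ]


def split_template(line):
    """Return (template, variables) for one log line.

    Variable tokens are replaced by the placeholder '<*>' in the template
    and collected, in order, in the variables list.
    """
    tagged = tag_tokens(line)
    template = " ".join("<*>" if tag == "VAR" else tok for tok, tag in tagged)
    variables = [tok for tok, tag in tagged if tag == "VAR"]
    return template, variables


def template_counts(lines):
    """Count how often each template occurs; such counts could feed a classifier."""
    return Counter(split_template(line)[0] for line in lines)


# Made-up sample lines, loosely in the style of distributed file system logs.
logs = [
    "Received block blk_123 of size 67108864 from 10.0.0.1",
    "Received block blk_456 of size 1024 from 10.0.0.2",
    "Exception in receiveBlock for block blk_789",
]

for line in logs:
    print(split_template(line))
print(template_counts(logs))
```

In this sketch the first two lines collapse to the same template, so the per-template counts (or the extracted variables) could serve as features for a downstream classifier such as RF, LightGBM, or CatBoost; how the paper actually builds its feature vectors is not stated in the abstract.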

Author information

Corresponding author

Correspondence to Ashraful Islam.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Setu, J.H., Hossain, M.S., Halder, N., Islam, A., Amin, M.A. (2025). Optimizing Software Release Management with GPT-Enabled Log Anomaly Detection. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15302. Springer, Cham. https://doi.org/10.1007/978-3-031-78166-7_23

  • DOI: https://doi.org/10.1007/978-3-031-78166-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78165-0

  • Online ISBN: 978-3-031-78166-7

  • eBook Packages: Computer Science, Computer Science (R0)
