Abstract
In complex software systems, understanding and maintaining stability and performance is critical, and finding anomalies in log data has become increasingly difficult as these systems grow in complexity. Motivated by the need to improve software release management and ensure system reliability, this study uses the word embedding and tokenizer functionality of Generative Pretrained Transformer 3 (GPT-3) to convert log data into a form suited to identifying atypical patterns and anomalies. The approach is organized into two layers: offline and online. In the offline layer, historical log data is processed by the GPT model and converted into sentence and word embeddings. Sentence embeddings are clustered to generate labels and taggers for subsequent stages, while word embeddings directly create taggers for the online layer's sequence labeling. The online layer collects real-time data, processes it through GPT to generate embeddings, and applies sequence labeling to these embeddings. This process yields templates and variables, which speed up the construction of train-test splits for a classifier that detects anomalies. Three classifiers are evaluated: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost). In experiments on four real-world datasets, namely Apache, BlueGene/L (BGL), Hadoop Distributed File System (HDFS), and Thunderbird, CatBoost achieved accuracies of 99.75%, 99.00%, 98.75%, and 99.33%, respectively. The study also demonstrates that GPT-based embeddings provide a more effective anomaly detection solution than Bidirectional Encoder Representations from Transformers (BERT)-based embeddings.
The proposed methodology is designed to be integrated into software release management processes, where automatic anomaly detection augments quality control measures and thereby enables timely intervention.
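To make the "templates and variables" step concrete, the following is a minimal sketch of how a log line might be split into a constant template and its variable parts, and how a window of lines might then be turned into a feature vector for a classifier. It is not the authors' method: the abstract describes a learned sequence labeler over GPT embeddings, whereas this sketch substitutes a hand-written regular expression as a stand-in tagger. The names `VARIABLE_PATTERN`, `split_template`, and `count_vector`, the `<*>` placeholder, and the sample log lines are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical rule-based tagger (an assumption, NOT the learned sequence
# labeler from the abstract): a token is a "variable" if it is a number,
# a hex value, an IPv4 address with optional port, a block ID, or a path.
VARIABLE_PATTERN = re.compile(
    r"\d+"
    r"|0x[0-9a-fA-F]+"
    r"|\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?"
    r"|blk_-?\d+"
    r"|/\S+"
)


def split_template(line):
    """Split one log line into (template, variables)."""
    template_tokens, variables = [], []
    for token in line.split():
        if VARIABLE_PATTERN.fullmatch(token):
            template_tokens.append("<*>")  # mask the variable part
            variables.append(token)
        else:
            template_tokens.append(token)  # keep the constant part
    return " ".join(template_tokens), variables


def count_vector(window, vocabulary):
    """Count how often each known template occurs in a window of log lines."""
    counts = Counter(split_template(line)[0] for line in window)
    return [counts[template] for template in vocabulary]


if __name__ == "__main__":
    # Illustrative log lines only; not taken from the paper's datasets.
    window = [
        "Received block blk_-1608999687919862906 of size 91178 from /10.250.10.6",
        "Received block blk_7503483334202473044 of size 233217 from /10.250.19.102",
        "Deleting block blk_123 file /data/x",
    ]
    vocabulary = sorted({split_template(line)[0] for line in window})
    print(vocabulary)
    print(count_vector(window, vocabulary))
```

Vectors of this kind, one per window or session, could then be split into training and test sets and passed to a classifier such as RF, LightGBM, or CatBoost.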
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Setu, J.H., Hossain, M.S., Halder, N., Islam, A., Amin, M.A. (2025). Optimizing Software Release Management with GPT-Enabled Log Anomaly Detection. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15302. Springer, Cham. https://doi.org/10.1007/978-3-031-78166-7_23
DOI: https://doi.org/10.1007/978-3-031-78166-7_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78165-0
Online ISBN: 978-3-031-78166-7
eBook Packages: Computer Science (R0)