Abstract
Authorship verification, the task of determining whether two texts were written by the same author, is a challenging problem in natural language processing (NLP). It is crucial in security and forensics, where it helps identify authors and combat fake news. Recent neural network models have shown promising results in improving the accuracy of authorship verification. This paper presents BiBERT-AV, a novel authorship verification model built on a Siamese network with pre-trained BERT and Bi-LSTM layers, and evaluates the advantages of transformer-based models over existing methods that rely on domain knowledge and feature engineering. We compare the performance of BiBERT-AV against existing methods for authorship verification. The results demonstrate that BiBERT-AV offers an effective solution based solely on the writing style of the author, and that it outperforms both the baselines and state-of-the-art methods. The model also offers a viable alternative to existing methods that rely heavily on domain knowledge and laborious feature engineering, which often demand significant time and expertise. Notably, BiBERT-AV consistently achieves a notable level of accuracy even when the number of authors is expanded to a larger group, in contrast to the limitations exhibited by the baseline model used in existing research studies. Overall, this study provides valuable insights into the application of Siamese networks with pre-trained BERT and Bi-LSTM layers for authorship verification and shows that the proposed model outperforms existing methods in this domain.
The study contributes to the advancement of NLP research and has implications for several real-world applications.
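The Siamese arrangement named in the abstract can be illustrated with a minimal PyTorch sketch. This is not the BiBERT-AV implementation: the class name `SiameseBiLSTM`, the layer sizes, the mean pooling, and the absolute-difference comparison are all assumptions made for illustration, and a trainable embedding layer stands in for pre-trained BERT so that the example runs without downloading weights. What it shows is the defining property of a Siamese network: one shared encoder processes both texts, and the pair is scored from the two resulting vectors.

```python
import torch
import torch.nn as nn


class SiameseBiLSTM(nn.Module):
    """Toy Siamese verifier: one shared encoder scores a pair of texts."""

    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=16):
        super().__init__()
        # Assumption: a trainable embedding stands in for pre-trained BERT
        # token representations (hypothetical sizes, chosen to keep this small).
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        # Assumption: the pair is combined as |u - v| and mapped to one logit.
        self.classifier = nn.Linear(2 * hidden_dim, 1)

    def encode(self, token_ids):
        # The same weights encode either text; this sharing is what makes
        # the network "Siamese".
        states, _ = self.bilstm(self.embed(token_ids))  # (batch, seq, 2*hidden)
        # Mean-pool over time steps (padding is not masked in this sketch).
        return states.mean(dim=1)

    def forward(self, ids_a, ids_b):
        u = self.encode(ids_a)
        v = self.encode(ids_b)
        return self.classifier(torch.abs(u - v)).squeeze(-1)


torch.manual_seed(0)
model = SiameseBiLSTM()
ids_a = torch.randint(1, 1000, (4, 20))  # 4 random "texts" of 20 token ids
ids_b = torch.randint(1, 1000, (4, 20))
logits = model(ids_a, ids_b)
prob_same_author = torch.sigmoid(logits)  # untrained, so values are arbitrary
```

Because both inputs pass through the same encoder and are compared with an absolute difference, the score does not depend on the order of the two texts. In a real setup the logits would be trained with a binary same-author/different-author loss, and the embedding layer would be replaced by a pre-trained BERT encoder.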
Acknowledgements
The authors would like to thank the Deanship of Scientific Research at Shaqra University and the Saudi Arabian Cultural Bureau in London (SACB) for allowing the research to be undertaken.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Almutairi, A., Kang, B., Nawfal Al Hashimy (2024). BiBERT-AV: Enhancing Authorship Verification Through Siamese Networks with Pre-trained BERT and Bi-LSTM. In: Wang, G., Wang, H., Min, G., Georgalas, N., Meng, W. (eds) Ubiquitous Security. UbiSec 2023. Communications in Computer and Information Science, vol 2034. Springer, Singapore. https://doi.org/10.1007/978-981-97-1274-8_2
DOI: https://doi.org/10.1007/978-981-97-1274-8_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-1273-1
Online ISBN: 978-981-97-1274-8
eBook Packages: Computer Science (R0)