Abstract
Deep generative models such as the GPT family have exhibited super-human performance in natural language generation. However, evaluation of the generated content lacks automated solutions and mostly relies on manual experiments involving human judges. This paper explores whether generated content can be evaluated computationally, in an automated way. In particular, we experimented with stylised lyrics, which require careful evaluation because lyrics generation takes into account the individual characteristics of artists. To this end, we first generated lyrics by fine-tuning KoGPT-2 on K-Pop songs from three different genres, so as to transfer each artist’s persona and style effectively. We then evaluated the stylised lyrics with another deep generative model, BERT, measuring the similarity between the generated lyrics and the lyrics in the training data, both within and between artists. The results showed the highest similarity between generated and original lyrics of the same artist and lower similarity between different artists, a pattern that a typical evaluation metric such as BLEU did not capture. Although this is a preliminary approach, it shows that generated content infused with individual characteristics can potentially be evaluated automatically, without human effort.
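The within-artist versus between-artist comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the character-bigram `toy_embed` function is a stand-in assumption for BERT-derived embeddings, the artist keys and lyric lines are invented placeholders, and averaging over all generated/original pairs is one plausible aggregation rather than the paper's stated procedure.

```python
from collections import Counter
from math import sqrt


def toy_embed(text, n=2):
    """Stand-in embedding (assumption): character n-gram counts, NOT BERT."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_similarity(generated, originals, embed=toy_embed):
    """Average similarity over every (generated, original) line pair."""
    scores = [cosine(embed(g), embed(o)) for g in generated for o in originals]
    return sum(scores) / len(scores)


def similarity_table(generated_by_artist, original_by_artist, embed=toy_embed):
    """Map (generated artist, original artist) to the mean pairwise similarity."""
    return {
        (gen_artist, orig_artist): mean_similarity(gen_lines, orig_lines, embed)
        for gen_artist, gen_lines in generated_by_artist.items()
        for orig_artist, orig_lines in original_by_artist.items()
    }


# Hypothetical placeholder data; these are not lyrics from the study.
originals = {
    "artist_a": ["la la love you tonight", "love you love you tonight"],
    "artist_b": ["run run into the fire", "fire in the city run"],
}
generated = {
    "artist_a": ["tonight i love you la la"],
    "artist_b": ["city fire run with me"],
}

table = similarity_table(generated, originals)
for (gen_artist, orig_artist), score in sorted(table.items()):
    print(f"generated {gen_artist} vs original {orig_artist}: {score:.3f}")
```

In this toy setting the diagonal entries (same artist) come out higher than the off-diagonal ones, which is the kind of pattern the abstract reports. To approximate the study more closely, `embed` would be swapped for a function returning sentence vectors from a BERT model, with `cosine` adapted to dense vectors.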
H.-J. Hong and S.-H. Kim—These authors contributed equally.
Acknowledgement
This research was supported by (i) the Samsung Research Funding Center of Samsung Electronics under Project Number No. SRFC-TC1603-52, and (ii) the National Research Foundation of Korea (NRF) grant funded by the Korean government (No. 2020R1G1A1102683).
Appendix
Comparing the original lyrics with the generated lyrics, those of Sunwoojunga share a similar structure in which English words are inserted in the middle and the same English sentences are repeated. In the case of IU, lyrics with the same ‘-요’ ending are generated. In the case of Monsta X as well, both the original and the generated lyrics include English words in the middle and repeat the same English sentence structure. In conclusion, the generated lyrics are structurally similar to the original lyrics (see Table 2).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hong, HJ., Kim, SH., Lee, JH. (2023). On the Evaluation of Generated Stylised Lyrics Using Deep Generative Models: A Preliminary Study. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_14
DOI: https://doi.org/10.1007/978-3-031-27199-1_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27198-4
Online ISBN: 978-3-031-27199-1
eBook Packages: Computer Science, Computer Science (R0)