Abstract
We present a novel music generation framework for music infilling with a user-friendly interface. Infilling refers to the task of generating musical sections given the surrounding multi-track music. The proposed transformer-based framework is extensible to new control tokens, as demonstrated by the music control tokens added in this work, such as per-bar tonal tension and track polyphony level. We explore the effects of including several musically meaningful control tokens and evaluate the results using objective metrics related to pitch and rhythm. Our results demonstrate that the additional control tokens help generate music with stronger stylistic similarity to the original piece. They also give the user finer control over properties such as musical texture and per-bar tonal tension, whereas previous research offered control only over track density. We present the model in a Google Colab notebook to enable interactive generation.
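To make the control-token idea concrete, the minimal Python sketch below shows one plausible encoding: per-bar control tokens are prepended to each bar's note tokens, and a bar selected for infilling keeps its (user-editable) control tokens while its note content is masked. All token names (tension_*, polyphony_*, <mask>) and the masking scheme are illustrative assumptions, not the paper's actual vocabulary or training procedure.

# Minimal sketch, assuming a bar-wise token encoding; token names and the
# masking scheme are illustrative, not the paper's actual vocabulary.

def add_control_tokens(bar_tokens, tension, polyphony):
    """Prepend per-bar control tokens (tonal tension, polyphony level)
    to the note tokens of one bar."""
    return [f"tension_{tension}", f"polyphony_{polyphony}"] + bar_tokens

def mask_bar(bars, bar_index, n_controls=2):
    """Flatten bars into one sequence; the chosen bar keeps its control
    tokens but its note tokens are replaced by <mask>, so the model must
    infill content matching both the context and the user-set controls."""
    out = []
    for i, bar in enumerate(bars):
        out.extend(bar[:n_controls] + ["<mask>"] if i == bar_index else bar)
    return out

# Toy example: two bars of a single track (pitch/duration tokens).
bars = [
    add_control_tokens(["pitch_60", "dur_4", "pitch_64", "dur_4"], tension=2, polyphony=1),
    add_control_tokens(["pitch_67", "dur_8", "pitch_65", "dur_8"], tension=4, polyphony=2),
]

# Mask bar 1: a trained infilling model would regenerate its notes from the
# surrounding music and the (possibly user-edited) control tokens.
print(mask_bar(bars, 1))
# ['tension_2', 'polyphony_1', 'pitch_60', 'dur_4', 'pitch_64', 'dur_4',
#  'tension_4', 'polyphony_2', '<mask>']

Keeping the control tokens outside the masked span is what would let a user steer the infilled bar, e.g. lowering tension_4 to tension_1 before regeneration.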
Acknowledgement
This work is funded by the China Scholarship Council and the Singapore Ministry of Education grant no. MOE2018-T2-2-161.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Guo, R., Simpson, I., Kiefer, C., Magnusson, T., Herremans, D. (2022). MusIAC: An Extensible Generative Framework for Music Infilling Applications with Multi-level Control. In: Martins, T., Rodríguez-Fernández, N., Rebelo, S.M. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2022. Lecture Notes in Computer Science, vol 13221. Springer, Cham. https://doi.org/10.1007/978-3-031-03789-4_22
DOI: https://doi.org/10.1007/978-3-031-03789-4_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-03788-7
Online ISBN: 978-3-031-03789-4