MusIAC: An Extensible Generative Framework for Music Infilling Applications with Multi-level Control

  • Conference paper
Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2022)

Abstract

We present a novel music generation framework for music infilling, with a user-friendly interface. Infilling refers to the task of generating musical sections given the surrounding multi-track music. The proposed transformer-based framework is extensible to new control tokens, as demonstrated by the music control tokens added in this work, such as tonal tension per bar and track polyphony level. We explore the effects of including several musically meaningful control tokens, and evaluate the results using objective metrics related to pitch and rhythm. Our results demonstrate that adding control tokens helps to generate music with stronger stylistic similarities to the original music. It also gives the user more control over properties such as the music texture and tonal tension in each bar, compared to previous research, which only provided control over track density. We present the model in a Google Colab notebook to enable interactive generation.

Notes

  1. https://github.com/ruiguo-bio/MusIAC

Acknowledgement

This work is funded by the China Scholarship Council and Singapore Ministry of Education Grant no. MOE2018-T2-2-161.

Author information

Corresponding author

Correspondence to Rui Guo.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 4694 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Guo, R., Simpson, I., Kiefer, C., Magnusson, T., Herremans, D. (2022). MusIAC: An Extensible Generative Framework for Music Infilling Applications with Multi-level Control. In: Martins, T., Rodríguez-Fernández, N., Rebelo, S.M. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2022. Lecture Notes in Computer Science, vol 13221. Springer, Cham. https://doi.org/10.1007/978-3-031-03789-4_22

  • DOI: https://doi.org/10.1007/978-3-031-03789-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-03788-7

  • Online ISBN: 978-3-031-03789-4

  • eBook Packages: Computer Science, Computer Science (R0)
