VQC-based reinforcement learning with data re-uploading: performance and trainability | Quantum Machine Intelligence


  • Research Article
  • Published:
Quantum Machine Intelligence

Abstract

Reinforcement learning (RL) consists of designing agents that make intelligent decisions without human supervision. When combined with function approximators such as Neural Networks (NNs), RL can solve extremely complex problems. Deep Q-Learning, an RL algorithm that uses Deep NNs, has achieved super-human performance in game-related tasks. It is also possible, however, to use Variational Quantum Circuits (VQCs) as function approximators in RL algorithms. This work empirically studies the performance and trainability of such VQC-based Deep Q-Learning models in classic control benchmark environments. More specifically, we investigate how data re-uploading affects both metrics. We show that the magnitude and the variance of the models' gradients remain substantial throughout training, even as the number of qubits increases. In fact, both increase considerably in the early stages of training, when the agent needs to learn the most, and decrease later on, when the agent should have done most of its learning and begun converging to a policy. Thus, even though the probability of being initialized in a Barren Plateau grows exponentially with system size for Hardware-Efficient ansatzes, these results indicate that VQC-based Deep Q-Learning models may still find large gradients throughout training, allowing learning.
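To make the data re-uploading idea concrete, the following is a minimal single-qubit sketch in NumPy, not the circuit used in the article: the (scaled) input is re-encoded before every trainable rotation, and the Pauli-Z expectation is read out as a Q-value. All names and parameter values here are illustrative assumptions.

```python
import numpy as np

def ry(angle):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(angle / 2.0), np.sin(angle / 2.0)
    return np.array([[c, -s], [s, c]])

def q_value(x, params):
    """Pauli-Z expectation of a single-qubit data re-uploading circuit.

    `params` is a list of (theta, w) pairs, one per layer: the scaled input
    w * x is re-encoded before every trainable rotation theta, which is what
    distinguishes a data re-uploading VQC from one that encodes the input once.
    """
    state = np.array([1.0, 0.0])       # start in |0>
    for theta, w in params:
        state = ry(w * x) @ state      # encoding layer: re-upload the input
        state = ry(theta) @ state      # variational layer: trainable angle
    pauli_z = np.array([[1.0, 0.0], [0.0, -1.0]])
    return float(state @ pauli_z @ state)  # <Z> in [-1, 1], read out as a Q-value
```

Because this sketch uses a single rotation axis, its layers commute and collapse into one rotation; practical circuits interleave rotations about different axes and entangling gates across several qubits, where re-uploading genuinely increases expressivity.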




Data Availability

All data necessary to replicate the results of this manuscript can be found at https://github.com/RodrigoCoelho7/VQC_Qlearning.


Funding

This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project LA/P/0063/2020 (https://doi.org/10.54499/LA/P/0063/2020). RC also thanks the support of the Foundation for Science and Technology (FCT, Portugal) under grant 10053/BII-E_B4/2023. AS also thanks the support of the Foundation for Science and Technology (FCT, Portugal) within grant UI/BD/152698/2022 and project IBEX, with reference PTDC/CC1-COM/4280/2021.

Author information

Contributions

R.C. implemented the code and wrote the main manuscript text, under the supervision of A.S. and L.P.S. All authors reviewed the manuscript.

Corresponding author

Correspondence to Rodrigo Coelho.

Ethics declarations

Competing interests

The authors declare no competing interests.


Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Hyperparameters


The hyperparameters of the VQC-based Deep Q-Learning models are explained in Table 2.

The set of hyperparameters used for the VQCs in Figs. 3a, b and 4a, b can be seen in Table 3.

Table 2 An explanation of VQC-Based Deep Q-Learning’s hyperparameters
Table 3 Models’ hyperparameters of Section 4.3

The set of hyperparameters used for the VQCs in Fig. 5a and b can be seen in Table 4.

The set of hyperparameters used for the VQCs in Fig. 7a and b can be seen in Table 5.

The set of hyperparameters used for the VQCs in Fig. 6a and b can be seen in Table 6.

The set of hyperparameters used for the VQCs in Figs. 10 and 11 can be seen in Table 6.

Table 4 Models’ hyperparameters of Section 4.5
Table 5 Models’ hyperparameters of Section 4.6
Table 6 Models’ hyperparameters of Section 4.7
Table 7 Models’ hyperparameters of Section 4.7 — Fig. 8
Table 8 Models’ hyperparameters of Section 4.8
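As a rough guide to how such a hyperparameter set is typically organized in code, the sketch below groups generic DQN and VQC choices into one configuration object. The field names and default values are illustrative assumptions, not the values from the tables above; consult the tables in the full article for those.

```python
from dataclasses import dataclass

@dataclass
class VQCDQNConfig:
    """Illustrative hyperparameter set for a VQC-based Deep Q-Learning agent.

    All names and defaults are generic DQN/VQC choices, not the article's
    actual values, which are listed in Tables 2-8.
    """
    # VQC architecture
    n_qubits: int = 4              # e.g. one qubit per CartPole state variable
    n_layers: int = 5              # variational layers, each re-uploading the input
    data_reuploading: bool = True
    # Q-learning
    gamma: float = 0.99            # discount factor
    batch_size: int = 32           # minibatch drawn from the replay buffer
    replay_capacity: int = 10_000  # experience replay buffer size
    target_update_freq: int = 100  # steps between target-network syncs
    # epsilon-greedy exploration schedule
    epsilon_start: float = 1.0
    epsilon_min: float = 0.01
    epsilon_decay: float = 0.995
    # VQC models often use separate learning rates for the variational
    # angles and for classical input/output scaling parameters
    lr_variational: float = 0.001
    lr_input_scaling: float = 0.001
    lr_output_scaling: float = 0.1
```

Grouping the hyperparameters this way makes each run's configuration explicit and easy to log alongside the results it produced.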

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Coelho, R., Sequeira, A. & Paulo Santos, L. VQC-based reinforcement learning with data re-uploading: performance and trainability. Quantum Mach. Intell. 6, 53 (2024). https://doi.org/10.1007/s42484-024-00190-z

