Abstract
Reinforcement learning (RL) is concerned with designing agents that make intelligent decisions without human supervision. When combined with function approximators such as Neural Networks (NNs), RL can solve extremely complex problems; Deep Q-Learning, an RL algorithm that uses Deep NNs, has been shown to achieve super-human performance in game-related tasks. Variational Quantum Circuits (VQCs) can also serve as function approximators in RL algorithms. This work empirically studies the performance and trainability of such VQC-based Deep Q-Learning models in classic control benchmark environments. More specifically, we investigate how data re-uploading affects both of these metrics. We show that the magnitude and the variance of the model's gradients remain substantial throughout training even as the number of qubits increases. In fact, both increase considerably in the early stages of training, when the agent needs to learn the most, and decrease later in training, once the agent should have done most of its learning and started converging to a policy. Thus, even though the probability of being initialized in a Barren Plateau grows exponentially with system size for Hardware-Efficient ansätze, these results indicate that VQC-based Deep Q-Learning models may still find large gradients throughout training, allowing learning to proceed.
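To make the setting concrete, the following is a minimal sketch of the kind of model the abstract describes: a hardware-efficient, data re-uploading VQC whose gradient variance is estimated over random initializations, the standard probe for Barren Plateaus. PennyLane is used here purely as an assumed framework; the circuit structure, observable, and gradient probe are our own illustrative choices, not the paper's implementation (which is available in the repository linked under Data Availability).

```python
# Illustrative sketch only: a data re-uploading VQC and a crude
# barren-plateau probe. Framework (PennyLane) and all choices below
# are assumptions, not the authors' implementation.
import numpy as np
import pennylane as qml
from pennylane import numpy as pnp  # autograd-compatible NumPy wrapper

n_qubits, n_layers = 4, 5
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def q_value_circuit(weights, x):
    """Hardware-efficient ansatz with data re-uploading: the input x is
    encoded again before every variational layer."""
    for layer in range(n_layers):
        for w in range(n_qubits):
            qml.RX(x[w], wires=w)                    # re-upload the classical state
        for w in range(n_qubits):
            qml.RY(weights[layer, w, 0], wires=w)    # trainable rotations
            qml.RZ(weights[layer, w, 1], wires=w)
        for w in range(n_qubits):
            qml.CNOT(wires=[w, (w + 1) % n_qubits])  # entangling ring
    # One such expectation value per action would act as a Q-value estimate.
    return qml.expval(qml.PauliZ(0))

def gradient_variance(n_samples=50, seed=0):
    """Variance of a single gradient component over random initializations."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n_qubits)             # arbitrary fixed input
    grad_fn = qml.grad(q_value_circuit, argnum=0)
    samples = []
    for _ in range(n_samples):
        weights = pnp.array(
            rng.uniform(0, 2 * np.pi, (n_layers, n_qubits, 2)),
            requires_grad=True,
        )
        samples.append(grad_fn(weights, x)[0, 0, 0])
    return np.var(samples)

if __name__ == "__main__":
    print(f"Gradient variance over random inits: {gradient_variance():.2e}")
```

Repeating such a measurement for increasing numbers of qubits illustrates the scaling question at the random-initialization stage; the abstract's claim concerns how these gradient statistics behave along the actual Q-learning training trajectory rather than at initialization alone.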
Data Availability
All data necessary to replicate the results of this manuscript can be found at https://github.com/RodrigoCoelho7/VQC_Qlearning.
Funding
This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project LA/P/0063/2020 (https://doi.org/10.54499/LA/P/0063/2020). RC also thanks the support of the Foundation for Science and Technology (FCT, Portugal) under grant 10053/BII-E_B4/2023. AS also thanks the support of the Foundation for Science and Technology (FCT, Portugal) within grant UI/BD/152698/2022 and project IBEX, with reference PTDC/CC1-COM/4280/2021.
Author information
Contributions
R.C. implemented the code and wrote the main manuscript text, under the supervision of A.S. and L.P.S. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Hyperparameters
Some of the hyperparameters for the VQC-based Deep Q-Learning models are explained in the following table:
The set of hyperparameters used for the VQCs in Figs. 3a, b and 4a, b can be seen in Table 3.
The set of hyperparameters used for the VQCs in Fig. 5a and b can be seen in Table 4.
The set of hyperparameters used for the VQCs in Fig. 7a and b can be seen in Table 5.
The set of hyperparameters used for the VQCs in Fig. 6a and b can be seen in Table 6.
The set of hyperparameters used for the VQCs in Figs. 10 and 11 can be seen in Table 6.
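Since the tables themselves are not reproduced here, the snippet below only sketches the kind of hyperparameter set such experiments involve. It is an illustrative Python dictionary of our own: every name and value is a placeholder, not a value taken from Tables 3, 4, 5 or 6.

```python
# Hypothetical hyperparameter set for a VQC-based Deep Q-Learning agent.
# All names and values are illustrative placeholders and do NOT reproduce
# the values reported in the paper's tables.
hyperparameters = {
    "n_qubits": 4,                 # e.g. one qubit per observation dimension
    "n_layers": 5,                 # number of data re-uploading layers
    "gamma": 0.99,                 # discount factor
    "batch_size": 32,              # replay-buffer minibatch size
    "replay_buffer_size": 10_000,  # stored transitions for experience replay
    "target_update_freq": 30,      # steps between target-network updates
    "epsilon_start": 1.0,          # initial exploration rate
    "epsilon_min": 0.01,           # final exploration rate
    "epsilon_decay": 0.99,         # multiplicative decay per episode
    "lr_variational": 1e-3,        # learning rate for the rotation angles
    "lr_input_scaling": 1e-3,      # learning rate for input-scaling weights
    "lr_output_scaling": 1e-1,     # learning rate for output/observable weights
}
```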
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Coelho, R., Sequeira, A. & Paulo Santos, L. VQC-based reinforcement learning with data re-uploading: performance and trainability. Quantum Mach. Intell. 6, 53 (2024). https://doi.org/10.1007/s42484-024-00190-z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s42484-024-00190-z