VQC-based reinforcement learning with data re-uploading: performance and trainability | Quantum Machine Intelligence


  • Research Article
  • Published:
Quantum Machine Intelligence

Abstract

Reinforcement learning (RL) consists of designing agents that make intelligent decisions without human supervision. When combined with function approximators such as Neural Networks (NNs), RL can solve extremely complex problems. Deep Q-Learning, an RL algorithm that uses Deep NNs, has achieved super-human performance in game-related tasks. It is also possible, however, to use Variational Quantum Circuits (VQCs) as function approximators in RL algorithms. This work empirically studies the performance and trainability of such VQC-based Deep Q-Learning models in classic control benchmark environments. More specifically, we investigate how data re-uploading affects both metrics. We show that the magnitude and the variance of the models' gradients remain substantial throughout training, even as the number of qubits increases. In fact, both increase considerably in the early stages of training, when the agent needs to learn the most, and decrease later on, when the agent should have done most of its learning and begun converging to a policy. Thus, even though the probability of being initialized in a Barren Plateau grows exponentially with system size for Hardware-Efficient ansatzes, these results indicate that VQC-based Deep Q-Learning models may still find large gradients throughout training, allowing learning.
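To make the data re-uploading idea concrete, the following is a minimal single-qubit sketch in NumPy, not the circuit used in the article: the (scaled) input is re-encoded before every trainable rotation, and the Pauli-Z expectation is read out as a Q-value. All names and parameter values here are illustrative assumptions.

```python
import numpy as np

def ry(angle):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(angle / 2.0), np.sin(angle / 2.0)
    return np.array([[c, -s], [s, c]])

def q_value(x, params):
    """Pauli-Z expectation of a single-qubit data re-uploading circuit.

    `params` is a list of (theta, w) pairs, one per layer: the scaled input
    w * x is re-encoded before every trainable rotation theta, which is what
    distinguishes a data re-uploading VQC from one that encodes the input once.
    """
    state = np.array([1.0, 0.0])       # start in |0>
    for theta, w in params:
        state = ry(w * x) @ state      # encoding layer: re-upload the input
        state = ry(theta) @ state      # variational layer: trainable angle
    pauli_z = np.array([[1.0, 0.0], [0.0, -1.0]])
    return float(state @ pauli_z @ state)  # <Z> in [-1, 1], read out as a Q-value
```

Because this sketch uses a single rotation axis, its layers commute and collapse into one rotation; practical circuits interleave rotations about different axes and entangling gates across several qubits, where re-uploading genuinely increases expressivity.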




Data Availability

All data necessary to replicate the results of this manuscript can be found at https://github.com/RodrigoCoelho7/VQC_Qlearning.


Funding

This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project LA/P/0063/2020 (https://doi.org/10.54499/LA/P/0063/2020). RC also thanks the support of the Foundation for Science and Technology (FCT, Portugal) under grant 10053/BII-E_B4/2023. AS also thanks the support of the Foundation for Science and Technology (FCT, Portugal) within grant UI/BD/152698/2022 and project IBEX, with reference PTDC/CC1-COM/4280/2021.

Author information

Contributions

R.C. implemented the code and wrote the main manuscript text, under the supervision of A.S. and L.P.S. All authors reviewed the manuscript.

Corresponding author

Correspondence to Rodrigo Coelho.

Ethics declarations

Competing interests

The authors declare no competing interests.


Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Hyperparameters


The hyperparameters of the VQC-based Deep Q-Learning models are explained in Table 2.

The set of hyperparameters used for the VQCs in Figs. 3a, b and 4a, b can be seen in Table 3.

Table 2 An explanation of VQC-Based Deep Q-Learning’s hyperparameters
Table 3 Models’ hyperparameters of Section 4.3

The set of hyperparameters used for the VQCs in Fig. 5a and b can be seen in Table 4.

The set of hyperparameters used for the VQCs in Fig. 7a and b can be seen in Table 5.

The set of hyperparameters used for the VQCs in Fig. 6a and b can be seen in Table 6.

The set of hyperparameters used for the VQCs in Figs. 10 and 11 can be seen in Table 6.

Table 4 Models’ hyperparameters of Section 4.5
Table 5 Models’ hyperparameters of Section 4.6
Table 6 Models’ hyperparameters of Section 4.7
Table 7 Models’ hyperparameters of Section 4.7 — Fig. 8
Table 8 Models’ hyperparameters of Section 4.8
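As a rough guide to how such a hyperparameter set is typically organized in code, the sketch below groups generic DQN and VQC choices into one configuration object. The field names and default values are illustrative assumptions, not the values from the tables above; consult the tables in the full article for those.

```python
from dataclasses import dataclass

@dataclass
class VQCDQNConfig:
    """Illustrative hyperparameter set for a VQC-based Deep Q-Learning agent.

    All names and defaults are generic DQN/VQC choices, not the article's
    actual values, which are listed in Tables 2-8.
    """
    # VQC architecture
    n_qubits: int = 4              # e.g. one qubit per CartPole state variable
    n_layers: int = 5              # variational layers, each re-uploading the input
    data_reuploading: bool = True
    # Q-learning
    gamma: float = 0.99            # discount factor
    batch_size: int = 32           # minibatch drawn from the replay buffer
    replay_capacity: int = 10_000  # experience replay buffer size
    target_update_freq: int = 100  # steps between target-network syncs
    # epsilon-greedy exploration schedule
    epsilon_start: float = 1.0
    epsilon_min: float = 0.01
    epsilon_decay: float = 0.995
    # VQC models often use separate learning rates for the variational
    # angles and for classical input/output scaling parameters
    lr_variational: float = 0.001
    lr_input_scaling: float = 0.001
    lr_output_scaling: float = 0.1
```

Grouping the hyperparameters this way makes each run's configuration explicit and easy to log alongside the results it produced.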

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Coelho, R., Sequeira, A. & Paulo Santos, L. VQC-based reinforcement learning with data re-uploading: performance and trainability. Quantum Mach. Intell. 6, 53 (2024). https://doi.org/10.1007/s42484-024-00190-z

