Yunhao Tang
2020 – today
- 2024
- [j1] Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney: An Analysis of Quantile Temporal-Difference Learning. J. Mach. Learn. Res. 25: 163:1-163:47 (2024)
- [c40] Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh: Learning Uncertainty-Aware Temporally-Extended Actions. AAAI 2024: 13391-13399
- [c39] Daniele Calandriello, Zhaohan Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, Bernardo Ávila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot: Human Alignment of Large Language Models through Online Preference Optimisation. ICML 2024
- [c38] Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Côme Fiegel, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot: Nash Learning from Human Feedback. ICML 2024
- [c37] Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot: Generalized Preference Optimization: A Unified Approach to Offline Alignment. ICML 2024
- [c36] Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland: A Distributional Analogue to the Successor Representation. ICML 2024
- [i52] Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh: Learning Uncertainty-Aware Temporally-Extended Actions. CoRR abs/2402.05439 (2024)
- [i51] Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot: Generalized Preference Optimization: A Unified Approach to Offline Alignment. CoRR abs/2402.05749 (2024)
- [i50] Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney: Off-policy Distributional Q(λ): Distributional RL without Importance Sampling. CoRR abs/2402.05766 (2024)
- [i49] Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney: Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model. CoRR abs/2402.07598 (2024)
- [i48] Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland: A Distributional Analogue to the Successor Representation. CoRR abs/2402.08530 (2024)
- [i47] Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, Bernardo Ávila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot: Human Alignment of Large Language Models through Online Preference Optimisation. CoRR abs/2403.08635 (2024)
- [i46] Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, Will Dabney: Understanding the performance gap between online and offline alignment algorithms. CoRR abs/2405.08448 (2024)
- [i45] Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Ávila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot: Offline Regularised Reinforcement Learning for Large Language Models Alignment. CoRR abs/2405.19107 (2024)
- [i44] Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Ávila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana Borsa, Arthur Guez, Will Dabney: A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning. CoRR abs/2406.02035 (2024)
- [i43] Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah: On scalable oversight with weak LLMs judging strong LLMs. CoRR abs/2407.04622 (2024)
- 2023
- [c35] Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Rémi Munos, Will Dabney, Diana L. Borsa: Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition. ICML 2023: 4009-4034
- [c34] Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice. ICML 2023: 17135-17175
- [c33] Thomas Mesnard, Wenqi Chen, Alaa Saade, Yunhao Tang, Mark Rowland, Theophane Weber, Clare Lyle, Audrunas Gruslys, Michal Valko, Will Dabney, Georg Ostrovski, Eric Moulines, Rémi Munos: Quantile Credit Assignment. ICML 2023: 24517-24531
- [c32] Pierre Harvey Richemond, Allison C. Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill: The Edge of Orthogonality: A Simple View of What Makes BYOL Tick. ICML 2023: 29063-29081
- [c31] Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney: The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation. ICML 2023: 29210-29231
- [c30] Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko: Understanding Self-Predictive Learning for Reinforcement Learning. ICML 2023: 33632-33656
- [c29] Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko: DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm. ICML 2023: 33657-33673
- [c28] Yunhao Tang, Rémi Munos: Towards a better understanding of representation dynamics under TD-learning. ICML 2023: 33720-33738
- [c27] Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko: VA-learning as a more efficient alternative to Q-learning. ICML 2023: 33739-33757
- [c26] Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Ménard: Fast Rates for Maximum Entropy Exploration. ICML 2023: 34161-34221
- [i42] Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney: An Analysis of Quantile Temporal-Difference Learning. CoRR abs/2301.04462 (2023)
- [i41] Pierre H. Richemond, Allison C. Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill: The Edge of Orthogonality: A Simple View of What Makes BYOL Tick. CoRR abs/2302.04817 (2023)
- [i40] Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Ménard: Fast Rates for Maximum Entropy Exploration. CoRR abs/2303.08059 (2023)
- [i39] Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Rémi Munos, Will Dabney, Diana L. Borsa: Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition. CoRR abs/2305.00654 (2023)
- [i38] Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice. CoRR abs/2305.13185 (2023)
- [i37] Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko: VA-learning as a more efficient alternative to Q-learning. CoRR abs/2305.18161 (2023)
- [i36] Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney: The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation. CoRR abs/2305.18388 (2023)
- [i35] Yunhao Tang, Rémi Munos: Towards a Better Understanding of Representation Dynamics under TD-learning. CoRR abs/2305.18491 (2023)
- [i34] Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko: DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm. CoRR abs/2305.18501 (2023)
- [i33] Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot: Nash Learning from Human Feedback. CoRR abs/2312.00886 (2023)
- 2022
- [c25] Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko: Marginalized Operators for Off-policy Reinforcement Learning. AISTATS 2022: 655-679
- [c24] Yunhao Tang: Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning. ICML 2022: 21050-21075
- [c23] Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Ménard: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses. ICML 2022: 21380-21431
- [c22] Zhaohan Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Ávila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot: BYOL-Explore: Exploration by Bootstrapped Prediction. NeurIPS 2022
- [c21] Yunhao Tang, Rémi Munos, Mark Rowland, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare: The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning. NeurIPS 2022
- [i32] Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko: Marginalized Operators for Off-policy Reinforcement Learning. CoRR abs/2203.16177 (2022)
- [i31] Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Ménard: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses. CoRR abs/2205.07704 (2022)
- [i30] Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári: KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal. CoRR abs/2205.14211 (2022)
- [i29] Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Ávila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot: BYOL-Explore: Exploration by Bootstrapped Prediction. CoRR abs/2206.08332 (2022)
- [i28] Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare: The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning. CoRR abs/2207.07570 (2022)
- [i27] Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko: Understanding Self-Predictive Learning for Reinforcement Learning. CoRR abs/2212.03319 (2022)
- 2021
- [b1] Yunhao Tang: Reinforcement Learning: New Algorithms and An Application for Integer Programming. Columbia University, USA, 2021
- [c20] Yunhao Tang, Alp Kucukelbir: Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning. AISTATS 2021: 2863-2871
- [c19] Yunhao Tang: Guiding Evolutionary Strategies with Off-Policy Actor-Critic. AAMAS 2021: 1317-1325
- [c18] Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel: Revisiting Peng's Q(λ) for Modern Reinforcement Learning. ICML 2021: 5794-5804
- [c17] Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko: Taylor Expansion of Discount Factors. ICML 2021: 10130-10140
- [c16] Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko: Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation. NeurIPS 2021: 5303-5315
- [i26] Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamás Sarlós, Yuxiang Yang: ES-ENAS: Combining Evolution Strategies with Neural Architecture Search at No Extra Cost for Reinforcement Learning. CoRR abs/2101.07415 (2021)
- [i25] Krzysztof Choromanski, Deepali Jain, Jack Parker-Holder, Xingyou Song, Valerii Likhosherstov, Anirban Santara, Aldo Pacchiano, Yunhao Tang, Adrian Weller: Unlocking Pixels for Reinforcement Learning via Implicit Attention. CoRR abs/2102.04353 (2021)
- [i24] Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel: Revisiting Peng's Q(λ) for Modern Reinforcement Learning. CoRR abs/2103.00107 (2021)
- [i23] Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko: Taylor Expansion of Discount Factors. CoRR abs/2106.06170 (2021)
- [i22] Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko: Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation. CoRR abs/2106.13125 (2021)
- [i21] Yunhao Tang: Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning. CoRR abs/2112.07328 (2021)
- 2020
- [c15] Yunhao Tang, Shipra Agrawal: Discretizing Continuous Action Space for On-Policy Optimization. AAAI 2020: 5981-5988
- [c14] Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir: Variance Reduction for Evolution Strategies via Structured Control Variates. AISTATS 2020: 646-656
- [c13] Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang: Practical Nonisotropic Monte Carlo Sampling in High Dimensions via Determinantal Point Processes. AISTATS 2020: 1363-1374
- [c12] Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou: Discrete Action On-Policy Learning with Action-Value Critic. AISTATS 2020: 1977-1987
- [c11] Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang: ES-MAML: Simple Hessian-Free Meta Learning. ICLR 2020
- [c10] Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos: Monte-Carlo Tree Search as Regularized Policy Optimization. ICML 2020: 3769-3778
- [c9] Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Krzysztof Choromanski, Anna Choromanska, Michael I. Jordan: Learning to Score Behaviors for Guided Policy Optimization. ICML 2020: 7445-7454
- [c8] Yunhao Tang, Shipra Agrawal, Yuri Faenza: Reinforcement Learning for Integer Programming: Learning to Cut. ICML 2020: 9367-9376
- [c7] Yunhao Tang, Michal Valko, Rémi Munos: Taylor Expansion Policy Optimization. ICML 2020: 9397-9406
- [c6] Yunhao Tang: Self-Imitation Learning via Generalized Lower Bound Q-learning. NeurIPS 2020
- [i20] Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou: Discrete Action On-Policy Learning with Action-Value Critic. CoRR abs/2002.03534 (2020)
- [i19] Yunhao Tang, Michal Valko, Rémi Munos: Taylor Expansion Policy Optimization. CoRR abs/2003.06259 (2020)
- [i18] Yunhao Tang: Self-Imitation Learning via Generalized Lower Bound Q-learning. CoRR abs/2006.07442 (2020)
- [i17] Yunhao Tang, Alp Kucukelbir: Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning. CoRR abs/2006.07549 (2020)
- [i16] Yunhao Tang, Krzysztof Choromanski: Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies. CoRR abs/2006.07554 (2020)
- [i15] Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos: Monte-Carlo Tree Search as Regularized Policy Optimization. CoRR abs/2007.12509 (2020)
2010 – 2019
- 2019
- [c5] Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamás Sarlós, Adrian Weller: Orthogonal Estimation of Wasserstein Distances. AISTATS 2019: 186-195
- [c4] Krzysztof Choromanski, Aldo Pacchiano, Jeffrey Pennington, Yunhao Tang: KAMA-NNs: Low-dimensional Rotation Based Neural Networks. AISTATS 2019: 236-245
- [c3] Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani: Provably Robust Blackbox Optimization for Reinforcement Learning. CoRL 2019: 683-696
- [c2] Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Vikas Sindhwani: From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization. NeurIPS 2019: 10299-10309
- [i14] Yunhao Tang, Shipra Agrawal: Discretizing Continuous Action Space for On-Policy Optimization. CoRR abs/1901.10500 (2019)
- [i13] Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamás Sarlós, Adrian Weller: Orthogonal Estimation of Wasserstein Distances. CoRR abs/1903.03784 (2019)
- [i12] Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang: Adaptive Sample-Efficient Blackbox Optimization via ES-active Subspaces. CoRR abs/1903.04268 (2019)
- [i11] Yunhao Tang, Mingzhang Yin, Mingyuan Zhou: Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy. CoRR abs/1903.05284 (2019)
- [i10] Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang: Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes. CoRR abs/1905.12667 (2019)
- [i9] Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan: Wasserstein Reinforcement Learning. CoRR abs/1906.04349 (2019)
- [i8] Yunhao Tang, Shipra Agrawal, Yuri Faenza: Reinforcement Learning for Integer Programming: Learning to Cut. CoRR abs/1906.04859 (2019)
- [i7] Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir: Variance Reduction for Evolution Strategies via Structured Control Variates. CoRR abs/1906.08868 (2019)
- [i6] Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamás Sarlós, Deepali Jain, Yuxiang Yang: Reinforcement Learning with Chromatic Networks. CoRR abs/1907.06511 (2019)
- [i5] Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang: ES-MAML: Simple Hessian-Free Meta Learning. CoRR abs/1910.01215 (2019)
- 2018
- [c1] Yunhao Tang, Shipra Agrawal: Exploration by Distributional Reinforcement Learning. IJCAI 2018: 2710-2716
- [i4] Yunhao Tang, Shipra Agrawal: Exploration by Distributional Reinforcement Learning. CoRR abs/1805.01907 (2018)
- [i3] Yunhao Tang, Shipra Agrawal: Implicit Policy for Reinforcement Learning. CoRR abs/1806.06798 (2018)
- [i2] Yunhao Tang, Shipra Agrawal: Boosting Trust Region Policy Optimization by Normalizing Flows Policy. CoRR abs/1809.10326 (2018)
- 2017
- [i1] Yunhao Tang, Alp Kucukelbir: Variational Deep Q Network. CoRR abs/1711.11225 (2017)
last updated on 2025-01-09 12:47 CET by the dblp team
all metadata released as open data under CC0 1.0 license