Link to the paper: https://arxiv.org/abs/2312.17194
Copyright (c) 2024 The Authors of Resilient Constrained Reinforcement Learning. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
====================================================================
Acknowledgment
The code for generating random MDP transition dynamics is adapted from:
https://github.com/jb3618columbia/Optimality-Guarantees-for-Policy-Gradient-Methods
Reference: Global Optimality Guarantees for Policy Gradient Methods, J. Bhandari and D. Russo
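For readers unfamiliar with that repository, the underlying idea is simply to sample a tabular transition kernel whose rows are normalized probability distributions. The snippet below is a minimal sketch of this idea in Python; it is not the exact routine from that repository, and the function name and sampling scheme are our own assumptions for illustration.

    import numpy as np

    def random_mdp_transitions(num_states, num_actions, seed=None):
        """Sample a random tabular transition kernel P[s, a, s'].

        Each (s, a) row is obtained by normalizing i.i.d. uniform draws,
        so every row sums to one. This is only a rough sketch; see the
        repository above for the routine actually used in the notebooks.
        """
        rng = np.random.default_rng(seed)
        P = rng.uniform(size=(num_states, num_actions, num_states))
        return P / P.sum(axis=-1, keepdims=True)  # row-normalize over s'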
====================================================================
Our proposed methods in the manuscript are referred to as follows (an illustrative sketch of the generic primal-dual structure appears after this list):
(1) ResPG-PD (Resilient Policy Gradient Primal-Dual) -------------------> Algorithm 1
(2) ResOPG-PD (Resilient Optimistic Policy Gradient Primal-Dual) --------> Algorithm 2
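For orientation only, the snippet below sketches the generic structure these names refer to: a gradient ascent step on the primal variable, a constraint relaxation variable penalized by a cost weighted by alpha, and a projected gradient step on the dual variable. It is a toy illustration under our own assumptions (a scalar concave objective, a linear constraint, and a quadratic relaxation cost), not the paper's Algorithm 1 or 2, which operate on MDP policies and value functions.

    # Toy illustration only -- NOT Algorithms 1/2 from the paper.
    # Assumed resilient constrained problem:
    #   maximize   f(x) = -(x - 2)^2                    (reward)
    #   subject to g(x) = x >= b - u,   u >= 0          (relaxed utility constraint)
    #   where the relaxation u is penalized by (alpha / 2) * u**2.
    alpha, b, eta = 1.0, 3.0, 0.05      # relaxation cost, constraint level, step size
    x, u, lam = 0.0, 0.0, 0.0           # primal variable, relaxation, dual variable

    for _ in range(2000):
        # Primal ascent on the Lagrangian f(x) - (alpha/2)*u**2 + lam*(x + u - b)
        x += eta * (-2.0 * (x - 2.0) + lam)
        # Relaxation balances the dual pressure against the relaxation cost
        u = max(0.0, lam / alpha)
        # Projected dual descent on the relaxed constraint violation
        lam = max(0.0, lam - eta * (x + u - b))

    print(f"x = {x:.3f}, relaxation u = {u:.3f}, multiplier = {lam:.3f}")

With these values the iterates approach x ≈ 2.33 and u ≈ 0.67; increasing alpha makes relaxation more costly and drives u toward zero.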
All computational results in the manuscript can be reproduced by executing the corresponding Jupyter notebooks. The requirements are:
Jupyter Notebook
Python 3, NumPy, Matplotlib, CVXOPT, and SciPy.
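For example, the dependencies can typically be installed with pip install notebook numpy matplotlib cvxopt scipy (exact package versions are not pinned by this repository; adjust to your environment).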
====================================================================
Convergence performance of ResPG-PD and ResOPG-PD in the policy search problem (Section D.1)
(1) Resilient_RL_PolicySearch.ipynb(sec:Plot reward, utility and relaxation vs iterations) ----------> Figure 3, Figure 11 and Figure 12
====================================================================
Optimality gap of RPG-PD, OPG-PD, and dual methods in the policy search problem (Section D.1)
(1) Resilient_RL_PolicySearch.ipynb(sec:Plot reward, utility, relaxation, policy optimality gaps) ---> Figure 2 and Figures 13-15
====================================================================
Convergence performance of ResPG-PD and ResOPG-PD in the policy search problem with different relaxation costs (Section D.1)
(1) Resilient_RL_PolicySearch.ipynb(sec:Reward, utility, relaxation vs alpha) -----------------------> Figure 4
====================================================================
Convergence performance of ResPG-PD and ResOPG-PD in the monitoring problem (small state space, Section D.2)
(1) Resilient_RL_Monitoring_D2.ipynb(sec:Plot reward, utility and relaxation vs iterations) ------------> Figure 6, Figure 16 and Figure 17
====================================================================
Optimality gap of RPG-PD, OPG-PD, and dual methods in the monitoring problem (small state space, Section D.2)
(1) Resilient_RL_Monitoring_D2.ipynb(sec:Plot reward, utility, relaxation optimality gaps) -------------> Figures 18-20
====================================================================
Convergence performance of ResPG-PD and ResOPG-PD in the monitoring problem (small state space, Section D.2) with different relaxation costs
(1) Resilient_RL_Monitoring_D2.ipynb(sec:Reward, relaxation vs alpha) ----------------------------------> Figure 7
====================================================================
Convergence performance of ResPG-PD and ResOPG-PD in the monitoring problem (large state space, Section D.3)
(1) Resilient_RL_Monitoring_D3.ipynb(sec:Plot reward, utility and relaxation vs iterations) ------------> Figure 9, Figure 21 and Figure 22
====================================================================
Optimality gap of RPG-PD, OPG-PD, and dual methods in the monitoring problem (large state space, Section D.3)
(1) Resilient_RL_Monitoring_D3.ipynb(sec:Plot reward, utility, relaxation optimality gaps) -------------> Figures 23-25
====================================================================
Convergence performance of ResPG-PD and ResOPG-PD in the monitoring problem (large state space, Section D.3) with different relaxation costs
(1) Resilient_RL_Monitoring_D3.ipynb(sec:Reward, relaxation vs alpha) ----------------------------------> Figure 10
====================================================================
Policy of ResPG-PD and ResOPG-PD in the monitoring problem (large state space, Section D.3)
(1) Resilient_RL_Monitoring_D3.ipynb(sec:Visualize the policy) ----------------------------------> Figure 8