1. Model-Free

  • Value-based
  • Policy-based
  • Actor-Critic
  • General Agents
  • Imitation Learning Agents
  • Hierarchical Reinforcement Learning Agents
  • Memory Types
  • Exploration Techniques

2. Model-Based

  • Dyna-Q
  • Dataset Aggregation (DAgger)
  • Monte Carlo Tree Search (MCTS) (e.g., AlphaZero)
  • Dynamic Programming
  • Model Predictive Control
  • Probabilistic Inference for Learning Control (PILCO)
  • Guided Policy Search (GPS)
  • Policy search with Gaussian Process
  • Policy search with backpropagation
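
To make the idea of a model-based method concrete, here is a minimal sketch of tabular Dyna-Q, the first entry in the list above. It combines ordinary Q-learning updates on real transitions with extra "planning" updates replayed from a learned model. The function names (`dyna_q`, `chain_step`), the five-state corridor environment, and every hyperparameter value are assumptions made for illustration, and the model is a plain lookup table that is only valid for deterministic environments.

```python
import random


def dyna_q(step, n_states, n_actions, start, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch: Q-learning on real steps plus replayed model steps."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (state, action) -> (reward, next_state, done), learned from experience

    def update(s, a, r, s2, done):
        # One-step Q-learning backup toward the TD target.
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # direct RL from the real transition
            model[(s, a)] = (r, s2, done)    # model learning (assumes determinism)
            for _ in range(planning_steps):  # planning from simulated experience
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


def chain_step(s, a):
    """Hypothetical 5-state corridor: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    done = s2 == 4  # reaching the right end terminates the episode with reward 1
    return (1.0 if done else 0.0), s2, done


q_table = dyna_q(chain_step, n_states=5, n_actions=2, start=0)
greedy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this toy setting the greedy policy read from `q_table` moves right in every non-terminal state. Setting `planning_steps=0` reduces the sketch to plain model-free Q-learning, which is one way to see what the learned model contributes.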

 

Summary

| Algorithm | Model-free or model-based | Agent type | Policy | Policy type | Monte Carlo or temporal difference (TD) | Action space | State space |
|---|---|---|---|---|---|---|---|
| Tabular Q-learning (= SARSA max), Q-learning(λ) | Model-free | Value-based | Off-policy | Pseudo-deterministic (epsilon-greedy) | TD | Discrete | Discrete |
| SARSA, SARSA(λ) | Model-free | Value-based | On-policy | Pseudo-deterministic (epsilon-greedy) | TD | Discrete | Discrete |
| DQN, N-step DQN, Double DQN, Noisy DQN, Prioritized Replay DQN, Dueling DQN, Categorical DQN, Distributional DQN (C51) | Model-free | Value-based | Off-policy | Pseudo-deterministic (epsilon-greedy) | | Discrete | Continuous |
| Cross-entropy method | Model-free | Policy-based | On-policy | | Monte Carlo | | |
| REINFORCE (vanilla policy gradient) | Model-free | Policy-based | On-policy | Stochastic policy | Monte Carlo | | |
| Policy gradient softmax | Model-free | | | Stochastic policy | | | |
| Natural Policy Gradient | Model-free | | | Stochastic policy | | | |
| TRPO | Model-free | Policy-based | On-policy | Stochastic policy | | Continuous | Continuous |
| PPO | Model-free | Policy-based | On-policy | Stochastic policy | | Continuous | Continuous |
| Distributed PPO | Model-free | Policy-based | | | | Continuous | Continuous |
| A2C | Model-free | Actor-critic | On-policy | Stochastic policy | TD | Continuous | |
| A3C | | Actor-critic | On-policy | | | | |
| DDPG (A2C family) | Model-free | Actor-critic | Off-policy | Deterministic policy | | Continuous | Continuous |
| TD3 | Model-free | Actor-critic | | | | Continuous | Continuous |
| D4PG | | | | | | | |
| SAC | Model-free | Actor-critic | Off-policy | | | | |
| Dyna-Q | | | | | | | |
| Curiosity Model | | | | | | | |
| NAF | Model-free | | | | | Continuous | |
| DAgger | | | | | | | |
| MCTS | | | | | | | |
| Dynamic programming | | | | | | | |
| GPS | | | | | | | |
| Model Predictive Control | Model-based | | | | | | |
| PILCO | Model-based | | | | | | |
| Policy search with Gaussian Process | Model-based | | | | | | |
| Policy search with backpropagation | Model-based | | | | | | |
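
The first two entries of the summary, Q-learning and SARSA, share every column except the Policy one. The sketch below shows where that off-policy versus on-policy difference appears in the TD target. The helper names (`q_learning_target`, `sarsa_target`, `td_update`, `epsilon_greedy`) and the tiny Q-table are made up for illustration and are not taken from any library.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Behaviour policy used by both methods: mostly greedy, sometimes random."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_learning_target(q, reward, next_state, gamma):
    # Off-policy: bootstrap from the best next action,
    # whatever the behaviour policy actually does next.
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action the behaviour policy really selected.
    return reward + gamma * q[next_state][next_action]


def td_update(q, state, action, target, alpha):
    # Shared temporal-difference step: move the estimate toward the target.
    q[state][action] += alpha * (target - q[state][action])


# Toy Q-table: 2 states x 2 actions (values chosen by hand for the example).
q = [[0.0, 0.0], [1.0, 3.0]]

off_policy = q_learning_target(q, reward=1.0, next_state=1, gamma=0.9)
on_policy = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.9)

td_update(q, state=0, action=0, target=off_policy, alpha=0.5)

# Example draw from the behaviour policy in state 1.
sampled_action = epsilon_greedy(q[1], epsilon=0.1, rng=random.Random(0))
```

Here the Q-learning target is 1 + 0.9 × 3 = 3.7, while the SARSA target for an exploratory next action 0 is 1 + 0.9 × 1 = 1.9. The two targets coincide only when the sampled next action happens to be the greedy one.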

 

 

 

 

 

 

 

Conclusion

We have just seen some of the most widely used RL algorithms. In the next article, we will look at the challenges and applications of RL in robotics.