Excerpted from:

Algorithm Center - Human-Machine Confrontation Intelligence

Adversarial Agent AI Algorithm Index (minimized)

http://turingai.ia.ac.cn/share_algorithm


1. Adversarial Space Representation

1.1 Feature Representation

Map feature extraction algorithm based on convolutional neural networks (CNN; see the sketch after this list)
Unit (operator) feature extraction algorithm based on fully connected layers
Transformer
LSTM (Long Short-Term Memory)
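
A minimal sketch of the CNN-based map feature extraction idea, assuming a grid-shaped map tensor with a few semantic channels (e.g. terrain, friendly units, enemy units); the layer sizes and dimensions are illustrative assumptions, not taken from the original index.

    import torch
    import torch.nn as nn

    class MapEncoder(nn.Module):
        """Encode a (C, H, W) map into a flat feature vector with a small CNN."""
        def __init__(self, in_channels=3, feat_dim=128):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(in_channels, 16, kernel_size=3, stride=1, padding=1),
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # downsample
                nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)),                           # fixed-size spatial output
            )
            self.fc = nn.Linear(32 * 4 * 4, feat_dim)

        def forward(self, x):                 # x: (batch, C, H, W)
            h = self.conv(x)
            return self.fc(h.flatten(1))      # (batch, feat_dim)

    # Example: a batch of two 32x32 maps with 3 channels -> 128-d features.
    features = MapEncoder()(torch.zeros(2, 3, 32, 32))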

1.2 Reward Functions

RND (Random Network Distillation; see the sketch after this list)
Hindsight
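
A minimal sketch of RND's intrinsic reward: a fixed, randomly initialized target network and a trained predictor, with the prediction error on a state used as an exploration bonus. The observation dimension and network sizes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class RND(nn.Module):
        """Random Network Distillation: intrinsic reward = predictor's error
        against a frozen, randomly initialized target network."""
        def __init__(self, obs_dim=64, embed_dim=32):
            super().__init__()
            self.target = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim))
            self.predictor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim))
            for p in self.target.parameters():    # target stays fixed
                p.requires_grad_(False)

        def intrinsic_reward(self, obs):          # obs: (batch, obs_dim)
            err = (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=1)
            return err                            # also serves as the predictor's training loss

    rnd = RND()
    bonus = rnd.intrinsic_reward(torch.randn(5, 64))   # rarely visited states get larger bonuses
    loss = bonus.mean()                                # minimize on visited states to train the predictor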

2. Situation Assessment and Reasoning

Intent determination: intentMARL
Opponent modeling: DRON (Deep Reinforcement Opponent Network; see the sketch after this list)
Future prediction: DFP
Threat assessment / perspective analysis
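
A minimal sketch of DRON-style opponent modeling (the concatenation variant): an opponent network encodes observed opponent features, and its hidden representation is combined with the state encoding before the Q-head. Dimensions and layer sizes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DRONConcat(nn.Module):
        """Opponent-aware Q-network: Q(s, a) conditioned on an opponent encoding."""
        def __init__(self, state_dim=32, opp_dim=16, n_actions=6, hidden=64):
            super().__init__()
            self.state_enc = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
            self.opp_enc = nn.Sequential(nn.Linear(opp_dim, hidden), nn.ReLU())
            self.q_head = nn.Linear(2 * hidden, n_actions)

        def forward(self, state, opp_features):
            h = torch.cat([self.state_enc(state), self.opp_enc(opp_features)], dim=1)
            return self.q_head(h)              # (batch, n_actions)

    q_values = DRONConcat()(torch.randn(4, 32), torch.randn(4, 16))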

3. Strategy Generation and Optimization

3.1 Zero-Sum Games

Minimax Q-Learning (see the sketch after this list)
MCTS (Monte Carlo Tree Search)
RTNS game algorithm [Korf, Richard E., and David Maxwell Chickering. "Best-first minimax search." Artificial Intelligence 84.1-2 (1996): 299-337.]
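
A minimal sketch of the core step of Minimax Q-Learning for a two-player zero-sum stage game: the state value is the maximin value of the Q-matrix over mixed strategies, solved as a small linear program. The payoff matrix and constants are illustrative assumptions.

    import numpy as np
    from scipy.optimize import linprog

    def maximin_value(Q_s):
        """max over pi, min over opponent action o of sum_a pi(a) * Q_s[a, o]."""
        n_a, n_o = Q_s.shape
        # Variables: pi(a_1..a_n) and v; objective: maximize v (linprog minimizes, so use -v).
        c = np.concatenate([np.zeros(n_a), [-1.0]])
        # For every opponent action o:  v - sum_a pi(a) * Q_s[a, o] <= 0.
        A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
        b_ub = np.zeros(n_o)
        A_eq = np.concatenate([np.ones(n_a), [0.0]]).reshape(1, -1)   # probabilities sum to 1
        bounds = [(0.0, 1.0)] * n_a + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
        return res.x[-1], res.x[:n_a]          # state value, maximin policy

    # Minimax-Q update for one transition (s, a, o, r, s_next):
    #   Q[s, a, o] += alpha * (r + gamma * maximin_value(Q[s_next])[0] - Q[s, a, o])
    v, pi = maximin_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))   # matching pennies: v = 0, pi = [0.5, 0.5]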

3.2 Non-Zero-Sum Games

FFQ (Friend-or-Foe Q-Learning; see the sketch after this list)
Nash Q-Learning
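
A minimal sketch of the Friend-or-Foe Q-Learning value step, assuming a per-state Q-matrix over (own action, other agent's action): with a friend, the state value is the best joint action; with a foe, it is the zero-sum maximin value over mixed strategies (the same linear program as in the Minimax Q-Learning sketch above). The example matrix is an illustrative assumption.

    import numpy as np
    from scipy.optimize import linprog

    def friend_value(Q_s):
        """Friend-Q: the other agent helps, so take the best joint action."""
        return Q_s.max()

    def foe_value(Q_s):
        """Foe-Q: the other agent is adversarial, so take the maximin over mixed strategies."""
        n_a, n_o = Q_s.shape
        c = np.concatenate([np.zeros(n_a), [-1.0]])
        A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
        A_eq = np.concatenate([np.ones(n_a), [0.0]]).reshape(1, -1)
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_o), A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0.0, 1.0)] * n_a + [(None, None)])
        return res.x[-1]

    Q_s = np.array([[2.0, 0.0], [0.0, 1.0]])
    print(friend_value(Q_s), foe_value(Q_s))   # friend: 2.0; foe: 2/3 (value of the zero-sum game)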

3.3 Perfect-Information Games

MCTS (Monte Carlo Tree Search)
Double DQN (Double Deep Q-Networks; see the sketch after this list)
DDPG (Deep Deterministic Policy Gradient)
AC (Actor-Critic)
TRPO (Trust Region Policy Optimization)
IS-MCTS (Information Set Monte Carlo Tree Search)
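
A minimal sketch of the Double DQN target from the list above: the online network selects the next action, the target network evaluates it, reducing the overestimation of a single max. The tensors here are placeholders standing in for network outputs.

    import torch

    def double_dqn_target(online_q_next, target_q_next, reward, done, gamma=0.99):
        """online_q_next, target_q_next: (batch, n_actions) Q-values for s';
        reward, done: (batch,) tensors."""
        next_action = online_q_next.argmax(dim=1, keepdim=True)          # action selection: online net
        next_value = target_q_next.gather(1, next_action).squeeze(1)     # action evaluation: target net
        return reward + gamma * (1.0 - done) * next_value                # TD target y

    y = double_dqn_target(torch.randn(8, 4), torch.randn(8, 4),
                          torch.zeros(8), torch.zeros(8))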

3.4 Imperfect-Information Games

NFSP (Neural Fictitious Self-Play)
CFR-BR (CFR against a Best Responder)
EVA (Ephemeral Value Adjustment) [Hansen, Steven, et al. "Fast deep reinforcement learning using online adjustments from the past." Advances in Neural Information Processing Systems. 2018.]
External Sampling MCCFR (External Sampling Monte Carlo CFR)
Best Response
CFR (Counterfactual Regret Minimization; see the sketch after this list)
Outcome Sampling MCCFR (Outcome Sampling Monte Carlo CFR)
Deep CFR
RPG (Regret Policy Gradient)
TRPO (Trust Region Policy Optimization)
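
A minimal sketch of the regret-matching rule at the heart of CFR and its sampling variants, run in self-play on a matrix game; the average strategies converge toward a Nash equilibrium. The rock-paper-scissors payoff matrix is the standard textbook example, not from the original index.

    import numpy as np

    # Row player's payoffs for rock-paper-scissors; the column player receives the negative.
    A = np.array([[0, -1, 1],
                  [1, 0, -1],
                  [-1, 1, 0]], dtype=float)

    def regret_matching(regret):
        """Play actions in proportion to their positive cumulative regret."""
        pos = np.maximum(regret, 0.0)
        return pos / pos.sum() if pos.sum() > 0 else np.full(len(regret), 1.0 / len(regret))

    regret = [np.zeros(3), np.zeros(3)]
    strategy_sum = [np.zeros(3), np.zeros(3)]
    for _ in range(10000):
        s = [regret_matching(regret[0]), regret_matching(regret[1])]
        u0 = A @ s[1]            # expected payoff of each row action vs. column strategy
        u1 = -(A.T @ s[0])       # expected payoff of each column action vs. row strategy
        regret[0] += u0 - s[0] @ u0          # regret of each action vs. the current strategy
        regret[1] += u1 - s[1] @ u1
        strategy_sum[0] += s[0]
        strategy_sum[1] += s[1]

    avg = [ss / ss.sum() for ss in strategy_sum]   # both average strategies approach [1/3, 1/3, 1/3]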

3.5 Single-Player Games

PPO (Proximal Policy Optimization)
DQN (Deep Q-Network; see the sketch after this list)
Double DQN
Dueling DQN
SARSA (State-Action-Reward-State-Action)
Sarsa (Lambda)
DDPG (Deep Deterministic Policy Gradient)
AC (Actor-Critic)
TRPO (Trust Region Policy Optimization)
PER-DQN (Prioritized Experience Replay)
Local GAC (Local Generative Actor-Critic)
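
A minimal sketch contrasting the Q-Learning update (off-policy, the rule behind DQN and its variants) and the SARSA update (on-policy) from the list above, in tabular form; the table sizes, sample transition, and hyperparameters are illustrative placeholders.

    import numpy as np

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        """Off-policy: bootstrap from the greedy action in s_next."""
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
        """On-policy: bootstrap from the action actually taken in s_next."""
        Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

    Q = np.zeros((5, 2))                 # 5 states, 2 actions
    q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
    sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)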

3.6 Two-Player Games

Minimax Q-Learning
DQN
Neural Fictitious Self-Play (see the sketch after this list)
MCTS
DDPG (Deep Deterministic Policy Gradient)
AC
TRPO (Trust Region Policy Optimization)
Dueling DQN
SARSA (State-Action-Reward-State-Action)
Sarsa (Lambda)
PER-DQN
Local GAC
RTNS
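
A minimal sketch of the fictitious (self-)play idea behind Neural Fictitious Self-Play, on a two-player zero-sum matrix game: each player best-responds to the opponent's empirical average strategy, and the average strategies converge toward equilibrium. NFSP replaces the tables here with a neural best-response network and an average-policy network; the payoff matrix is the usual rock-paper-scissors example, not from the original index.

    import numpy as np

    A = np.array([[0, -1, 1],
                  [1, 0, -1],
                  [-1, 1, 0]], dtype=float)      # rock-paper-scissors, row player's payoffs

    counts = [np.ones(3), np.ones(3)]            # empirical action counts for each player
    for _ in range(10000):
        avg0 = counts[0] / counts[0].sum()
        avg1 = counts[1] / counts[1].sum()
        br0 = np.argmax(A @ avg1)                # row best response to column's average strategy
        br1 = np.argmax(-(A.T) @ avg0)           # column best response to row's average strategy
        counts[0][br0] += 1
        counts[1][br1] += 1

    print(counts[0] / counts[0].sum())           # approaches [1/3, 1/3, 1/3]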

3.7 Multi-Player Games

MADDPG (multi-agent actor-critic for mixed cooperative-competitive environments; see the sketch after this list)
MFMARL (Mean Field Multi-Agent Reinforcement Learning)
NFSP (Neural Fictitious Self-Play)
DRON (Deep Reinforcement Opponent Network)
Nash Q-Learning
CFR (Counterfactual Regret Minimization)
RPG (Regret Policy Gradient)
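
A minimal sketch of the centralized-critic idea in MADDPG: each agent's critic sees all agents' observations and actions during training, while each actor acts from its own observation at execution time. Agent counts, dimensions, and layer sizes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class CentralizedCritic(nn.Module):
        """Q_i(o_1..o_N, a_1..a_N): a critic over the joint observation and joint action."""
        def __init__(self, n_agents=3, obs_dim=10, act_dim=4, hidden=128):
            super().__init__()
            in_dim = n_agents * (obs_dim + act_dim)
            self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, all_obs, all_actions):   # (batch, n_agents, obs_dim), (batch, n_agents, act_dim)
            joint = torch.cat([all_obs.flatten(1), all_actions.flatten(1)], dim=1)
            return self.net(joint)                 # (batch, 1)

    class Actor(nn.Module):
        """Decentralized actor: maps an agent's own observation to its action."""
        def __init__(self, obs_dim=10, act_dim=4, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, act_dim), nn.Tanh())

        def forward(self, obs):
            return self.net(obs)

    q = CentralizedCritic()(torch.randn(8, 3, 10), torch.randn(8, 3, 4))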

4. Action Coordination and Control

4.1 Path Search

A* (A-Star; see the sketch after this list)
D* (Dynamic A*)
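
A minimal sketch of A* on a 4-connected grid with a Manhattan-distance heuristic; the grid, start, and goal are illustrative placeholders.

    import heapq

    def a_star(grid, start, goal):
        """grid: 2D list, 0 = free, 1 = obstacle; start/goal: (row, col). Returns the path or None."""
        rows, cols = len(grid), len(grid[0])
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
        open_heap = [(h(start), 0, start)]                        # entries are (f = g + h, g, node)
        came_from, g_cost = {}, {start: 0}
        while open_heap:
            _, g, node = heapq.heappop(open_heap)
            if node == goal:                                      # reconstruct path back to start
                path = [node]
                while node in came_from:
                    node = came_from[node]
                    path.append(node)
                return path[::-1]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (node[0] + dr, node[1] + dc)
                if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                    if g + 1 < g_cost.get(nxt, float("inf")):
                        g_cost[nxt] = g + 1
                        came_from[nxt] = node
                        heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt))
        return None

    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print(a_star(grid, (0, 0), (2, 0)))    # detours around the obstacles in the middle row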

4.2 Multi-Agent Coordination

VDN (Value-Decomposition Networks; see the sketch after this list)
CoPPO [Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, A. Bayen, and Yi Wu. "The surprising effectiveness of MAPPO in cooperative, multi-agent games." arXiv:2103.01955, 2021.]
MARL (QMIX) [Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson. "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning." arXiv:1803.11485, 2018.]
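
A minimal sketch of the value decomposition used by VDN: the joint action-value is the sum of per-agent utilities, so a single team reward can train decentralized per-agent Q-networks (QMIX, cited above, replaces the sum with a learned monotonic mixing network). Sizes and agent counts are illustrative assumptions.

    import torch
    import torch.nn as nn

    class AgentQ(nn.Module):
        """Per-agent utility network Q_i(o_i, a)."""
        def __init__(self, obs_dim=10, n_actions=5, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

        def forward(self, obs):
            return self.net(obs)

    def vdn_joint_q(agent_qs, observations, actions):
        """Q_tot = sum_i Q_i(o_i, a_i); observations: (batch, n_agents, obs_dim), actions: (batch, n_agents)."""
        per_agent = []
        for i, net in enumerate(agent_qs):
            q_i = net(observations[:, i])                          # (batch, n_actions)
            per_agent.append(q_i.gather(1, actions[:, i:i + 1]))   # value of the chosen action
        return torch.cat(per_agent, dim=1).sum(dim=1)              # (batch,)

    agents = [AgentQ() for _ in range(3)]
    q_tot = vdn_joint_q(agents, torch.randn(4, 3, 10), torch.randint(0, 5, (4, 3)))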