[2406.15612] Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients