[2406.01660] Self-Improving Robust Preference Optimization