[2408.03459] On the Generalization of Preference Learning with DPO