[2406.19185] Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion