[2405.20850] Improving Reward Models with Synthetic Critiques