[2410.11287] Process Reward Model with Q-Value Rankings