[2410.12784] JudgeBench: A Benchmark for Evaluating LLM-based Judges