[2312.17115] How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation