[2411.00585] Benchmarking Bias in Large Language Models during Role-Playing