[2408.10943] SysBench: Can Large Language Models Follow System Messages?