[2410.00526] Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents