[2407.20454] CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models