[2401.15847] Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA