[2302.11713] Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?