[2402.12195] Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion