[2311.16081] ViT-Lens: Towards Omni-modal Representations