[2403.14551] Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling