[2207.03482] Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection