[2403.14155] Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization