[2406.04673] MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models