[2403.02905] MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model