[2408.13461] Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach