Misusing Tools in Large Language Models With Visual Adversarial Examples

Fu, Xiaohan; Wang, Zihan; Li, Shuheng; Gupta, Rajesh K.; Mireshghallah, Niloofar; Berg-Kirkpatrick, Taylor; Fernandes, Earlence

arXiv:2310.03185 (cs)

[Submitted on 4 Oct 2023]

Abstract: Large Language Models (LLMs) are being enhanced with the ability to use tools and to process multiple modalities. These new capabilities bring new benefits and also new security risks. In this work, we show that an attacker can use visual adversarial examples to cause attacker-desired tool usage. For example, the attacker could cause a victim LLM to delete calendar events, leak private conversations and book hotels. Different from prior work, our attacks can affect the confidentiality and integrity of user resources connected to the LLM while being stealthy and generalizable to multiple input prompts. We construct these attacks using gradient-based adversarial training and characterize performance along multiple dimensions. We find that our adversarial images can manipulate the LLM to invoke tools following real-world syntax almost always (~98%) while maintaining high similarity to clean images (~0.9 SSIM). Furthermore, using human scoring and automated metrics, we find that the attacks do not noticeably affect the conversation (and its semantics) between the user and the LLM.
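
The abstract names two technical ingredients: a gradient-based perturbation of an image, and SSIM as the measure of how close the perturbed image stays to the clean one. The sketch below is a toy illustration of both ideas and is not the authors' implementation. It assumes a hypothetical linear scorer (`toy_score`) in place of the LLM's loss on a target tool invocation, a hypothetical L-infinity budget `eps`, and a simplified single-window ("global") SSIM. Standard SSIM is computed over local windows, so values from this simplified version are not comparable to the ~0.9 figure reported above. All function names and parameter values are assumptions made for illustration.

```python
import random


def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over one global window (flat pixel lists)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Usual stabilizing constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def toy_score(pixels, weights):
    """Hypothetical stand-in for the attacker's objective: a linear score."""
    return sum(p * w for p, w in zip(pixels, weights))


def signed_gradient_attack(image, weights, eps=0.03, step=0.005, iters=20):
    """Projected signed-gradient ascent on toy_score within an L-inf budget.

    For a linear score the gradient with respect to the pixels is simply
    `weights`; a real attack would obtain gradients by backpropagation.
    """
    adv = list(image)
    for _ in range(iters):
        # Move each pixel one step in the direction of the gradient sign.
        stepped = [a + step * ((w > 0) - (w < 0)) for a, w in zip(adv, weights)]
        # Project back into the eps-ball around the clean image and into [0, 1].
        adv = [min(max(s, p - eps, 0.0), p + eps, 1.0)
               for s, p in zip(stepped, image)]
    return adv


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [rng.random() for _ in range(256)]           # fake 16x16 image
    w = [rng.uniform(-1.0, 1.0) for _ in range(256)]     # fake gradient
    perturbed = signed_gradient_attack(clean, w)
    print("score before:", toy_score(clean, w))
    print("score after: ", toy_score(perturbed, w))
    print("global SSIM: ", global_ssim(clean, perturbed))
```

The projection step is what keeps the perturbation small: each pixel may move at most `eps` from its clean value, which is one common way to trade attack strength against visual similarity.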

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2310.03185 [cs.CR]
	(or arXiv:2310.03185v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2310.03185

Submission history

From: Xiaohan Fu
[v1] Wed, 4 Oct 2023 22:10:01 UTC (1,569 KB)
