Jailbreaking LLM-Controlled Robots
Surprising no one, it’s easy to trick an LLM-controlled robot into ignoring its safety instructions.
Page 1 of 5
Surprising no one, it’s easy to trick an LLM-controlled robot into ignoring its safety instructions.
This essay was written with Nathan E. Sanders. It originally appeared as a response to Evgeny Morozov in Boston Review‘s forum, “The AI We Deserve.”
For a technology that seems startling in its modernity, AI sure has a long history. Google Translate, OpenAI chatbots, and Meta AI image generators are built on decades of advancements in linguistics, signal processing, statistics, and other fields going back to the early days of computing—and, often, on seed funding from the U.S. Department of Defense. But today’s tools are hardly the intentional product of the diverse generations of innovators that came before. We agree with Morozov that the “refuseniks,” as he calls them, are wrong to see AI as “irreparably tainted” by its origins. AI is better understood as a creative, global field of human endeavor that has been largely captured by U.S. venture capitalists, private equity, and Big Tech. But that was never the inevitable outcome, and it doesn’t need to stay that way.
The internet is a case in point. The fact that it originated in the military is a historical curiosity, not an indication of its essential capabilities or social significance. Yes, it was created to connect different, incompatible Department of Defense networks. Yes, it was designed to survive the sorts of physical damage expected from a nuclear war. And yes, back then it was a bureaucratically controlled space where frivolity was discouraged and commerce was forbidden.
Over the decades, the internet transformed from military project to academic tool to the corporate marketplace it is today. These forces, each in turn, shaped what the internet was and what it could do. For most of us billions online today, the only internet we have ever known has been corporate—because the internet didn’t flourish until the capitalists got hold of it.
AI followed a similar path. It was originally funded by the military, with the military’s goals in mind. But the Department of Defense didn’t design the modern ecosystem of AI any more than it did the modern internet. Arguably, its influence on AI was even less because AI simply didn’t work back then. While the internet exploded in usage, AI hit a series of dead ends. The research discipline went through multiple “winters” when funders of all kinds—military and corporate—were disillusioned and research money dried up for years at a time. Since the release of ChatGPT, AI has reached the same endpoint as the internet: it is thoroughly dominated by corporate power. Modern AI, with its deep reinforcement learning and large language models, is shaped by venture capitalists, not the military—nor even by idealistic academics anymore.
We agree with much of Morozov’s critique of corporate control, but it does not follow that we must reject the value of instrumental reason. Solving problems and pursuing goals is not a bad thing, and there is real cause to be excited about the uses of current AI. Morozov illustrates this from his own experience: he uses AI to pursue the explicit goal of language learning.
AI tools promise to increase our individual power, amplifying our capabilities and endowing us with skills, knowledge, and abilities we would not otherwise have. This is a peculiar form of assistive technology, kind of like our own personal minion. It might not be that smart or competent, and occasionally it might do something wrong or unwanted, but it will attempt to follow your every command and gives you more capability than you would have had without it.
Of course, for our AI minions to be valuable, they need to be good at their tasks. On this, at least, the corporate models have done pretty well. They have many flaws, but they are improving markedly on a timescale of mere months. ChatGPT’s initial November 2022 model, GPT-3.5, scored about 30 percent on a multiple-choice scientific reasoning benchmark called GPQA. Five months later, GPT-4 scored 36 percent; by May this year, GPT-4o scored about 50 percent, and the most recently released o1 model reached 78 percent, surpassing the level of experts with PhDs. There is no one singular measure of AI performance, to be sure, but other metrics also show improvement.
That’s not enough, though. Regardless of their smarts, we would never hire a human assistant for important tasks, or use an AI, unless we can trust them. And while we have millennia of experience dealing with potentially untrustworthy humans, we have practically none dealing with untrustworthy AI assistants. This is the area where the provenance of the AI matters most. A handful of for-profit companies—OpenAI, Google, Meta, Anthropic, among others—decide how to train the most celebrated AI models, what data to use, what sorts of values they embody, whose biases they are allowed to reflect, and even what questions they are allowed to answer. And they decide these things in secret, for their benefit.
It’s worth stressing just how closed, and thus untrustworthy, the corporate AI ecosystem is. Meta has earned a lot of press for its “open-source” family of LLaMa models, but there is virtually nothing open about them. For one, the data they are trained with is undisclosed. You’re not supposed to use LLaMa to infringe on someone else’s copyright, but Meta does not want to answer questions about whether it violated copyrights to build it. You’re not supposed to use it in Europe, because Meta has declined to meet the regulatory requirements anticipated from the EU’s AI Act. And you have no say in how Meta will build its next model.
The company may be giving away the use of LLaMa, but it’s still doing so because it thinks it will benefit from your using it. CEO Mark Zuckerberg has admitted that eventually, Meta will monetize its AI in all the usual ways: charging to use it at scale, fees for premium models, advertising. The problem with corporate AI is not that the companies are charging “a hefty entrance fee” to use these tools: as Morozov rightly points out, there are real costs to anyone building and operating them. It’s that they are built and operated for the purpose of enriching their proprietors, rather than because they enrich our lives, our wellbeing, or our society.
But some emerging models from outside the world of corporate AI are truly open, and may be more trustworthy as a result. In 2022 the research collaboration BigScience developed an LLM called BLOOM with freely licensed data and code as well as public compute infrastructure. The collaboration BigCode has continued in this spirit, developing LLMs focused on programming. The government of Singapore has built SEA-LION, an open-source LLM focused on Southeast Asian languages. If we imagine a future where we use AI models to benefit all of us—to make our lives easier, to help each other, to improve our public services—we will need more of this. These may not be “eolithic” pursuits of the kind Morozov imagines, but they are worthwhile goals. These use cases require trustworthy AI models, and that means models built under conditions that are transparent and with incentives aligned to the public interest.
Perhaps corporate AI will never satisfy those goals; perhaps it will always be exploitative and extractive by design. But AI does not have to be solely a profit-generating industry. We should invest in these models as a public good, part of the basic infrastructure of the twenty-first century. Democratic governments and civil society organizations can develop AI to offer a counterbalance to corporate tools. And the technology they build, for all the flaws it may have, will enjoy a superpower that corporate AI never will: it will be accountable to the public interest and subject to public will in the transparency, openness, and trustworthiness of its development.
These are two attacks against the system components surrounding LLMs:
We propose that LLM Flowbreaking, following jailbreaking and prompt injection, joins as the third on the growing list of LLM attack types. Flowbreaking is less about whether prompt or response guardrails can be bypassed, and more about whether user inputs and generated model outputs can adversely affect these other components in the broader implemented system.
[…]
When confronted with a sensitive topic, Microsoft 365 Copilot and ChatGPT answer questions that their first-line guardrails are supposed to stop. After a few lines of text they halt—seemingly having “second thoughts”—before retracting the original answer (also known as Clawback), and replacing it with a new one without the offensive content, or a simple error message. We call this attack “Second Thoughts.”
[…]
After asking the LLM a question, if the user clicks the Stop button while the answer is still streaming, the LLM will not engage its second-line guardrails. As a result, the LLM will provide the user with the answer generated thus far, even though it violates system policies.
In other words, pressing the Stop button halts not only the answer generation but also the guardrails sequence. If the stop button isn’t pressed, then ‘Second Thoughts’ is triggered.
What’s interesting here is that the model itself isn’t being exploited. It’s the code around the model:
By attacking the application architecture components surrounding the model, and specifically the guardrails, we manipulate or disrupt the logical chain of the system, taking these components out of sync with the intended data flow, or otherwise exploiting them, or, in turn, manipulating the interaction between these components in the logical chain of the application implementation.
In modern LLM systems, there is a lot of code between what you type and what the LLM receives, and between what the LLM produces and what you see. All of that code is exploitable, and I expect many more vulnerabilities to be discovered in the coming year.
Interesting research: “Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks“:
Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailored to counter LLM-driven cyberattacks. We introduce Mantis, a defensive framework that exploits LLMs’ susceptibility to adversarial inputs to undermine malicious operations. Upon detecting an automated cyberattack, Mantis plants carefully crafted inputs into system responses, leading the attacker’s LLM to disrupt their own operations (passive defense) or even compromise the attacker’s machine (active defense). By deploying purposefully vulnerable decoy services to attract the attacker and using dynamic prompt injections for the attacker’s LLM, Mantis can autonomously hack back the attacker. In our experiments, Mantis consistently achieved over 95% effectiveness against automated LLM-driven attacks. To foster further research and collaboration, Mantis is available as an open-source tool: this https URL.
This isn’t the solution, of course. But this sort of thing could be part of a solution.
Really interesting research: “An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection“:
Abstract: Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter the model outputs. To address this critical security challenge, we introduce CODEBREAKER, a pioneering LLM-assisted backdoor attack framework on code completion models. Unlike recent attacks that embed malicious payloads in detectable or irrelevant sections of the code (e.g., comments), CODEBREAKER leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without affecting functionalities), ensuring that both the poisoned data for fine-tuning and generated code can evade strong vulnerability detection. CODEBREAKER stands out with its comprehensive coverage of vulnerabilities, making it the first to provide such an extensive set for evaluation. Our extensive experimental evaluations and user studies underline the strong attack performance of CODEBREAKER across various settings, validating its superiority over existing approaches. By integrating malicious payloads directly into the source code with minimal transformation, CODEBREAKER challenges current security measures, underscoring the critical need for more robust defenses for code completion.
Clever attack, and yet another illustration of why trusted AI is essential.
Researchers at Google have developed a watermark for LLM-generated text. The basics are pretty obvious: the LLM chooses between tokens partly based on a cryptographic key, and someone with knowledge of the key can detect those choices. What makes this hard is (1) how much text is required for the watermark to work, and (2) how robust the watermark is to post-generation editing. Google’s version looks pretty good: it’s detectable in text as small as 200 tokens.
Tax farming is the practice of licensing tax collection to private contractors. Used heavily in ancient Rome, it’s largely fallen out of practice because of the obvious conflict of interest between the state and the contractor. Because tax farmers are primarily interested in short-term revenue, they have no problem abusing taxpayers and making things worse for them in the long term. Today, the U.S. Securities and Exchange Commission (SEC) is engaged in a modern-day version of tax farming. And the potential for abuse will grow when the farmers start using artificial intelligence.
In 2009, after Bernie Madoff’s $65 billion Ponzi scheme was exposed, Congress authorized the SEC to award bounties from civil penalties recovered from securities law violators. It worked in a big way. In 2012, when the program started, the agency received more than 3,000 tips. By 2020, it had more than doubled, and it more than doubled again by 2023. The SEC now receives more than 50 tips per day, and the program has paid out a staggering $2 billion in bounty awards. According to the agency’s 2023 financial report, the SEC paid out nearly $600 million to whistleblowers last year.
The appeal of the whistleblower program is that it alerts the SEC to violations it may not otherwise uncover, without any additional staff. And since payouts are a percentage of fines collected, it costs the government little to implement.
Unfortunately, the program has resulted in a new industry of private de facto regulatory enforcers. Legal scholar Alexander Platt has shown how the SEC’s whistleblower program has effectively privatized a huge portion of financial regulatory enforcement. There is a role for publicly sourced information in securities regulatory enforcement, just as there has been in litigation for antitrust and other areas of the law. But the SEC program, and a similar one at the U.S. Commodity Futures Trading Commission, has created a market distortion replete with perverse incentives. Like the tax farmers of history, the interests of the whistleblowers don’t match those of the government.
First, while the blockbuster awards paid out to whistleblowers draw attention to the SEC’s successes, they obscure the fact that its staffing level has slightly declined during a period of tremendous market growth. In one case, the SEC’s largest ever, it paid $279 million to an individual whistleblower. That single award was nearly one-third of the funding of the SEC’s entire enforcement division last year. Congress gets to pat itself on the back for spinning up a program that pays for itself (by law, the SEC awards 10 to 30 percent of their penalty collections over $1 million to qualifying whistleblowers), when it should be talking about whether or not it’s given the agency enough resources to fulfill its mission to “maintain fair, orderly, and efficient markets.”
Second, while the stated purpose of the whistleblower program is to incentivize individuals to come forward with information about potential violations of securities law, this hasn’t actually led to increases in enforcement actions. Instead of legitimate whistleblowers bringing the most credible information to the SEC, the agency now seems to be deluged by tips that are not highly actionable.
But the biggest problem is that uncovering corporate malfeasance is now a legitimate business model, resulting in powerful firms and misaligned incentives. A single law practice led by former SEC assistant director Jordan Thomas captured about 20 percent of all the SEC’s whistleblower awards through 2022, at which point Thomas left to open up a new firm focused exclusively on whistleblowers. We can admire Thomas and his team’s impact on making those guilty of white-collar crimes pay, and also question whether hundreds of millions of dollars of penalties should be funneled through the hands of an SEC insider turned for-profit business mogul.
Whistleblower tips can be used as weapons of corporate warfare. SEC whistleblower complaints are not required to come from inside a company, or even to rely on insider information. They can be filed on the basis of public data, as long as the whistleblower brings original analysis. Companies might dig up dirt on their competitors and submit tips to the SEC. Ransomware groups have used the threat of SEC whistleblower tips as a tactic to pressure the companies they’ve infiltrated into paying ransoms.
The rise of whistleblower firms could lead to them taking particular “assignments” for a fee. Can a company hire one of these firms to investigate its competitors? Can an industry lobbying group under scrutiny (perhaps in cryptocurrencies) pay firms to look at other industries instead and tie up SEC resources? When a firm finds a potential regulatory violation, do they approach the company at fault and offer to cease their research for a “kill fee”? The lack of transparency and accountability of the program means that the whistleblowing firms can get away with practices like these, which would be wholly unacceptable if perpetrated by the SEC itself.
Whistleblowing firms can also use the information they uncover to guide market investments by activist short sellers. Since 2006, the investigative reporting site Sharesleuth claims to have tanked dozens of stocks and instigated at least eight SEC cases against companies in pharma, energy, logistics, and other industries, all after its investors shorted the stocks in question. More recently, a new investigative reporting site called Hunterbrook Media and partner hedge fund Hunterbrook Capital, have churned out 18 investigative reports in their first five months of operation and disclosed short sales and other actions alongside each. In at least one report, Hunterbrook says they filed an SEC whistleblower tip.
Short sellers carry an important disciplining function in markets. But combined with whistleblower awards, the same profit-hungry incentives can emerge. Properly staffed regulatory agencies don’t have the same potential pitfalls.
AI will affect every aspect of this dynamic. AI’s ability to extract information from large document troves will help whistleblowers provide more information to the SEC faster, lowering the bar for reporting potential violations and opening a floodgate of new tips. Right now, there is no cost to the whistleblower to report minor or frivolous claims; there is only cost to the SEC. While AI automation will also help SEC staff process tips more efficiently, it could exponentially increase the number of tips the agency has to deal with, further decreasing the efficiency of the program.
AI could be a triple windfall for those law firms engaged in this business: lowering their costs, increasing their scale, and increasing the SEC’s reliance on a few seasoned, trusted firms. The SEC already, as Platt documented, relies on a few firms to prioritize their investigative agenda. Experienced firms like Thomas’s might wield AI automation to the greatest advantage. SEC staff struggling to keep pace with tips might have less capacity to look beyond the ones seemingly pre-vetted by familiar sources.
But the real effects will be on the conflicts of interest between whistleblowing firms and the SEC. The ability to automate whistleblower reporting will open new competitive strategies that could disrupt business practices and market dynamics.
An AI-assisted data analyst could dig up potential violations faster, for a greater scale of competitor firms, and consider a greater scope of potential violations than any unassisted human could. The AI doesn’t have to be that smart to be effective here. Complaints are not required to be accurate; claims based on insufficient evidence could be filed against competitors, at scale.
Even more cynically, firms might use AI to help cover up their own violations. If a company can deluge the SEC with legitimate, if minor, tips about potential wrongdoing throughout the industry, it might lower the chances that the agency will get around to investigating the company’s own liabilities. Some companies might even use the strategy of submitting minor claims about their own conduct to obscure more significant claims the SEC might otherwise focus on.
Many of these ideas are not so new. There are decades of precedent for using algorithms to detect fraudulent financial activity, with lots of current-day application of the latest large language models and other AI tools. In 2019, legal scholar Dimitrios Kafteranis, research coordinator for the European Whistleblowing Institute, proposed using AI to automate corporate whistleblowing.
And not all the impacts specific to AI are bad. The most optimistic possible outcome is that AI will allow a broader base of potential tipsters to file, providing assistive support that levels the playing field for the little guy.
But more realistically, AI will supercharge the for-profit whistleblowing industry. The risks remain as long as submitting whistleblower complaints to the SEC is a viable business model. Like tax farming, the interests of the institutional whistleblower diverge from the interests of the state, and no amount of tweaking around the edges will make it otherwise.
Ultimately, AI is not the cause of or solution to the problems created by the runaway growth of the SEC whistleblower program. But it should give policymakers pause to consider the incentive structure that such programs create, and to reconsider the balance of public and private ownership of regulatory enforcement.
This essay was written with Nathan Sanders, and originally appeared in The American Prospect.
Two students have created a demo of a smart-glasses app that performs automatic facial recognition and then information lookups. Kind of obvious—something similar was done in 2011—but the sort of creepy demo that gets attention.
News article.
This vulnerability hacks a feature that allows ChatGPT to have long-term memory, where it uses information from past conversations to inform future conversations with that same user. A researcher found that he could use that feature to plant “false memories” into that context window that could subvert the model.
A month later, the researcher submitted a new disclosure statement. This time, he included a PoC that caused the ChatGPT app for macOS to send a verbatim copy of all user input and ChatGPT output to a server of his choice. All a target needed to do was instruct the LLM to view a web link that hosted a malicious image. From then on, all input and output to and from ChatGPT was sent to the attacker’s website.
For years now, AI has undermined the public’s ability to trust what it sees, hears, and reads. The Republican National Committee released a provocative ad offering an “AI-generated look into the country’s possible future if Joe Biden is re-elected,” showing apocalyptic, machine-made images of ruined cityscapes and chaos at the border. Fake robocalls purporting to be from Biden urged New Hampshire residents not to vote in the 2024 primary election. This summer, the Department of Justice cracked down on a Russian bot farm that was using AI to impersonate Americans on social media, and OpenAI disrupted an Iranian group using ChatGPT to generate fake social-media comments.
It’s not altogether clear what damage AI itself may cause, though the reasons for concern are obvious—the technology makes it easier for bad actors to construct highly persuasive and misleading content. With that risk in mind, there has been some movement toward constraining the use of AI, yet progress has been painstakingly slow in the area where it may count most: the 2024 election.
Two years ago, the Biden administration issued a blueprint for an AI Bill of Rights aiming to address “unsafe or ineffective systems,” “algorithmic discrimination,” and “abusive data practices,” among other things. Then, last year, Biden built on that document when he issued his executive order on AI. Also in 2023, Senate Majority Leader Chuck Schumer held an AI summit in Washington that included the centibillionaires Bill Gates, Mark Zuckerberg, and Elon Musk. Several weeks later, the United Kingdom hosted an international AI Safety Summit that led to the serious-sounding “Bletchley Declaration,” which urged international cooperation on AI regulation. The risks of AI fakery in elections have not sneaked up on anybody.
Yet none of this has resulted in changes that would resolve the use of AI in U.S. political campaigns. Even worse, the two federal agencies with a chance to do something about it have punted the ball, very likely until after the election.
On July 25, the Federal Communications Commission issued a proposal that would require political advertisements on TV and radio to disclose if they used AI. (The FCC has no jurisdiction over streaming, social media, or web ads.) That seems like a step forward, but there are two big problems. First, the proposed rules, even if enacted, are unlikely to take effect before early voting starts in this year’s election. Second, the proposal immediately devolved into a partisan slugfest. A Republican FCC commissioner alleged that the Democratic National Committee was orchestrating the rule change because Democrats are falling behind the GOP in using AI in elections. Plus, he argued, this was the Federal Election Commission’s job to do.
Yet last month, the FEC announced that it won’t even try making new rules against using AI to impersonate candidates in campaign ads through deepfaked audio or video. The FEC also said that it lacks the statutory authority to make rules about misrepresentations using deepfaked audio or video. And it lamented that it lacks the technical expertise to do so, anyway. Then, last week, the FEC compromised, announcing that it intends to enforce its existing rules against fraudulent misrepresentation regardless of what technology it is conducted with. Advocates for stronger rules on AI in campaign ads, such as Public Citizen, did not find this nearly sufficient, characterizing it as a “wait-and-see approach” to handling “electoral chaos.”
Perhaps this is to be expected: The freedom of speech guaranteed by the First Amendment generally permits lying in political ads. But the American public has signaled that it would like some rules governing AI’s use in campaigns. In 2023, more than half of Americans polled responded that the federal government should outlaw all uses of AI-generated content in political ads. Going further, in 2024, about half of surveyed Americans said they thought that political candidates who intentionally manipulated audio, images, or video should be prevented from holding office or removed if they had won an election. Only 4 percent thought there should be no penalty at all.
The underlying problem is that Congress has not clearly given any agency the responsibility to keep political advertisements grounded in reality, whether in response to AI or old-fashioned forms of disinformation. The Federal Trade Commission has jurisdiction over truth in advertising, but political ads are largely exempt—again, part of our First Amendment tradition. The FEC’s remit is campaign finance, but the Supreme Court has progressively stripped its authorities. Even where it could act, the commission is often stymied by political deadlock. The FCC has more evident responsibility for regulating political advertising, but only in certain media: broadcast, robocalls, text messages. Worse yet, the FCC’s rules are not exactly robust. It has actually loosened rules on political spam over time, leading to the barrage of messages many receive today. (That said, in February, the FCC did unanimously rule that robocalls using AI voice-cloning technology, like the Biden ad in New Hampshire, are already illegal under a 30-year-old law.)
It’s a fragmented system, with many important activities falling victim to gaps in statutory authority and a turf war between federal agencies. And as political campaigning has gone digital, it has entered an online space with even fewer disclosure requirements or other regulations. No one seems to agree where, or whether, AI is under any of these agencies’ jurisdictions. In the absence of broad regulation, some states have made their own decisions. In 2019, California was the first state in the nation to prohibit the use of deceptively manipulated media in elections, and has strengthened these protections with a raft of newly passed laws this fall. Nineteen states have now passed laws regulating the use of deepfakes in elections.
One problem that regulators have to contend with is the wide applicability of AI: The technology can simply be used for many different things, each one demanding its own intervention. People might accept a candidate digitally airbrushing their photo to look better, but not doing the same thing to make their opponent look worse. We’re used to getting personalized campaign messages and letters signed by the candidate; is it okay to get a robocall with a voice clone of the same politician speaking our name? And what should we make of the AI-generated campaign memes now shared by figures such as Musk and Donald Trump?
Despite the gridlock in Congress, these are issues with bipartisan interest. This makes it conceivable that something might be done, but probably not until after the 2024 election and only if legislators overcome major roadblocks. One bill under consideration, the AI Transparency in Elections Act, would instruct the FEC to require disclosure when political advertising uses media generated substantially by AI. Critics say, implausibly, that the disclosure is onerous and would increase the cost of political advertising. The Honest Ads Act would modernize campaign-finance law, extending FEC authority to definitively encompass digital advertising. However, it has languished for years because of reported opposition from the tech industry. The Protect Elections From Deceptive AI Act would ban materially deceptive AI-generated content from federal elections, as in California and other states. These are promising proposals, but libertarian and civil-liberties groups are already signaling challenges to all of these on First Amendment grounds. And, vexingly, at least one FEC commissioner has directly cited congressional consideration of some of these bills as a reason for his agency not to act on AI in the meantime.
One group that benefits from all this confusion: tech platforms. When few or no evident rules govern political expenditures online and uses of new technologies like AI, tech companies have maximum latitude to sell ads, services, and personal data to campaigns. This is reflected in their lobbying efforts, as well as the voluntary policy restraints they occasionally trumpet to convince the public they don’t need greater regulation.
Big Tech has demonstrated that it will uphold these voluntary pledges only if they benefit the industry. Facebook once, briefly, banned political advertising on its platform. No longer; now it even allows ads that baselessly deny the outcome of the 2020 presidential election. OpenAI’s policies have long prohibited political campaigns from using ChatGPT, but those restrictions are trivial to evade. Several companies have volunteered to add watermarks to AI-generated content, but they are easily circumvented. Watermarks might even make disinformation worse by giving the false impression that non-watermarked images are legitimate.
This important public policy should not be left to corporations, yet Congress seems resigned not to act before the election. Schumer hinted to NBC News in August that Congress may try to attach deepfake regulations to must-pass funding or defense bills this month to ensure that they become law before the election. More recently, he has pointed to the need for action “beyond the 2024 election.”
The three bills listed above are worthwhile, but they are just a start. The FEC and FCC should not be left to snipe with each other about what territory belongs to which agency. And the FEC needs more significant, structural reform to reduce partisan gridlock and enable it to get more done. We also need transparency into and governance of the algorithmic amplification of misinformation on social-media platforms. That requires that the pervasive influence of tech companies and their billionaire investors should be limited through stronger lobbying and campaign-finance protections.
Our regulation of electioneering never caught up to AOL, let alone social media and AI. And deceiving videos harm our democratic process, whether they are created by AI or actors on a soundstage. But the urgent concern over AI should be harnessed to advance legislative reform. Congress needs to do more than stick a few fingers in the dike to control the coming tide of election disinformation. It needs to act more boldly to reshape the landscape of regulation for political campaigning.
This essay was written with Nathan E. Sanders, and originally appeared in the Atlantic.
Sidebar photo of Bruce Schneier by Joe MacInnis.