NYU Researchers Develop New Real-Time Deepfake Detection Method - IEEE Spectrum

Chinmay Hegde is exploring challenge-response systems for detecting audio and video deepfakes

A photo of a face on a computer monitor with a series of lines on the face.

Deepfake video and audio are powerful tools in the hands of bad actors. NYU Tandon researchers are developing new techniques to combat deepfake threats.

NYU Tandon

This sponsored article is brought to you by NYU Tandon School of Engineering.

Deepfakes, hyper-realistic videos and audio created using artificial intelligence, present a growing threat in today’s digital world. By manipulating or fabricating content to make it appear authentic, deepfakes can be used to deceive viewers, spread disinformation, and tarnish reputations. Their misuse extends to political propaganda, social manipulation, identity theft, and cybercrime.

As deepfake technology becomes more advanced and widely accessible, the risk of societal harm escalates. Studying deepfakes is crucial to developing detection methods, raising awareness, and establishing legal frameworks to mitigate the damage they can cause in personal, professional, and global spheres. Understanding the risks associated with deepfakes and their potential impact will be necessary for preserving trust in media and digital communication.

That is where Chinmay Hegde, an Associate Professor of Computer Science and Engineering and Electrical and Computer Engineering at NYU Tandon, comes in.

A photo of a smiling man in glasses.

Chinmay Hegde, an Associate Professor of Computer Science and Engineering and Electrical and Computer Engineering at NYU Tandon, is developing challenge-response systems for detecting audio and video deepfakes.

NYU Tandon

“Broadly, I’m interested in AI safety in all of its forms. And when a technology like AI develops so rapidly, and gets good so quickly, it’s an area ripe for exploitation by people who would do harm,” Hegde said.

A native of India, Hegde has lived in places around the world, including Houston, Texas, where he spent several years as a student at Rice University; Cambridge, Massachusetts, where he did post-doctoral work in MIT’s Theory of Computation (TOC) group; and Ames, Iowa, where he held a professorship in the Electrical and Computer Engineering Department at Iowa State University.

Hegde, whose area of expertise is in data processing and machine learning, focuses his research on developing fast, robust, and certifiable algorithms for diverse data processing problems encountered in applications spanning imaging and computer vision, transportation, and materials design. At Tandon, he worked with Professor of Computer Science and Engineering Nasir Memon, who sparked his interest in deepfakes.

“Even just six years ago, generative AI technology was very rudimentary. One time, one of my students came in and showed off how the model was able to make a white circle on a dark background, and we were all really impressed by that at the time. Now you have high definition fakes of Taylor Swift, Barack Obama, the Pope — it’s stunning how far this technology has come. My view is that it may well continue to improve from here,” he said.

Hegde helped lead a research team from NYU Tandon School of Engineering that developed a new approach to combat the growing threat of real-time deepfakes (RTDFs) – sophisticated artificial-intelligence-generated fake audio and video that can convincingly mimic actual people in real-time video and voice calls.

High-profile incidents of deepfake fraud are already occurring, including a recent $25 million scam using fake video, and the need for effective countermeasures is clear.

In two separate papers, research teams show how “challenge-response” techniques can exploit the inherent limitations of current RTDF generation pipelines, causing degradations in the quality of the impersonations that reveal their deception.

In a paper titled “GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response,” the researchers developed a set of eight visual challenges designed to signal to users when they are not engaging with a real person.

“Most people are familiar with CAPTCHA, the online challenge-response that verifies they’re an actual human being. Our approach mirrors that technology, essentially asking questions or making requests that RTDF cannot respond to appropriately,” said Hegde, who led the research on both papers.

A series of images with people's faces in rows.

Challenge frame of original and deepfake videos. Each row aligns outputs against the same instance of challenge, while each column aligns the same deepfake method. The green bars are a metaphor for the fidelity score, with taller bars suggesting higher fidelity. Missing bars imply the specific deepfake failed to do that specific challenge.

NYU Tandon

The video research team created a dataset of 56,247 videos from 47 participants, evaluating challenges such as head movements and deliberately obscuring or covering parts of the face. Human evaluators achieved an Area Under the Curve (AUC) score of about 89 percent in detecting deepfakes (over 80 percent is considered very good), while machine learning models reached about 73 percent.
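For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen fake receives a higher suspicion score than a randomly chosen genuine video. The short Python sketch below illustrates that definition. The function name `auc` and the scores are invented for illustration and are not data or code from the study.

```python
def auc(real_scores, fake_scores):
    """Fraction of (fake, real) pairs in which the fake gets the higher
    suspicion score; ties count as half a win."""
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(real_scores) * len(fake_scores))


# Hypothetical suspicion scores in [0, 1]; higher means "more likely fake".
real = [0.1, 0.3, 0.4, 0.6]
fake = [0.5, 0.7, 0.8, 0.9]
print(auc(real, fake))  # 15 of the 16 pairs are ranked correctly
```

A score of 1.0 would mean every fake is ranked above every genuine clip, while 0.5 is no better than guessing.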

“Challenges like quickly moving a hand in front of your face, making dramatic facial expressions, or suddenly changing the lighting are simple for real humans to do, but very difficult for current deepfake systems to replicate convincingly when asked to do so in real-time,” said Hegde.

Audio Challenges for Deepfake Detection

In another paper called “AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response,” researchers created a taxonomy of 22 audio challenges across various categories. Some of the most effective included whispering, speaking with a “cupped” hand over the mouth, talking in a high pitch, pronouncing foreign words, and speaking over background music or speech.

“Even state-of-the-art voice cloning systems struggle to maintain quality when asked to perform these unusual vocal tasks on the fly,” said Hegde. “For instance, whispering or speaking in an unusually high pitch can significantly degrade the quality of audio deepfakes.”

The audio study involved 100 participants and over 1.6 million deepfake audio samples. It employed three detection scenarios: humans alone, AI alone, and a human-AI collaborative approach. Human evaluators achieved about 72 percent accuracy in detecting fakes, while AI alone performed better with 85 percent accuracy.

The collaborative approach, where humans made initial judgments and could revise their decisions after seeing AI predictions, achieved about 83 percent accuracy. This collaborative system also allowed AI to make final calls in cases where humans were uncertain.
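One way to picture this collaborative flow is as a simple decision rule. The sketch below is only an illustration of the description above, not the procedure used in the paper: the function name, the use of `None` to mean "the human is unsure," and the 0.5 cutoff on the AI score are all assumptions.

```python
from typing import Optional


def collaborative_verdict(
    human_says_fake: Optional[bool],
    ai_fake_prob: float,
    revised_says_fake: Optional[bool] = None,
    ai_threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> bool:
    """Return True if the call is flagged as fake.

    human_says_fake   -- the human's first judgment, or None if unsure
    ai_fake_prob      -- the AI's estimated probability that the audio is fake
    revised_says_fake -- the human's judgment after seeing the AI prediction,
                         or None if they did not change their answer
    """
    ai_says_fake = ai_fake_prob >= ai_threshold
    # A revised judgment, if one was given, replaces the initial one.
    human_final = revised_says_fake if revised_says_fake is not None else human_says_fake
    # When the human remains unsure, the AI makes the final call.
    if human_final is None:
        return ai_says_fake
    return human_final


print(collaborative_verdict(None, 0.9))                            # AI decides
print(collaborative_verdict(False, 0.9, revised_says_fake=True))   # human revises
```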


The researchers emphasize that their techniques are designed to be practical for real-world use, with most challenges taking only seconds to complete. A typical video challenge might involve a quick hand gesture or facial expression, while an audio challenge could be as simple as whispering a short sentence.

“The key is that these tasks are easy and quick for real people but hard for AI to fake in real-time,” Hegde said. “We can also randomize the challenges and combine multiple tasks for extra security.”
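The randomizing and combining that Hegde describes could look something like the following Python sketch. The challenge lists are paraphrased from examples mentioned in this article, while the function `pick_challenges`, its parameters, and the choice of two challenges per call are hypothetical and are not taken from either paper.

```python
import secrets

# Example challenges paraphrased from the article; not the papers' full sets.
VIDEO_CHALLENGES = [
    "move a hand quickly in front of your face",
    "make a dramatic facial expression",
    "suddenly change the lighting",
    "turn your head to the side",
]
AUDIO_CHALLENGES = [
    "whisper a short sentence",
    "speak with a cupped hand over your mouth",
    "talk in a high pitch",
    "pronounce a foreign word",
]


def pick_challenges(pool, k=2, rng=None):
    """Pick k distinct challenges at random from pool.

    By default the operating system's secure random source is used, so a
    caller cannot easily predict which challenges will be requested.
    """
    if rng is None:
        rng = secrets.SystemRandom()
    if k > len(pool):
        raise ValueError("cannot pick more challenges than the pool contains")
    return rng.sample(pool, k)


# Ask for two randomly chosen tasks from the combined pool.
for task in pick_challenges(VIDEO_CHALLENGES + AUDIO_CHALLENGES, k=2):
    print("Please", task)
```

Drawing from a combined pool means an impersonator cannot prepare for a fixed script, which is the point of randomizing.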

As deepfake technology continues to advance, the researchers plan to refine their challenge sets and explore ways to make detection even more robust. They’re particularly interested in developing “compound” challenges that combine multiple tasks simultaneously.

“Our goal is to give people reliable tools to verify who they’re really talking to online, without disrupting normal conversations,” said Hegde. “As AI gets better at creating fakes, we need to get better at detecting them. These challenge-response systems are a promising step in that direction.”