A Routine Video Call That Cost $25 Million
In January 2024, a finance department employee at Arup, a multinational engineering firm with some 18,000 staff, received a message purportedly from the company's UK-based chief financial officer requesting a secret financial transaction. The employee was skeptical at first. Then came the video call.
On the call were the CFO and several other colleagues the employee recognized. Their faces matched. Their voices matched. They discussed the transaction in detail, answered questions, and authorized the transfers. Following the call's instructions, the employee executed 15 transfers totaling HK$200 million, approximately US$25.6 million.
Every person on that call was a deepfake. The employee was the only real human in the meeting.
How Multi-Person Deepfake Calls Work
Real-time deepfake video technology has advanced to the point where multiple synthetic participants can appear simultaneously on a video call. The attacker uses pre-recorded or publicly available video of each person, drawn from conference talks, YouTube interviews, or company websites, to build face and voice models. During the call, the attacker controls all the fake participants, responding to questions and maintaining natural conversational flow. The ingredients are alarmingly easy to assemble:
- A few minutes of video footage of each person (often available from public talks, interviews, or social media)
- Audio samples for voice cloning — as little as 3 seconds for a basic clone, 30 seconds for a high-quality one
- Publicly available information about the company, its org chart, and the target's role
- Real-time deepfake software — several commercial tools exist for under $100/month
- A plausible pretext: a merger, an acquisition, a regulatory investigation — anything that justifies secrecy and urgency
Why Video Calls Feel Trustworthy (and Shouldn't)
Humans evolved to trust faces. When you can see someone — their expressions, their body language, their eye contact — your brain registers them as "present" and "real." Video calls activate the same neural trust pathways as in-person meetings. This is why the Arup employee followed the instructions: he could see the CFO. He could hear the CFO. His brain had no reason to doubt it.
But a deepfake video is not a face. It's a mathematical model rendered in real time. The trust your brain assigns to that face is based on millions of years of evolution meeting technology that's existed for less than five years. Evolution loses.
"The employee had every reason to believe the call was genuine. The participants looked and sounded like colleagues he knew. This technology has moved beyond what the human eye and ear can reliably detect." — Hong Kong police superintendent Baron Chan
Voice Cloning for Executive Impersonation
Executive voices are among the easiest to clone because executives are the most publicly audible people in a company. Earnings calls, keynote speeches, podcast interviews, panel discussions — all provide the raw audio that AI needs to build a voice model. The American Bar Association notes that as little as 3 seconds of audio can produce an 85% voice match.
In the UK, the chief executive of an energy firm was scammed out of $243,000 after receiving a call from what sounded exactly like his boss at the company's German parent. The voice clone replicated not just the voice but the slight German accent and the boss's speech patterns. The CEO authorized the wire transfer; he grew suspicious only when the callers phoned back demanding a second payment.
Why Multi-Factor Authentication Doesn't Stop This
MFA is designed to verify that you are who you claim to be when logging into a system. It doesn't verify the identity of the person asking you to do something. When a deepfake CFO calls and asks the finance team to wire money, no MFA challenge is triggered — the finance employee is legitimately logged into the banking system with their own credentials. They're authorizing the transfer themselves. The problem isn't authentication of the employee; it's verification of the request.
MFA answers: "Is this person authorized to access the system?" A safeword answers: "Is the person giving instructions who they claim to be?" These are fundamentally different security questions, and both need answers.
The Knowledge-Based Layer Deepfakes Can't Replicate
A deepfake can replicate anything that's publicly observable: face, voice, mannerisms, background, clothing. What it cannot replicate is information that exists only in the minds of two or more people and was never transmitted digitally. A safeword shared in person, never written down or sent electronically, is invisible to AI. It can't be scraped, intercepted, or inferred.
This is the same principle behind out-of-band authentication as described in NIST SP 800-63B: confirm a request through a separate channel, using knowledge shared in advance between the parties. A safeword applies that principle to an analog medium, one that never touched a network and therefore can never leak from one.
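To make the separate-channel idea concrete, here is a minimal sketch of the callback-plus-safeword check; the step names are illustrative assumptions, not anything drawn from NIST. Notice that the safeword itself never appears in the code, because by design it exists only in people's heads.

```python
from enum import Enum, auto

class Verdict(Enum):
    APPROVED = auto()
    REJECTED = auto()

def verify_out_of_band(number_from_directory: bool,
                       safeword_confirmed: bool) -> Verdict:
    """Channel one carried the request (video call, email). Verification
    happens on channel two: a callback to a number from the internal
    directory, never one the requester supplied, followed by a verbal
    safeword exchange. The safeword is deliberately absent from this
    code; no system stores it."""
    if not number_from_directory:
        # Calling back a number the requester gave you stays on the
        # attacker's channel and proves nothing.
        return Verdict.REJECTED
    return Verdict.APPROVED if safeword_confirmed else Verdict.REJECTED

# In the Arup case there was no second channel at all: no callback was
# made, and no deepfake could have supplied the safeword.
print(verify_out_of_band(False, False))  # Verdict.REJECTED
```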
How to Set Up Workplace Verification Protocols
The Arup attack would have been stopped by a single question: "What's our verification word?" Here's how to ensure your organization is prepared; a short configuration sketch of these rules follows the checklist.
- In-person setup: establish safewords at team meetings, one per department or functional group
- Transaction thresholds: require verbal safeword verification for financial transactions above a set amount
- Helpdesk verification: require the department safeword before password resets or MFA changes
- Video call verification: at the start of any call involving financial decisions, participants exchange the safeword
- Callback verification: after receiving an unusual request, hang up, call back on a known number, then verify the safeword
- Vendor safewords: establish separate verification words with each critical vendor for payment-related communications
- Quarterly rotation: change the safeword every quarter at an in-person meeting
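These rules are simple enough to encode as a policy table. The sketch below is a hypothetical configuration, not the Safewords.io product: the group names, scenarios, and threshold are placeholders. The one deliberate design choice is that the safewords themselves are never stored, only the triggers that require them.

```python
from dataclasses import dataclass

@dataclass
class SafewordPolicy:
    group: str                   # department, team, or vendor relationship
    scenarios: tuple[str, ...]   # requests that trigger verbal verification
    threshold_usd: int | None    # financial trigger; None = always required
    rotation: str = "quarterly, set at an in-person meeting"

# Illustrative policies mirroring the checklist above.
POLICIES = [
    SafewordPolicy("finance", ("wire transfer", "payment detail change"), 10_000),
    SafewordPolicy("it-helpdesk", ("password reset", "MFA change"), None),
    SafewordPolicy("vendor:acme-supplies", ("payment-related request",), None),
]

def requires_safeword(group: str, scenario: str, amount_usd: int = 0) -> bool:
    """True when policy demands a verbal safeword exchange. The words
    themselves never appear in this system; only the triggers do."""
    for policy in POLICIES:
        if policy.group == group and scenario in policy.scenarios:
            if policy.threshold_usd is None:
                return True
            return amount_usd >= policy.threshold_usd
    return False

print(requires_safeword("finance", "wire transfer", 25_600_000))  # True
print(requires_safeword("it-helpdesk", "MFA change"))             # True
```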
This Will Happen Again
The Arup attack was not an anomaly; it was a preview. Deepfake technology keeps improving while getting cheaper, and real-time face and voice synthesis tools already rent for under $100 per month. The barrier to executing this attack is near zero. Any company whose executives have public-facing video or audio content is a potential target.
The question isn't whether your company will face a deepfake impersonation attempt. It's whether your team will know what to ask when it happens.
Use the Safewords.io Protocol Builder to create a workplace verification protocol today. Select "Workplace" as your group type, add your team, and define the scenarios where safeword verification is required. Print the security card and distribute it at your next team meeting.