Deepfake Voice Fraud: How AI Audio Is Being Used to Impersonate Executives

A finance employee joined a video call with the CFO and several colleagues. Everyone looked right. Everyone sounded right. Every one of them was an AI deepfake — and $25.6M was gone by the end of the day. Voice cloning now needs three seconds of audio. This is how the attack works, and how to stop it.

$25.6M
stolen from engineering firm Arup via a deepfake CFO on a video call — 15 transfers in one day
3 sec
of clear audio is now enough to clone a convincing synthetic voice of a target
$1.1B
in US deepfake-related fraud losses in 2025 — projected toward $40B by 2027 (Deloitte)
$2.77B
in business email compromise (BEC) losses across 21,442 complaints reported to the FBI in 2024

For most of corporate history, hearing someone's voice has been proof of who they are. "I spoke to them directly" has been the gold standard of verification — the thing you fall back on when an email looks suspicious. That assumption expired in 2024, and the bill is already in the tens of millions. Deepfake voice fraud uses AI-generated audio to impersonate a trusted person — almost always a senior executive — and manipulate an employee into authorizing a wire transfer, sharing credentials, or bypassing a control. It works because it attacks something no firewall protects: the human instinct to trust a familiar voice.

The technology crossed a threshold. Modern voice-cloning tools can produce a convincing synthetic voice from roughly three seconds of clear audio — and executives leave far more than three seconds of themselves on earnings calls, conference talks, podcasts, and LinkedIn videos. The raw material to clone a CEO is sitting in public, for free. At Polygraf AI, we focus on the data and control layer that surrounds AI in the enterprise; deepfake voice fraud is the human-facing edge of the same problem. This is how the attack works, the real cases, the red flags, and the verification architecture that actually defeats it.

Why This Defeats Traditional Security

The controls most organizations invested in (MFA, endpoint detection, email filters, network monitoring) guard perimeters and endpoints. A deepfake fraud call barely touches any of them. No malware is deployed, no system is breached, no credential is stolen by force. As Arup's CIO Rob Greig put it after his firm lost $25.6M: the attackers didn't compromise a single system; they compromised the trust of a single employee. The gap isn't in the technology stack; it's in how decisions get made.

Anatomy of a Real Attack — The Arup $25.6M Heist

In January 2024, the global engineering firm Arup lost the equivalent of $25.6 million (HK$200 million) in what remains the most expensive documented deepfake fraud. The mechanics are worth walking through step by step, because the same playbook is now being run at smaller scale against companies every week.

Figure: Illustrative reconstruction of the video conference about the "confidential transaction." The "Chief Financial Officer" (UK head office, authorizing the transfer), a "Director," and a "Colleague" are all labeled DEEPFAKE; the finance employee is the only real person on the call. On the actual Arup call, every participant except the victim was an AI deepfake, built from public video and audio of real executives.
1. Reconnaissance + initial contact: A phishing email from the "CFO"
An employee in Arup's Hong Kong finance office received a message purporting to be from the firm's UK-based CFO, referencing a confidential, time-sensitive transaction. The attackers had already gathered public video and audio of Arup executives — from conferences, recordings, and the open web — to build the deepfake models.
"...a secret, time-sensitive transaction that needs to be executed."
2. The employee's instinct fires correctly: Healthy suspicion, then a request to "verify"
The employee was initially suspicious. The request was unusual. This is exactly the instinct security training drills into people — and it worked. So the employee did the careful thing: asked for a video call to confirm the request was real. The attackers were ready for precisely this move.
3. The masterstroke (manufactured consensus): A video call full of deepfaked colleagues
On the call, the employee saw the CFO and several other familiar colleagues — all of whom looked and sounded exactly like the real people. Every single participant was an AI deepfake. The presence of multiple trusted faces, all corroborating the request in real time, did what no email could: it overrode the employee's earlier suspicion with social proof.
Hong Kong police: "In the multi-person video conference, it turns out that everyone he saw was fake."
4. Execution under pressure: 15 transfers to 5 accounts in a single day
Convinced and pressured by urgency and confidentiality, the employee executed 15 separate transactions totaling HK$200 million (~$25.6M) to five Hong Kong bank accounts. The deepfakes provided specific account numbers, amounts, and step-by-step instructions — the hallmarks of a well-rehearsed operation.
5. Discovery, too late: A simple callback that should have come first
The fraud was discovered only when the employee later followed up with Arup's actual headquarters about the "secret transaction." The real executives had authorized nothing, held no such meeting, and knew nothing about it. The single out-of-band callback that uncovered the fraud is exactly the control that — applied before the transfer — would have prevented it entirely.

"This happens more frequently than a lot of people realize. The attackers didn't compromise any of our systems or data — it was technology-enhanced social engineering."

— Rob Greig, CIO of Arup, World Economic Forum, 2025
It Almost Happened to LastPass Too

Arup is the most expensive case, not an isolated one. In 2024, a LastPass employee received deepfaked audio messages over WhatsApp impersonating the company's CEO, using a cloned voice to create urgency around a fake request. That attempt failed for one reason: the employee treated contact over an unusual channel, outside business hours, as a red flag and reported it rather than acting. The difference between a headline and a non-event was a single employee's willingness to question a familiar voice.

Why It Works — The Psychology Attackers Exploit

Deepfake voice fraud doesn't succeed randomly. It exploits documented patterns in how humans process authority, urgency, and familiarity — the same levers that make social engineering effective, amplified by a synthetic voice that removes the last natural check.

The four psychological levers in a deepfake voice attack, all converging on the employee's decision point:
① Authority: The "CEO/CFO" is asking. Questioning leadership feels risky.
② Urgency: "This must happen now." No time to stop and verify.
③ Familiarity: "I recognize that voice." A known voice reads as proof.
④ Social proof: Multiple "colleagues" agree. Consensus erases lone doubt.
Stacked together in a live call, these levers overwhelm the verification instinct, which is why process, not judgment, is the defense.

Who Gets Targeted — And Why

Attackers don't pick targets at random. They follow access and authority: the people who can move money, reset credentials, or change payment details, combined with enough public audio of the executive being impersonated.

💸
Finance & AP staff
The highest-priority target. Anyone authorized to initiate wire transfers — a single successful call can yield six or seven figures before the transaction is flagged.
🔑
IT & help desk
Targeted for credential resets and MFA bypasses. A cloned executive voice demanding an urgent password reset exploits the help desk's service orientation.
👔
Executive assistants
Have privileged access to executives' schedules, communications, and approvals — and are accustomed to acting quickly on leadership requests.
🤝
Vendor / payment teams
Targeted for fraudulent changes to vendor banking details — redirecting legitimate payments to attacker-controlled accounts.
👥
HR & payroll
Manipulated into changing direct-deposit details or releasing employee PII — a growing secondary target as finance controls tighten.
🎙️
The executives themselves
Not as victims, but as voice sources. Public-facing leaders with abundant recorded audio are the easiest to clone convincingly.

The Red Flags — What to Train People to Catch

Because the voice itself is no longer a reliable signal, the warning signs shift to the structure of the request. These are the patterns that should trigger verification regardless of how authentic the caller sounds.

🚩
Urgency + confidentiality together. "This is time-sensitive and must stay between us" is the signature combination — it manufactures pressure while removing the second opinion that would expose the fraud.
🚩
A request that bypasses normal process. Any ask to skip the usual approval workflow, "just this once," for someone senior, is the exact shape of a deepfake fraud.
🚩
New or unusual payment details. First-time beneficiary accounts, changed vendor banking info, or accounts in unexpected jurisdictions warrant out-of-band confirmation every time.
🚩
Unusual channel or timing. A request arriving over WhatsApp, a personal number, or outside business hours — as in the LastPass attempt — is a signal in itself, regardless of who it appears to be from.
🚩
Resistance to verification. A genuine executive will not object to a callback on a known number. Pushback against a verification step — "there's no time for that" — is itself the red flag.
🚩
Subtle audio/video artifacts. Slightly off lip-sync, unnatural pauses, flat emotional tone, or audio that doesn't match the room. Useful, but never rely on these alone — quality keeps improving.
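
One way to make these red flags operational is to record them as structural facts about a request and let any match force an out-of-band check. The following is a minimal sketch under stated assumptions: the `SensitiveRequest` fields and the function names are hypothetical, not taken from any real product. Audio and video artifacts are deliberately left out, since the list above warns against relying on them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SensitiveRequest:
    """Structural facts about a request, independent of how the caller sounds."""
    urgent: bool
    confidential: bool
    bypasses_process: bool
    new_payment_details: bool
    unusual_channel_or_timing: bool
    resists_verification: bool


def red_flags(req: SensitiveRequest) -> List[str]:
    """Return the structural red flags that this request matches."""
    flags = []
    # Urgency and confidentiality only count as the signature pattern together.
    if req.urgent and req.confidential:
        flags.append("urgency + confidentiality")
    if req.bypasses_process:
        flags.append("bypasses normal process")
    if req.new_payment_details:
        flags.append("new or unusual payment details")
    if req.unusual_channel_or_timing:
        flags.append("unusual channel or timing")
    if req.resists_verification:
        flags.append("resistance to verification")
    return flags


def needs_verification(req: SensitiveRequest) -> bool:
    """Any single structural flag is enough to require an out-of-band check."""
    return bool(red_flags(req))


# Hypothetical example: a hurried, secret request with a first-time beneficiary.
example = SensitiveRequest(
    urgent=True,
    confidential=True,
    bypasses_process=False,
    new_payment_details=True,
    unusual_channel_or_timing=False,
    resists_verification=False,
)
print(red_flags(example), needs_verification(example))
```

A request with no flags is not thereby safe; in this sketch the flags only decide when extra scrutiny is triggered, and high-value actions would still go through the standard verification process.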
Why You Can't Train Your Way Out of This Alone

People correctly identify high-quality deepfake video only about a quarter of the time. The Arup employee was suspicious and the firm still lost $25.6M, because the quality of the deepfakes overwhelmed the doubt. The lesson is not that detection is hopeless (purpose-built AI detection models now analyze audio far more reliably than the human ear) but that a person's perception cannot be the control. The durable defense pairs a verification process that doesn't depend on spotting the fake with machine detection that checks the audio for you.

The Defense That Actually Works — Verification by Process

The organizations that don't fall to deepfake voice fraud aren't the ones with the sharpest-eared employees. They're the ones who built verification into the process so that catching the fake isn't required — the control fires regardless. Here's the architecture.

1. Mandatory out-of-band callback for high-value actions
The single most effective control. Any wire transfer, payment-detail change, or credential reset above a threshold requires a callback to a known, pre-stored number — never a number provided in the request itself. This is the control that uncovered the Arup fraud after the fact; applied before the transfer, it prevents it. Cyber-insurance policies increasingly require documented callback verification before they'll pay a deepfake-fraud claim.
2. Dual authorization for fund movements
No single employee should be able to move significant funds alone, regardless of who instructs them. Requiring two independent approvers for transactions above a threshold means a deepfake has to compromise two people through separate channels — dramatically harder. Insurers have begun excluding claims where no dual-authorization process existed.
3. A pre-agreed verification phrase or challenge
A shared code word or challenge-response known only to the real parties, used to confirm identity on any sensitive request. Simple, free, and effective — a deepfake of the voice can't supply a secret it never had access to. Establish it out-of-band, never over the channel being verified.
4. A culture where questioning authority is expected
The Arup employee hesitated, then complied under social pressure. Employees need explicit, repeated permission to follow verification protocols even when an "executive" expresses impatience — and they must be supported, never criticized, for delaying a suspicious request. Make security procedures non-negotiable from the top down, so no cloned voice can pressure its way past them.
5. Realistic simulation, not slideware
Awareness slides don't prepare anyone for a familiar face asking for money on a live call. Interactive simulations of deepfake whaling and vishing under decision pressure build the reflex, so the first cloned voice an employee hears isn't the first time they've practiced refusing one. Structured simulation programs have measurably improved verification behavior and cut successful compromises.
6. Reduce the public voice/data attack surface
Attackers build clones from public audio and harvested personal data. Monitoring for executives' voices, names, and personal data appearing in contexts that suggest a targeting campaign — and limiting unnecessary public exposure of that raw material — raises the cost of building a convincing clone in the first place.
7. Real-time AI detection on the audio itself
Process controls defend the decision; detection defends the channel. A deepfake audio detection model that analyzes a live call second-by-second can flag synthetic or cloned speech while the conversation is still happening — turning the voice from an unverifiable signal back into one you can check. It doesn't replace out-of-band verification, but it adds a technical tripwire that doesn't depend on an employee perceiving the fake.
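
The first two controls in this list, the callback and dual authorization, can be expressed as a release gate that fires regardless of how convincing the caller was. This is a rough sketch, not a reference implementation: the threshold, the directory contents, the phone number, and every name in the code are hypothetical, and a real deployment would live inside the payment workflow instead of a standalone script.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical values for illustration only.
THRESHOLD = 10_000  # transfers above this amount need the extra controls
KNOWN_NUMBERS = {"cfo": "+44 20 7946 0123"}  # pre-stored directory, placeholder number


@dataclass
class TransferRequest:
    requester: str              # employee initiating the transfer
    claimed_authorizer: str     # who the caller claims to be, e.g. "cfo"
    amount: float
    callback_number_used: str = ""  # number actually dialed to confirm the request
    approvers: Set[str] = field(default_factory=set)


def blocking_reasons(req: TransferRequest) -> List[str]:
    """List the controls that have not been satisfied for this transfer."""
    if req.amount <= THRESHOLD:
        return []
    reasons = []
    # The callback only counts if it went to the number stored in advance,
    # never to a number supplied in the request itself.
    known = KNOWN_NUMBERS.get(req.claimed_authorizer)
    if known is None or req.callback_number_used != known:
        reasons.append("no callback to the pre-stored number")
    # Assumption: the initiator does not count as one of the two approvers.
    independent = req.approvers - {req.requester}
    if len(independent) < 2:
        reasons.append("fewer than two independent approvers")
    return reasons


def may_release(req: TransferRequest) -> bool:
    """The transfer is released only when no control is outstanding."""
    return not blocking_reasons(req)


# Hypothetical example: the caller supplied their own "callback" number.
risky = TransferRequest("alice", "cfo", 250_000,
                        callback_number_used="+44 20 7946 0999",
                        approvers={"alice"})
print(blocking_reasons(risky))
```

The design point is that the gate never asks whether the voice sounded real. It checks only facts the attacker cannot supply from inside the call: a number stored beforehand and approvals from people reached separately.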
Where Polygraf AI Fits

Polygraf AI attacks deepfake voice fraud from both sides. First, our deepfake audio detection model analyzes voice in real time — performing per-second analysis of a call's audio to flag synthetic, cloned, or AI-generated speech as it happens, so a fake voice can be caught mid-conversation. Second, our Behavioral Control Plane governs the data layer these attacks feed on: detecting and blocking exfiltration of the personal and financial data attackers harvest to build convincing impersonations, controlling which AI tools employees use, and logging every interaction. Together, that means a detection layer for the voice itself and a control layer for the data environment around it — on-premise, sub-100ms, zero data egress. Pair it with strong out-of-band verification, and you cover the human, the audio, and the data layers of a deepfake-resilient program.
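
To illustrate how per-second analysis can become a mid-call alert, the sketch below smooths a stream of per-second scores with a rolling mean and reports the first second at which the average crosses a threshold. It is not a description of Polygraf AI's model: the scores are assumed to come from some detector that outputs a synthetic-speech probability between 0 and 1 for each second of audio, and the window size, threshold, and function name are hypothetical.

```python
from collections import deque
from typing import Iterable, Optional


def first_alert_second(scores: Iterable[float],
                       window: int = 5,
                       threshold: float = 0.8) -> Optional[int]:
    """Return the 1-based second at which the rolling mean of the last
    `window` scores first reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for second, score in enumerate(scores, start=1):
        recent.append(score)
        # Wait for a full window so a single early spike cannot trigger an alert.
        if len(recent) == window and sum(recent) / window >= threshold:
            return second
    return None


# Hypothetical per-second scores from an assumed detector.
call_scores = [0.1, 0.2, 0.9, 0.9, 0.9, 0.9, 0.9]
print(first_alert_second(call_scores))
```

Averaging over a short window trades a few seconds of delay for fewer false alarms from isolated noisy seconds; the right balance depends on the detector and would need tuning on real calls.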

