AI-Generated Clinical Notes:
How to Ensure Accuracy
and HIPAA Compliance

Ambient AI scribes are one of the fastest-adopted technologies in medicine since the EHR. They reduce documentation time and burnout – but in one peer-reviewed, blinded study 31% of AI-generated notes contained at least one hallucination, and the technology creates a whole new HIPAA exposure. Accuracy and compliance are two different problems. Here's how to solve both.

31% of ambient AI notes contained at least one hallucination in a blinded study, vs. 20% for physician notes (PDQI-9 validated study, 2025)

1–3% overall error rate of modern ambient scribes: low, but in healthcare even small errors carry risk (npj Digital Medicine, 2025)

2 of 3 physicians now use health AI, up 78% from 2023, much of it ambient documentation (AMA, 2025)

BAA: required before the first transcription; OCR has settled cases for a missing BAA alone, with no breach (HHS.gov, BAA provisions)

The clinical note is a paradox: essential for care, communication and legal defense, and one of the biggest drivers of physician burnout. The note gets written after the visit, at the keyboard, with the patient gone and the next one waiting. Ambient AI scribes attack this directly: they listen to the encounter and generate a draft note automatically, and the productivity gains are real. One npj Digital Medicine review documented a 29.3% reduction in after-hours EHR work, and a study of 263 physicians found burnout dropped from 51.9% to 38.8% within 30 days of adoption.

But two different problems travel with that benefit, and they are often confused. The first is accuracy: AI scribes hallucinate, omit, and misattribute clinical content, with direct impact on patient safety. The second is HIPAA compliance: ambient recording generates new flows of protected health information that trigger Business Associate Agreement, consent, and security obligations most documentation workflows were never designed for. A note can be perfectly compliant and dangerously inaccurate – or perfectly accurate and a HIPAA violation. You have to solve both.

🎯

Problem 1 — Accuracy

AI scribes write plausible but clinically incorrect content: hallucinated drugs, fake history, misattributed symptoms, important omissions. The error rate is low (1–3%) but in medicine the consequence of one error is serious. It is a patient-safety and malpractice problem.

🔒

Problem 2 — HIPAA compliance

Ambient recording sends audio and transcripts of the encounter to a third-party vendor — this creates PHI flows that require a signed BAA, patient consent (in many cases under state recording laws as well) and inclusion in your security risk analysis. This is a regulatory and legal-liability issue.

Part 1 — Accuracy: How AI Notes Actually Go Wrong

The headline accuracy numbers are encouraging. Today's ambient scribes report overall error rates of 1–3%, far below the 7–11% error rates of older speech-recognition dictation. But the aggregate rate hides the real risk: AI scribes fail in specific, characteristic ways that differ from human error, and several of them are hard to catch on a quick review. A validated blinded study using the PDQI-9 framework found hallucinations in 31% of ambient notes versus 20% of physician-written notes, meaning the AI introduced fabricated content more often than clinicians did.

The realistic note draft below contains examples of the four common failure modes.

AI-generated note draft — characteristic failure modes

S: 58yo male, follow-up for HTN. Reports good adherence to metoprolol 50mg. Denies chest pain.
O: BP 142/88, HR 76. Heart: regular rhythm.
A: Hypertension, suboptimal control. Patient also reports bilateral leg swelling.
P: Increase dose. Order cardiac stress test. RTC 4 weeks.

Each of these reads as clinically plausible – which is why they are dangerous. A clinician skimming the note before signing may not catch a made-up medication or a misattributed symptom. That is why human review is not optional, and why the review has to be active rather than a rubber stamp.

🌀 Hallucination

The AI produces content without any reference to the encounter: a medication that was never spoken, a symptom that was never reported, a history that does not exist. The most studied and the most dangerous mode of failure.

Found in 31% of ambient notes in PDQI-9 study

⛔ Critical omission

The AI drops a clinically important piece of information that was discussed – an allergy, an abnormal finding, a key part of the history. Omissions are more difficult to spot than hallucinations because nothing on the page looks out of place.

Often invisible on a quick review

🔀 Misattribution

The AI assigns information to the wrong person – a family member's symptom recorded as the patient's, or one patient's information leaking into another's note in back-to-back visits.

Common with multi-speaker encounters

🧩 Contextual misinterpretation

The AI mishandles conditional, hypothetical, or negated language: it misreads "we'll consider X if symptoms worsen" as an active order, or "denies chest pain" as "chest pain".

Negation and conditionals are failure-prone
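One way to support the reviewer on this failure mode is to surface the transcript sentences that contain negation or conditional cues, so they can be compared against the draft note. The sketch below is a minimal illustration only: the cue lists, the function name `flag_sentences`, and the naive sentence splitting are assumptions made for this example, not a validated clinical NLP method.

```python
import re
from typing import List, Tuple

# Illustrative cue lists only; they are assumptions, not a validated lexicon.
NEGATION = re.compile(r"\b(denies|denied|no|not|without|negative for)\b", re.IGNORECASE)
CONDITIONAL = re.compile(r"\b(if|unless|consider|might|may|would)\b", re.IGNORECASE)


def flag_sentences(transcript: str) -> List[Tuple[str, str]]:
    """Return (label, sentence) pairs for sentences a reviewer should
    compare against the draft note."""
    flagged = []
    # Naive split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", transcript.strip()):
        if not sentence:
            continue
        if NEGATION.search(sentence):
            flagged.append(("negation", sentence))
        if CONDITIONAL.search(sentence):
            flagged.append(("conditional", sentence))
    return flagged


if __name__ == "__main__":
    sample = ("Patient denies chest pain. "
              "We'll consider a stress test if symptoms worsen. "
              "BP is 142/88.")
    for label, sentence in flag_sentences(sample):
        print(f"[{label}] {sentence}")
```

A helper like this does not decide anything; it only narrows the reviewer's attention to the sentences where a dropped "denies" or a conditional turned into an order is most likely.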

Documentation error rates by method — peer-reviewed figures

Speech-recognition dictation: 7–11% error rate
Ambient AI scribe (overall): 1–3% error rate
Ambient AI notes with ≥1 hallucination: 31% of notes
Physician notes with ≥1 hallucination: 20% of notes

The overall error rate and the per-note hallucination rate measure different things: a note can have a low overall error rate and still contain one fabricated detail. In medicine, it is the one detail that counts. Sources: npj Digital Medicine 2025; PDQI-9 validated study 2025.
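Simple arithmetic shows how a small per-item error rate and a large per-note rate can both be true. The sketch below assumes, purely for illustration, that each statement in a note is wrong independently with the same probability. The function name and the item counts are hypothetical and are not figures from the cited studies.

```python
def prob_note_has_error(per_item_error_rate: float, items_per_note: int) -> float:
    """Probability that a note made of `items_per_note` independent items
    contains at least one error: 1 - (1 - p) ** n."""
    if not 0.0 <= per_item_error_rate <= 1.0:
        raise ValueError("per_item_error_rate must be between 0 and 1")
    if items_per_note < 0:
        raise ValueError("items_per_note must be non-negative")
    return 1.0 - (1.0 - per_item_error_rate) ** items_per_note


if __name__ == "__main__":
    # Hypothetical rates and note sizes, chosen only to show the effect.
    for rate in (0.01, 0.03):
        for items in (10, 40):
            share = prob_note_has_error(rate, items)
            print(f"rate={rate:.0%} items={items}: {share:.1%} of notes affected")
```

Under these assumptions, a 1% per-item rate spread over a few dozen items already leaves roughly three in ten notes with at least one error. Real errors are not independent, so this is an intuition aid, not a model of the study results.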

The Productivity Paradox

The accuracy problem has a second-order effect on the efficiency case. One study found ambient scribes saved only 34 seconds per note, with high individual variability, because the time saved on drafting can be offset by the time spent painstakingly correcting AI mistakes. The point is not that AI scribes are not useful: used properly, they clearly are. The point is that the review step is not discretionary overhead that can be bypassed to capture the savings – it is part of the workflow, and it is where the risk of inaccuracy is actually controlled.

Part 2 — HIPAA Compliance: The New Data Flows

What changes the instant you turn on an ambient scribe is where the data goes: the audio and transcript of the clinical encounter, both full of PHI, go to a third-party vendor, are processed, and come back as a note. That vendor is, by definition, a business associate under HIPAA – it creates, receives, maintains, or transmits PHI on your behalf. That single fact triggers a whole host of obligations.

The ambient scribe PHI flow — and the compliance checkpoint at each step

📝

A signed BAA — before the first recording

The vendor must sign a Business Associate Agreement before it processes any encounter. If it does not, both provider and vendor are in violation the instant PHI is sent, even if nothing goes wrong. OCR has settled cases for millions where the only failure was a missing BAA, and there was no breach at all. For AI scribes, the BAA should also explicitly prohibit using patient data to train or fine-tune models, and disclose any sub-processors.

45 CFR §164.504(e) · HHS.gov sample BAA provisions

🗣️

Patient consent — HIPAA and state recording law are separate

This is the most commonly missed point. HIPAA controls what happens to PHI after it is created; state wiretapping and eavesdropping statutes control whether the recording was lawful in the first place. In two-party (all-party) consent states you need explicit patient consent to record. A fully HIPAA-compliant scribe under a valid BAA can still create criminal and civil exposure if the recording violated state law. Obtain consent with a standard script and document it.

State two-party consent laws + HIPAA (two separate layers)

🔍

Security risk analysis — the scribe is ePHI

Audio and transcript files are electronic PHI, so the HIPAA Security Rule applies. The ambient scribe should be included in your annual security risk analysis: vulnerabilities, threats and safeguards. OCR has indicated that it will focus more enforcement attention on AI tools that use PHI – including how audio is stored, processed and deleted.

HIPAA Security Rule · OCR 2026 enforcement focus

📄

Updated Notice of Privacy Practices + retention policy

The new data flow generated by ambient documentation should be reflected in your Notice of Privacy Practices. The most important operational question – what happens to the audio after processing – should have a defined answer: a retention and deletion policy that automatically deletes the raw audio after a set period and that is written into the BAA.

NPP update · data retention / deletion policy
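A retention policy of this kind can be sketched as a scheduled sweep over stored audio. The example below assumes, for illustration only, that raw audio sits in a local directory as `.wav` files and that the retention window is 30 days. Both are hypothetical; in practice the audio often lives with the vendor and the period comes from your BAA. The function names are also made up for this sketch.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List, Optional

RETENTION = timedelta(days=30)  # hypothetical window; use the period in your BAA


def expired_audio(directory: Path, now: Optional[datetime] = None,
                  retention: timedelta = RETENTION) -> List[Path]:
    """List .wav files whose modification time is older than the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    expired = []
    for path in sorted(directory.glob("*.wav")):
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if modified < cutoff:
            expired.append(path)
    return expired


def purge(directory: Path, dry_run: bool = True) -> List[Path]:
    """Delete expired audio; with dry_run=True only report what would be deleted."""
    targets = expired_audio(directory)
    if not dry_run:
        for path in targets:
            path.unlink()  # plain file removal, not a secure wipe
    return targets
```

Defaulting to a dry run keeps the sweep reviewable before anything is removed. Note that unlinking a file is ordinary deletion; whether that satisfies your disposal requirements is a question for your security team.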

🚫

Never use a consumer AI tool for clinical notes

Consumer ChatGPT and other general-purpose tools are not designed for PHI: they do not sign BAAs, have no data-retention policy for patient conversation data, and are not built for the HIPAA data lifecycle. Copying a patient conversation into a consumer AI tool to write a note is a clear violation – and exactly the kind of shadow-AI leakage an inspection layer is meant to prevent.

Consumer LLMs ≠ HIPAA-eligible deployments

The Lawsuits Are Already Here

This is not a hypothetical risk. In November 2025, patient Jose Saucedo filed a proposed class action against Sharp HealthCare in San Diego Superior Court, alleging that Sharp used an ambient AI documentation tool to record clinical encounters without patient consent – in violation of California's all-party-consent wiretapping statute (CIPA) and the Confidentiality of Medical Information Act (CMIA). The most damaging allegation is that the EHR contained boilerplate language stating that patients were "advised" and "consented" to recording when, per the complaint, no such consent was obtained. The proposed class could exceed 100,000 patients, and CIPA provides statutory damages of $5,000 per violation. In addition, California's AB 3030 (effective January 1, 2025) requires providers to add a disclaimer when generative AI produces patient clinical communications unless a licensed provider first reviews the output. Malpractice carriers are now actively flagging AI-documentation hallucinations as an emerging risk category.

"If an AI scribe makes up or alters clinical content (a fact) and a physician approves without proper review, the provider has the malpractice exposure. It is not a hypothetical. It is a new risk category that malpractice carriers are already flagging."

— Health Law Attorney Blog, February 2026

The Workflow That Keeps Both in Check

Solving accuracy and compliance together is about building the right operational workflow – one where the guardrails are part of how the work is done, not an audit you run after the fact. Here's the right sequence.

Verify compliance before the first session

Verify that the BAA is signed and that it has AI-specific provisions (no model training, sub-processor disclosure, audio retention and breach notification). Verify that your consent disclosure language covers AI-assisted documentation and complies with your state's recording law. Add the scribe to your security risk analysis. None of this is optional, and it all happens before you hit record.
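The pre-recording checks above can be expressed as an explicit gate, so that "none of this is optional" becomes something a system can enforce. This is a minimal sketch: the class `ScribeReadiness`, its field names, and the helper functions are hypothetical and simply mirror the items listed in this step.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ScribeReadiness:
    """One flag per pre-recording check; every flag defaults to not done."""
    baa_signed: bool = False
    baa_prohibits_model_training: bool = False
    baa_discloses_subprocessors: bool = False
    baa_covers_audio_retention: bool = False
    baa_covers_breach_notification: bool = False
    consent_language_reviewed_for_state_law: bool = False
    included_in_security_risk_analysis: bool = False


def missing_items(readiness: ScribeReadiness) -> List[str]:
    """Names of the checks that have not been completed."""
    return [f.name for f in fields(readiness) if not getattr(readiness, f.name)]


def may_record(readiness: ScribeReadiness) -> bool:
    """Recording is allowed only when every check is complete."""
    return not missing_items(readiness)


if __name__ == "__main__":
    status = ScribeReadiness(baa_signed=True)
    print("May record:", may_record(status))
    print("Still missing:", missing_items(status))
```

Defaulting every flag to `False` means a newly added check blocks recording until someone confirms it, which is the safer failure direction.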

Obtain and document patient consent every time

Use a standard script: a short disclosure at the start of the visit that an AI assistant will listen to help make accurate notes and that the patient can opt out. Document the consent. For telehealth across state lines (common in psychiatry and primary care), follow the stricter state's rule. Give patients a real opt-out.
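The consent step has two pieces that lend themselves to a small data model: the stricter-state rule for telehealth and a per-encounter consent record. The sketch below is illustrative only. The names `requires_all_party_consent` and `ConsentRecord` are hypothetical, and the set of all-party-consent states is deliberately left to the caller, since it must come from counsel rather than from code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet


def requires_all_party_consent(clinician_state: str, patient_state: str,
                               all_party_states: FrozenSet[str]) -> bool:
    """Stricter-state rule: if either party is located in an all-party
    consent state, treat the whole visit as all-party."""
    return clinician_state in all_party_states or patient_state in all_party_states


@dataclass(frozen=True)
class ConsentRecord:
    """What was disclosed and agreed at the start of one encounter."""
    encounter_id: str
    disclosed_ai_scribe: bool
    patient_consented: bool
    recorded_at: datetime

    def recording_allowed(self) -> bool:
        # Both the disclosure and an affirmative answer are required.
        return self.disclosed_ai_scribe and self.patient_consented


if __name__ == "__main__":
    # Only California is listed here because the article names it as all-party;
    # a real list must be supplied and maintained by counsel.
    states = frozenset({"CA"})
    print(requires_all_party_consent("AZ", "CA", states))
    record = ConsentRecord("enc-001", True, False, datetime.now(timezone.utc))
    print(record.recording_allowed())
```

Making the record immutable (`frozen=True`) reflects that a consent entry documents what happened at the time; a later change of mind would be a new record, not an edit.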

Review every draft actively — before signing

The clinician is the author of the final note. Review is not a rubber stamp: read for the four failure modes – check medications and doses, scan for missed findings, confirm that symptoms are attributed to the right person, and confirm that conditional language was not turned into active orders. The clinician who signs owns every word.
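Part of the medication check can be assisted mechanically: any drug or order that appears in the draft but was never said in the encounter deserves a second look. The sketch below assumes a caller-supplied watchlist of terms and plain substring matching on word boundaries. The function name, the watchlist, and the sample transcript are hypothetical, and the approach will miss synonyms, brand versus generic names, and omissions.

```python
import re
from typing import Iterable, List


def ungrounded_terms(note: str, transcript: str, watchlist: Iterable[str]) -> List[str]:
    """Return watchlist terms that appear in the note but never in the transcript."""
    def mentions(text: str, term: str) -> bool:
        pattern = rf"\b{re.escape(term)}\b"
        return re.search(pattern, text, re.IGNORECASE) is not None

    return [term for term in watchlist
            if mentions(note, term) and not mentions(transcript, term)]


if __name__ == "__main__":
    note = "Reports good adherence to metoprolol 50mg. Order cardiac stress test."
    transcript = "I take my lisinopril every morning."  # hypothetical encounter text
    watchlist = ["metoprolol", "lisinopril", "stress test"]
    for term in ungrounded_terms(note, transcript, watchlist):
        print(f"Not found in transcript: {term}")
```

A flag from a check like this is a prompt for the clinician, not a verdict: the term may have been spoken differently, or it may be a fabrication.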

Prevent PHI from reaching ungoverned tools

The compliance program only works if PHI remains within the approved, BAA-covered pathway. An inspection layer at the point of use prevents staff from sending patient data to a consumer AI tool – for example by pasting a transcript into ChatGPT to "clean it up" or by using an unapproved personal scribe app. This is the technical control underpinning the policy.
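The idea of an inspection layer can be illustrated with a toy egress check: text is allowed out only if the destination is on an approved list or no identifier pattern is found. This is a deliberately small sketch, not how any particular product works. The host name, the pattern set, and the function names are assumptions, and four regular expressions cover only a fraction of the 18 HIPAA identifiers (names and dates, for instance, need far more than regex).

```python
import re
from typing import Dict, List
from urllib.parse import urlparse

APPROVED_HOSTS = {"scribe.example-vendor.com"}  # hypothetical BAA-covered endpoint

# Toy patterns for a few identifier types; not a complete PHI detector.
PATTERNS: Dict[str, "re.Pattern"] = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "mrn": re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE),
}


def find_identifiers(text: str) -> List[str]:
    """Names of the identifier patterns that match somewhere in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def allow_send(text: str, destination_url: str) -> bool:
    """Allow approved hosts; for any other host, block text with identifiers."""
    host = urlparse(destination_url).hostname or ""
    if host in APPROVED_HOSTS:
        return True
    return not find_identifiers(text)


if __name__ == "__main__":
    text = "Call 619-555-0123 re: MRN 12345678"
    print(find_identifiers(text))
    print(allow_send(text, "https://chat.example.com/api"))
```

The design choice worth noting is the default: an unknown destination is treated as ungoverned, so a new tool is blocked for identifier-bearing text until someone adds it to the approved list.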

Log everything — for audit, malpractice defense, and OCR

Keep an audit trail: which tool was used, what consent was captured, who reviewed and signed each note, and what was edited. This is what shows due diligence to OCR, what supports a malpractice defense if a documentation error is alleged, and what lets you spot patterns where the AI is consistently wrong so you can change templates or retrain staff.
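The audit-trail fields listed above map naturally onto one structured record per signed note. The sketch below is one possible shape, with hypothetical field and function names. It uses Python's standard `difflib` to capture what the reviewer changed and `hashlib` to fingerprint the draft and the signed version.

```python
import difflib
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict


def audit_entry(encounter_id: str, tool: str, consent_captured: bool,
                reviewer: str, draft: str, signed: str) -> Dict[str, object]:
    """Build one audit record: tool, consent, reviewer, and the reviewer's edits."""
    # Keep only removed ("- ") and added ("+ ") lines from the line-level diff.
    edits = [line for line in difflib.ndiff(draft.splitlines(), signed.splitlines())
             if line.startswith(("- ", "+ "))]
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "encounter_id": encounter_id,
        "tool": tool,
        "consent_captured": consent_captured,
        "reviewer": reviewer,
        "draft_sha256": hashlib.sha256(draft.encode("utf-8")).hexdigest(),
        "signed_sha256": hashlib.sha256(signed.encode("utf-8")).hexdigest(),
        "edits": edits,  # contains note text, so the log itself is ePHI
    }


def to_json_line(entry: Dict[str, object]) -> str:
    """Serialize one record as a single line for an append-only log file."""
    return json.dumps(entry, sort_keys=True)


if __name__ == "__main__":
    entry = audit_entry("enc-001", "hypothetical-scribe", True, "dr.lee",
                        "A: HTN.\nP: Increase dose.",
                        "A: HTN.\nP: Continue current dose.")
    print(to_json_line(entry))
```

Because the `edits` field contains note text, a log like this has to live inside the same governed pathway as the notes themselves. Aggregating the edits over time is what makes the pattern analysis described above possible.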

Where Polygraf AI Fits

Polygraf AI doesn't replace your ambient scribe; it guards the data the scribe handles. Our Behavioral Control Plane sits where clinical staff engage with AI tools, so that PHI only flows to approved BAA-covered endpoints and never leaks to consumer AI tools or unapproved scribe apps. It detects the 18 HIPAA identifiers in real time, blocks or redacts PHI going to ungoverned destinations and logs every interaction for the audit trail OCR expects. It runs on-premise with no data egress, so the inspection itself does not create a new PHI exposure. For the compliance side of the AI-notes equation – keeping patient data in the governed pathway – it is the enforcement layer that makes the policy real.

Not legal or clinical advice. This article is a general educational overview by Polygraf AI. HIPAA obligations, state recording-consent laws, and AI-documentation standards vary by jurisdiction and are changing fast. Clinical-accuracy practices should be based on your organization's medical governance. Verify your specific obligations with your qualified healthcare counsel and your compliance team before making any decisions.

Polygraf AI

Keep PHI Inside the Governed Pathway

Polygraf AI guarantees that patient data from AI documentation is only sent to approved BAA-covered tools – it detects all 18 HIPAA identifiers in real time and blocks leakage to consumer AI. On-premise, no data egress, full audit trail.

Request a Demo →

Air-gap ready · HIPAA · SOC 2
Deploys in under an hour

