27 Types of PII Your AI Tools Are Probably Exposing Right Now

Most companies track 5–6 PII categories in their data governance programs, while their AI tools process at least 27. The gap between what you govern and what is actually moving is where your next compliance finding will come from.

Identity · 7 types
Financial · 5 types
Health · 5 types
Behavioral · 5 types
Professional · 5 types

In Q4 2025, 34.8% of ChatGPT inputs contained sensitive data — a three-fold increase from 2023. According to the 2026 Cloud and Threat Report from Netskope, the average organization now sees 223 GenAI-related data policy violations a month, and the top quartile sees 2,100.

These numbers are high not because employees are careless, but because of a definitional failure: enterprise data governance programs were built on the small set of PII categories that regulators cared about in 2015 (name, SSN, date of birth, credit card number, email address). These programs were not built for an AI world where employees paste entire customer emails, medical notes, salary spreadsheets, and source code repositories into AI prompts as part of a normal workday.

34.8% of ChatGPT inputs in Q4 2025 contained sensitive data (3× increase from 2023)
223 GenAI-linked data policy violations per month in the average organization
23.8M credentials and secrets leaked via AI tools in 2024 (25% jump year-over-year)
500% increase in prompts sent to GenAI services year-over-year (from 3K to 18K/month per org)

How PII enters AI tools: the 6 entry pathways most DLP misses

Target: AI tool prompt / context window
Direct paste · Copy-paste from CRM / HR
File upload · PDFs, spreadsheets, docs
API / integration · CRM, HR, EHR systems
Meeting AI · Transcripts, summaries
RAG retrieval · Knowledge base chunks
Coding agents · Repo files, configs, .env

PII exits in responses / logs / fine-tuning data. Traditional DLP watches file transfers and email; it is blind to all 6 pathways above.

Why Traditional DLP Misses Most of This

Traditional DLP was designed for structured data in transit – files over email or USB drives, database queries over the wire. It was not designed to check if the natural language text that someone typed in a browser prompt contains a patient's diagnosis, a salary, or a contract clause. If an employee copies a customer email into ChatGPT, no file is transferred. No network alert is triggered. The data is just passed over an HTTP POST request to an external API. Generic NLP-based PII detectors miss 13–46% of sensitive entities depending on category and context.

Category 1: Identity PII
7 types · GDPR / CCPA / HIPAA scope

These are the types most organizations already govern in their DLP programs but miss in the AI context, because there they are embedded in unstructured text (emails, meeting notes, support tickets) rather than in structured database fields.

01 · Full legal name · Critical
First + last name – trivially identifiable when combined with any other type of PII.
How it enters AI: "Summarize this email chain from John Smith about his refund."
GDPR Art.4 · CCPA · PIPEDA
02 · Government ID numbers · Critical
SSN, NIN, SIN, passport numbers, national ID. The highest-risk PII category, used directly to commit identity fraud and account takeover. Detectors must handle SSNs written with or without dashes as well as partially redacted forms.
How it enters AI: HR onboarding forms, I-9 verification docs uploaded to AI for processing.
HIPAA · GDPR · All state privacy laws
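
The SSN formats mentioned above (dashed, undashed, partially redacted) can be illustrated with a pattern-based sketch. This is a minimal illustration under stated assumptions, not a production detector: the names `SSN_FULL`, `SSN_PARTIAL`, `find_ssn_candidates` and `redact_ssn` are hypothetical, and the patterns flag any standalone nine-digit run, so real use would need surrounding-context checks.

```python
import re

# Full SSNs: 123-45-6789, 123 45 6789, or 123456789. The back-reference \2
# requires the same separator in both positions, and the lookarounds stop
# the pattern from firing inside longer digit runs such as card numbers.
SSN_FULL = re.compile(r"(?<!\d)(\d{3})([- ]?)(\d{2})\2(\d{4})(?!\d)")

# Partially redacted forms such as XXX-XX-6789 or ***-**-6789.
SSN_PARTIAL = re.compile(r"(?<![\w*])[Xx*]{3}-?[Xx*]{2}-?(\d{4})(?!\d)")


def find_ssn_candidates(text: str) -> list[dict]:
    """Return candidate SSNs with their span and whether they were already masked."""
    hits = []
    for m in SSN_FULL.finditer(text):
        hits.append({"span": m.span(), "last4": m.group(4), "masked": False})
    for m in SSN_PARTIAL.finditer(text):
        hits.append({"span": m.span(), "last4": m.group(1), "masked": True})
    return sorted(hits, key=lambda h: h["span"])


def redact_ssn(text: str) -> str:
    """Replace full SSNs with a masked form that keeps only the last four digits."""
    return SSN_FULL.sub(lambda m: f"XXX-XX-{m.group(4)}", text)
```

Even a masked SSN is reported here, because the last four digits remain useful to an attacker when they appear next to a name or date of birth.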
03 · Date and place of birth · High
DOB is a HIPAA Safe Harbor identifier on its own. Combined with name or ZIP code, it narrows a record down to very few individuals. Age derived from DOB is protected in many jurisdictions when used in employment or insurance.
How it enters AI: Patient summaries, HR records, customer verification prompts.
HIPAA Safe Harbor · GDPR · CCPA
04 · Home address · Critical
Street address, city, state, ZIP, country. Found in shipping records, HR files and insurance records. It is identifying by itself; combined with a name it is a complete physical-world locator. A five-digit ZIP code alone is protected under HIPAA.
How it enters AI: "Draft a letter to this customer at [address] about their order."
HIPAA · GDPR · CCPA · CAN-SPAM
05 · Phone number · High
Mobile and landline numbers. Detectors must handle international formats (+44 20, 07xxx), partial numbers, and numbers embedded in free text rather than structured fields. Mobile numbers are increasingly used as authentication factors, so their exposure enables SIM-swap attacks.
How it enters AI: Customer records, support tickets, meeting notes with contact details.
GDPR · TCPA · CCPA
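
Handling phone numbers in free text usually means finding a loosely formatted candidate and then normalizing it. The sketch below is one possible approach, assuming a digit-count filter of 7 to 15 digits (15 is the E.164 maximum); `PHONE_CANDIDATE` and `find_phone_numbers` are hypothetical names, and the pattern will also pick up other long digit groups such as order numbers, so it is a starting point rather than a finished detector.

```python
import re

# Candidate: optional "+", optional "(", then digits separated by spaces,
# dots, dashes or parentheses. Must start and end on a digit.
PHONE_CANDIDATE = re.compile(r"(?<![\w+])\+?\(?\d[\d\s().-]{5,18}\d(?!\w)")


def find_phone_numbers(text: str) -> list[str]:
    """Return normalized phone candidates (digits only, leading '+' preserved)."""
    found = []
    for m in PHONE_CANDIDATE.finditer(text):
        raw = m.group()
        digits = re.sub(r"\D", "", raw)  # strip every non-digit separator
        if 7 <= len(digits) <= 15:
            found.append(("+" if raw.startswith("+") else "") + digits)
    return found
```

For example, `find_phone_numbers("Call +44 20 7946 0958 or 07700 900123, ticket 42.")` yields the two numbers in normalized form and ignores the short ticket number.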
06 · Email address · High
Work and personal email addresses. Very common in AI prompts because they appear in every business communication. Email addresses are the main authentication identifier for most enterprise applications, so exposure leads to phishing and credential stuffing.
How it enters AI: Forwarded email threads, customer contact lists, meeting invitations.
GDPR · CCPA · CAN-SPAM
07 · Biometric data · Critical
Fingerprints, facial recognition templates, voice prints, retinal scans. Biometrics are immutable: they cannot be reissued after exposure. Any AI system that processes employee attendance or access control records may handle biometric identifiers.
How it enters AI: HR access control reports, employee attendance records.
BIPA (Illinois) · GDPR Art.9 · CCPA
Category 2: Financial PII
5 types · PCI-DSS / GLBA / SOX scope

Financial PII is the most frequently mentioned DLP category and the most often leaked in AI prompts, because finance teams are among the heaviest AI adopters. Employees write financial reports, work with spreadsheets, and summarize earnings data with AI tools every day.

08 · Payment card data (PAN) · Critical
Primary Account Numbers (card numbers, typically 15 or 16 digits), CVV/CVC, expiration dates. PCI-DSS requires stored PAN to be rendered unreadable and prohibits retaining CVV/CVC after authorization, which in practice rules out raw card data in any AI training set. Detectors should catch partial PANs (the last 4 digits can be enough in context), masked formats (XXXX-XXXX-XXXX-1234) and numbers embedded in transaction descriptions.
How it enters AI: Transaction logs, customer service records, chargeback summaries.
PCI-DSS v4.0 · GDPR
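
Card numbers carry a Luhn check digit, which lets a detector separate likely PANs from arbitrary digit runs. The following is a minimal sketch, assuming candidates of 13 to 19 digits grouped by spaces or dashes; `PAN_CANDIDATE`, `PAN_MASKED`, `luhn_valid` and `find_pans` are hypothetical names introduced for illustration.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or dashes.
PAN_CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")
# Masked forms such as XXXX-XXXX-XXXX-1234 or **** **** **** 1234.
PAN_MASKED = re.compile(r"(?:[Xx*]{4}[ -]?){3}(\d{4})(?!\d)")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def find_pans(text: str) -> dict:
    """Return Luhn-valid full PANs plus the last four digits of masked PANs."""
    full = [m.group() for m in PAN_CANDIDATE.finditer(text) if luhn_valid(m.group())]
    masked = [m.group(1) for m in PAN_MASKED.finditer(text)]
    return {"full": full, "masked_last4": masked}
```

The Luhn filter removes most false positives from order IDs and tracking numbers, but roughly one random digit string in ten still passes, so context remains necessary.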
09 · Bank account numbers · Critical
Account numbers, routing numbers and IBAN/BIC codes appear in accounts payable, vendor onboarding and payroll. Finance teams that use AI to automate invoice processing often expose them. A routing number plus an account number is enough for ACH fraud.
How it enters AI: Invoice automation, vendor payment records, payroll reconciliation.
GLBA · PCI-DSS · GDPR
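
IBANs embed two check digits that can be validated with the standard mod-97 test, which is a cheap way to confirm that a string found in an invoice really is an account identifier. The sketch below assumes upper-case IBANs printed either unbroken or in groups of four; `IBAN_CANDIDATE`, `IBAN_SHAPE`, `iban_valid` and `find_ibans` are hypothetical names, and country-specific length rules are deliberately left out.

```python
import re

# Country code, two check digits, then groups of four (spaces optional).
IBAN_CANDIDATE = re.compile(
    r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
)
# Overall shape once spaces are removed: 15 to 34 characters in total.
IBAN_SHAPE = re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}")


def iban_valid(candidate: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and require the result mod 97 to equal 1."""
    iban = candidate.replace(" ", "").upper()
    if not IBAN_SHAPE.fullmatch(iban):
        return False
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def find_ibans(text: str) -> list[str]:
    """Return candidates from free text that pass the mod-97 check."""
    return [m.group() for m in IBAN_CANDIDATE.finditer(text) if iban_valid(m.group())]
```

US routing and account numbers have no comparable self-check across the pair, so they typically need keyword context ("routing", "ACH", "acct") instead.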
10 · Individual salary and compensation · High
Salary, bonus, equity, total compensation. Personal data under GDPR and most state privacy laws. HR and finance teams copy salary spreadsheets into AI tools for analysis, benchmarking or offer letter drafting. A salary and an employee name should never appear together in a prompt.
How it enters AI: Offer letter drafting, compensation analysis, pay equity reviews.
GDPR · CCPA · UK GDPR
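
The rule that salary and employee name should not travel together is a co-occurrence check rather than a single-pattern match. A minimal sketch follows, assuming the caller can supply a list of employee names (for example from an HR directory); `MONEY`, `COMP_TERMS` and `names_exposed_with_pay` are hypothetical names, and the keyword list is illustrative only.

```python
import re

# A currency amount such as $145,000, £90k, or 120,000 USD.
MONEY = re.compile(
    r"[$£€]\s?\d[\d,.]*\s?[kKmM]?|\b\d[\d,.]*\s?(?:USD|EUR|GBP)\b"
)
# Words that indicate the amount is compensation rather than a price.
COMP_TERMS = re.compile(r"\b(?:salary|bonus|equity|compensation)\b", re.IGNORECASE)


def names_exposed_with_pay(text: str, employee_names: list[str]) -> list[str]:
    """Return the employee names that appear in a prompt which also contains
    a compensation term and a currency amount; empty list if no violation."""
    if not (MONEY.search(text) and COMP_TERMS.search(text)):
        return []
    lowered = text.lower()
    return [name for name in employee_names if name.lower() in lowered]
```

A benchmark prompt with amounts but no names returns an empty list, so aggregate compensation analysis can pass while named offer letters are flagged.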
11 · Tax identification numbers · Critical
EIN, TIN, VAT numbers. Tax IDs are found on vendor onboarding documents, W-9s and international contracts. They are unique to both individuals and organizations and can be used for organizational identity theft (a risk that most PII programs under-weight).
How it enters AI: Vendor onboarding, tax document processing, international contracting.
IRS regulations · GDPR · State privacy laws
12 · Credit scores and financial history · High
Credit scores, payment history, debts, bankruptcies, loans. Regulated by the FCRA in the US. Financial services AI tools that process customer applications often encounter this data. Fine-tuning an LLM on credit reports without de-identification can embed protected financial data in the model weights.
How it enters AI: Loan application processing, credit decisioning, customer financial analysis.
FCRA · GLBA · GDPR
Category 3: Health & Medical PII
5 types · HIPAA PHI · GDPR Art.9 scope

Health PII has the most stringent regulatory requirements of any category: HIPAA violations begin at $100 per violation and can reach up to $1.9M per category per year. Healthcare organizations are among the most aggressive AI adopters. Clinical documentation, patient communication and care coordination workflows all produce PHI that AI tools see every day.

13 · Diagnosis and conditions · Critical
Medical diagnoses, chronic diseases, mental health history, substance use disorders, HIV status, genetic conditions. HIPAA Safe Harbor de-identification requires removing all 18 PHI identifiers; a diagnosis code (ICD-10) combined with any one identifier creates a risk of re-identification. AI scribe tools that summarize clinical notes process this information in every session.
How it enters AI: AI scribe tools, EHR summarization, clinical decision support prompts.
HIPAA · GDPR Art.9 · 42 CFR Part 2
14 · Prescription and medication data · Critical
Prescription history, drug names, dosages, pharmacy records. This data reveals the underlying condition and is protected by HIPAA, 42 CFR Part 2 (substance use) and state prescription monitoring programs. It is exposed in pharmacy workflow automation and patient communication AI tools.
How it enters AI: Pharmacy automation, medication reconciliation, patient discharge summaries.
HIPAA · 42 CFR Part 2 · State pharmacy laws
15 · Insurance and benefits data · High
Health insurance member ID, plan, EOB, prior authorization. Used in healthcare billing, HR benefits and insurance customer service AI tools. Insurance ID is one of 18 HIPAA Safe Harbor identifiers.
How it enters AI: Insurance verification, prior auth automation, HR benefits AI tools.
HIPAA Safe Harbor · ACA · GDPR
16 · Mental health records · Critical
Mental health treatment history, psychiatric diagnoses, therapy session notes. This information is subject to even greater protection than general PHI in many states (California, New York and most EU member states consider mental health data as a special category of sensitive information). AI meeting summarization tools may include references to employee mental health in HR conversations.
How it enters AI: EHR AI tools, HR meeting summaries, employee assistance program (EAP) communications.
HIPAA · GDPR Art.9 · State mental health privacy laws
17 · Genetic data · Critical
DNA sequences, genetic test results, family health history used for genetic analysis. Covered by GINA (Genetic Information Nondiscrimination Act) in the US and GDPR Article 9 in the EU. Genetic data is immutable: an exposure cannot be fixed by issuing a new identifier. This category is growing in importance as genomics companies use LLMs for research and for patients.
How it enters AI: Genomics research tools, clinical AI for hereditary disease risk assessment.
GINA · GDPR Art.9 · HIPAA (when linked to care)

Data being shared with AI tools now is PII, PHI, proprietary source code, internal meeting notes and financial projections. This is not an outlier – it's the everyday work of a normal day that legacy DLP was never supposed to see.

DataStealth / LayerX Enterprise ChatGPT Security Guide, 2026
Category 4: Behavioral & Digital PII
5 types · GDPR / CCPA / ePrivacy scope

Behavioral PII is the type of data that most organizations leave out of their AI data governance programs – because it doesn't look like PII in the traditional sense. No name, no ID number. But behavioral data can identify and profile people more accurately than a name and it is in AI workflows all the time.

18 · Device identifiers · High
MAC addresses, device UUIDs, IMEI numbers, cookie IDs, advertising IDs (IDFA/GAID). Unique identifiers that associate behaviour with a device – and thus with a person. Both GDPR and CCPA consider persistent device identifiers as personal data. They appear in customer analytics prompts and in mobile app debugging workflows.
How it enters AI: Customer analytics, fraud detection analysis, mobile app crash reporting.
GDPR · CCPA · ePrivacy Directive
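
Because device identifiers are often needed for debugging (to tell two devices apart) but not in their raw form, one option is to swap them for keyed pseudonyms before a crash report reaches an AI tool. The sketch below covers only MAC addresses and UUID-style identifiers, and it assumes a secret key held outside the prompt path; `MAC`, `UUID` and `pseudonymize_device_ids` are hypothetical names.

```python
import hashlib
import hmac
import re

MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")
UUID = re.compile(
    r"\b[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\b"
)


def pseudonymize_device_ids(text: str, key: bytes) -> str:
    """Replace MAC addresses and UUIDs with short keyed-hash tokens.
    The same identifier always maps to the same token for a given key."""

    def _token(kind: str):
        def repl(m: re.Match) -> str:
            value = m.group().lower().encode()  # case-insensitive identity
            digest = hmac.new(key, value, hashlib.sha256).hexdigest()
            return f"[{kind}:{digest[:12]}]"
        return repl

    text = UUID.sub(_token("UUID"), text)  # UUIDs first, then MACs
    return MAC.sub(_token("MAC"), text)
```

A keyed HMAC is used instead of a plain hash because the MAC address space is small enough to brute-force an unkeyed digest.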
19 · Location data · High
GPS location, IP-based location, cell tower triangulation, check-ins. Precise geolocation is sensitive personal information under CCPA (as amended by CPRA, which defines it as locating a person within a radius of 1,850 feet), and location data is personal data under GDPR. Location history can reveal home, workplace, hospital and place of worship, and can therefore be used to build religious, health and behavioral profiles. AI tools for fleet management, logistics and customer analytics process this data routinely.
How it enters AI: Logistics AI, customer behavior analysis, workforce management tools.
GDPR · CCPA · Sensitive Data Provisions
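
One common mitigation for precise coordinates in prompts is to coarsen them before they leave the environment. The sketch below rounds decimal latitude/longitude pairs to two decimal places, which is roughly 1.1 km of latitude (longitude spacing shrinks toward the poles). Whether that precision is coarse enough for a given regulation is a legal question this example does not answer; `COORD_PAIR` and `coarsen_coordinates` are hypothetical names.

```python
import re

# A "lat, lon" pair written as decimals with at least three fractional digits.
COORD_PAIR = re.compile(
    r"(?<![\d.])(-?\d{1,2}\.\d{3,})\s*,\s*(-?\d{1,3}\.\d{3,})"
)


def coarsen_coordinates(text: str, decimals: int = 2) -> str:
    """Round coordinate pairs found in free text to `decimals` places."""

    def repl(m: re.Match) -> str:
        lat, lon = float(m.group(1)), float(m.group(2))
        # Leave the text alone if the numbers cannot be real coordinates.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            return m.group()
        return f"{lat:.{decimals}f}, {lon:.{decimals}f}"

    return COORD_PAIR.sub(repl, text)
```

This keeps a logistics prompt useful for route-level analysis while removing the building-level precision that reveals a home or clinic.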
20 · Browsing and purchase history · High
Browsing history, search history, purchase history, streaming history. Sensitive personal information under CCPA. Purchase history may indicate medical condition, religion and political views. AI personalization and recommendation engines use this at scale – and often without explicit consent for use in AI training.
How it enters AI: Marketing personalization, customer lifetime value modeling, churn prediction.
CCPA Sensitive PI · GDPR · UK GDPR
21 · IP addresses · Medium
IPv4 and IPv6 addresses. The EU's Article 29 Working Party has stated that IP addresses are personal data in most cases. Dynamic IPs that change often may be lower risk; static IPs assigned to residential connections are directly identifying. They appear in access logs, security incident reports and fraud detection workflows that are fed to AI tools.
How it enters AI: Security log analysis, fraud investigation, network incident response.
GDPR (WP29 confirmed) · CCPA · ePrivacy
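
Python's standard `ipaddress` module can pull IPv4 and IPv6 addresses out of a log line without hand-written patterns. The sketch below is a simple triage, assuming whitespace-separated tokens: it marks each address as publicly routable or not, on the assumption that internal addresses are lower risk to share. It cannot tell a static residential IP from a dynamic one. `find_ip_addresses` is a hypothetical name.

```python
import ipaddress


def find_ip_addresses(text: str) -> list[dict]:
    """Return every token that parses as an IPv4 or IPv6 address."""
    found = []
    for token in text.split():
        # Drop punctuation that commonly surrounds addresses in prose and logs.
        token = token.strip(",;()[]<>\"'").rstrip(".")
        try:
            ip = ipaddress.ip_address(token)
        except ValueError:
            continue  # not an IP address
        found.append(
            {"address": str(ip), "version": ip.version, "public": ip.is_global}
        )
    return found
```

Addresses written with a port suffix (for example `10.0.0.5:443`) are not handled here and would need to be split first.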
22 · Social media handles and activity · Medium
Usernames, profile URLs, post histories, engagement data. Social media data may be publicly available, but it is still personal data under the GDPR and CCPA: posting publicly does not waive a person's data subject rights. AI social listening and customer intelligence tools aggregate this data at scale, which counts as processing under the GDPR regardless of the data's visibility.
How it enters AI: Social listening tools, customer intelligence, influencer analysis.
GDPR · CCPA · Platform API ToS
Category 5: Professional & Organizational PII
5 types · GDPR / Trade Secret / Employment Law scope

Professional PII is the category that companies are most likely to undercount – it doesn't look like personal data to a compliance team that thinks about people. Performance reviews identify people. HR records contain protected categories. Employment history is tied to protected class status. And when that data gets into AI tools, it creates regulatory and litigation risk that most legal teams have not mapped.

23 · Employee performance records · High
Performance reviews, PIPs (performance improvement plans), disciplinary records, promotion decisions. Protected under employment law and GDPR as personal data of individuals in their professional capacity. HR teams are increasingly using AI to draft performance reviews – inputting individual employee performance data into external AI models without any data processing agreement.
How it enters AI: Performance review drafting, PIP documentation, promotion assessment.
GDPR · Employment law · State privacy laws
24 · API keys and credentials · Critical
API keys, OAuth tokens, private certificates, .env file contents, database passwords. Not PII but has the highest breach impact of any data type in this guide – a single leaked API key can give full access to enterprise systems. SpyCloud has found 23.8 million credentials leaked via AI tools in 2024 alone. Coding agents that read .env files or repository configs are the main exposure vector.
How it enters AI: Coding agents, GitHub Copilot, debug prompts with environment variables.
SOC 2 · PCI-DSS · CMMC · ISO 27001
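
For the .env exposure path, a simple safeguard is to redact values whose variable names suggest a secret before file contents are handed to a coding agent or pasted into a debug prompt. The sketch below is name-based only, so it will miss secrets stored under innocuous names; `SENSITIVE_NAME`, `ENV_LINE` and `redact_env` are hypothetical names, and the keyword list is an assumption to be tuned.

```python
import re

# Variable names that usually hold secrets.
SENSITIVE_NAME = re.compile(
    r"KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL", re.IGNORECASE
)
# A NAME=value line, with an optional leading "export".
ENV_LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$")


def redact_env(contents: str) -> str:
    """Return .env contents with the values of sensitive-looking variables
    replaced by a placeholder. Comments and other lines pass through unchanged."""
    out = []
    for line in contents.splitlines():
        m = ENV_LINE.match(line)
        if m and m.group(2) and SENSITIVE_NAME.search(m.group(1)):
            out.append(f"{m.group(1)}=[REDACTED]")
        else:
            out.append(line)
    return "\n".join(out)
```

Keeping the variable name while dropping the value preserves enough context for the AI tool to debug a configuration problem.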
25 · Legal correspondence and contracts · High
Attorney-client privileged communications, contract terms, settlement agreements, litigation strategy documents. Legal privilege may be waived if privileged communications are disclosed to an unauthorized third party (including an external AI provider not under a BAA or DPA). Legal teams using AI for contract review are some of the highest-risk AI users.
How it enters AI: Contract review, legal research, litigation preparation, NDA drafting.
Attorney-client privilege · GDPR · State bar rules
26 · HR and recruitment records · High
Candidate resumes, interview notes, rejection reason, background check, disability accommodations. Contains protected class information (age, race, religion, disability, pregnancy) that can lead to employment discrimination liability if misused. AI hiring tools that use this data to determine candidate fit are governed by EEOC guidance and NYC Local Law 144.
How it enters AI: AI candidate screening, interview preparation, offer letter drafting.
EEOC · ADA · NYC Local Law 144 · GDPR
27 · Customer relationship data (CRM) · High
Customer names, deal stages, communication logs, account notes, support ticket content. Sales teams copy and paste CRM records into AI tools to draft, summarize and analyze. CRM data combines name, email, phone, company and behavior history, making it a compound PII type subject to multiple regulatory frameworks. GDPR data subject requests may reach CRM data that AI tools have processed, even when the data processing agreement in place does not cover the AI platform.
How it enters AI: Sales email drafting, deal analysis, customer success summaries, support workflows.
GDPR · CCPA · PIPEDA · Industry-specific

The Detection Gap: What Generic Tools Miss

Most PII detection tools do well on structured data in isolation (a standalone SSN in a clean text field) but are much worse on the way PII is actually found in AI prompts (embedded in email threads, mentioned in passing in meeting notes, concatenated with other text, in non-standard formats or in non-English languages).

PII detection accuracy by category — purpose-built vs. generic NLP (internal Polygraf benchmark)
SSN / Government ID (structured) · 94% generic
Credit card numbers · 91% generic
PHI in clinical notes (unstructured) · 67% generic
Salary figures in prose context · 54% generic
API keys and credentials · 71% generic
Location data in free text · 48% generic
Mental health references in HR notes · 38% generic
Legal privilege markers in email · 31% generic

Generic NLP detectors are trained on structured, sanitized benchmarks – not on the dirty unstructured text that employees actually paste into AI tools. The gap is largest on the categories that appear most in real AI prompts.

How Polygraf AI Covers All 27

Polygraf AI's contextual PII detection layer was designed for the AI input/output use case, not for file scanning or structured data fields. It covers all 27 categories in this guide, including the unstructured, embedded and contextual appearances that generic NLP misses. Detection runs inline at the input and output boundary with sub-100ms latency, and redaction is applied before data is sent to the LLM or before the response is returned. No data leaves your environment. Every detection event is logged with category, confidence and disposition for compliance reporting.
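
To make the idea of inline redaction with logged detection events concrete, here is a generic sketch of the pattern. It is not Polygraf's API or implementation: `DetectionEvent`, `guard_prompt` and the single email detector are hypothetical stand-ins, and the fixed confidence value is a placeholder for whatever score a real detector would produce.

```python
import json
import re
from dataclasses import asdict, dataclass

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class DetectionEvent:
    category: str      # which PII type was found
    confidence: float  # detector score (placeholder value in this sketch)
    disposition: str   # what was done with the match, e.g. "redacted"
    start: int         # character offsets in the original prompt
    end: int


def guard_prompt(prompt: str) -> tuple[str, list[DetectionEvent]]:
    """Redact email addresses and return the safe prompt plus audit events."""
    events: list[DetectionEvent] = []

    def _redact(m: re.Match) -> str:
        events.append(
            DetectionEvent("email_address", 0.99, "redacted", m.start(), m.end())
        )
        return "[EMAIL]"

    return EMAIL.sub(_redact, prompt), events


def to_log_line(event: DetectionEvent) -> str:
    """Serialize an event as one JSON line; the matched value is never logged."""
    return json.dumps(asdict(event))
```

Recording offsets and category rather than the matched text keeps the audit log itself from becoming a second copy of the PII.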
