TL;DR for the time-pressed
Clinical AI sits at the intersection of the HIPAA Privacy Rule, the HIPAA Security Rule, HHS Office for Civil Rights (OCR) enforcement guidance, the 2024 Section 1557 nondiscrimination final rule, and FDA Software as a Medical Device (SaMD) classification. The 2026 picture has four parts. Any LLM that processes Protected Health Information (PHI) must operate under a Business Associate Agreement (BAA) and the full Security Rule control set. Retrieval-augmented generation (RAG) systems carry their own de-identification analysis. Embedding stores are themselves repositories of PHI in most architectures. Clinical decision support powered by AI is subject to Section 1557 nondiscrimination rules and, where it meets the device definition, FDA SaMD oversight. This playbook is the operational pattern Areebi's compliance team ships to healthcare customers. Updated 2026-05-20.
PHI in LLM systems: where it actually lives
The HIPAA Privacy Rule defines PHI as individually identifiable health information transmitted or maintained in any form (45 CFR 160.103). In an LLM system, PHI lives in more places than most architecture diagrams show:
- Prompts. A clinician asking "summarise the chart for patient Jane Doe, DOB 1962-04-11" puts PHI directly in the prompt. The 18 HIPAA identifiers (45 CFR 164.514(b)(2)) often appear in the natural-language prompt context, even when the application UI does not show them.
- Completions. The model's output can quote PHI verbatim from the prompt or from retrieved context. Output filtering is necessary but not sufficient.
- Embeddings. Vector representations of PHI-bearing documents are themselves PHI under most analyses, because the embeddings preserve identifiability through nearest-neighbour retrieval and model inversion.
- Retrieved context. Documents pulled into the prompt by a retrieval system. RAG over a clinical EHR system means the EHR is part of the LLM's PHI estate at runtime.
- Logs and analytics. Prompt/completion logs, usage analytics, error traces, and model evaluation outputs all capture whatever PHI passed through the prompt or completion.
- Fine-tuned weights. If fine-tuning data contained PHI, the resulting weights may contain memorised PHI.
Areebi's compliance team designs the control set to recognise every one of these as PHI by default, with redaction, tokenisation, retention limits, and access logging applied at each stage.
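As a rough illustration of redaction at the prompt stage, the sketch below replaces a few pattern-detectable identifiers with placeholders before a prompt leaves the application. The pattern set and the `redact` function are assumptions made for this example, not Areebi's implementation, and regular expressions alone will miss identifiers such as names ("Jane Doe"), which need a dedicated detector.

```python
import re

# Illustrative patterns only (assumed for this sketch). Names, addresses and
# free-text identifiers are NOT covered and need a proper PHI detector.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] placeholder and count the hits."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[{label}]", text)
        if hits:
            counts[label] = hits
    return text, counts


if __name__ == "__main__":
    prompt = "Summarise the chart for DOB 1962-04-11, contact jane@example.org"
    print(redact(prompt))
```

The returned counts can feed the access log, so that each stage records which identifier types it saw without storing the values themselves.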
BAA requirements for LLM vendors
If the LLM vendor will create, receive, maintain, or transmit PHI on your behalf, they are a Business Associate, and you need a Business Associate Agreement (45 CFR 164.504(e)) before any PHI flows. The clauses that matter for LLM vendors specifically:
- Permitted uses and disclosures. The BAA must limit the vendor's use of PHI to the services described in the agreement. The default for LLM vendors must be "no use of PHI for model training or improvement" unless explicitly authorised.
- Safeguards. The vendor must implement appropriate safeguards under the Security Rule, including the administrative, physical, and technical safeguards in 45 CFR 164.308-312.
- Reporting. The vendor must report breaches and any uses or disclosures not permitted by the BAA. For LLM vendors, this includes incidents like prompt injection that caused PHI disclosure.
- Subcontractor flow-down. The vendor must impose BAA obligations on any subcontractor that processes PHI - including underlying model providers if the LLM vendor is a wrapper.
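- Termination and return/destruction. On termination, PHI must be returned or destroyed; where that is infeasible, the BAA's protections must continue to apply to the retained PHI, and further use is limited to the purposes that make return or destruction infeasible.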
- Model training carve-out. The single most important clinical-AI clause: explicit prohibition on training, fine-tuning, or product-improvement use of PHI in any form (including embeddings).
The trap to avoid: assuming the LLM vendor's standard enterprise terms cover BAA obligations. They almost never do without an attached BAA, and most LLM vendor standard terms include training-on-customer-data clauses that are incompatible with HIPAA. The Areebi AI vendor list for CFOs includes the BAA checklist.
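The clause checklist above can be expressed as a simple gate that runs before any PHI workload is enabled for a vendor. This is a minimal sketch: the `VendorRecord` fields and function names are hypothetical and cover only three of the clauses listed, so a real registry would track more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorRecord:
    # Hypothetical registry fields, assumed for this sketch.
    name: str
    baa_signed: bool
    training_prohibited: bool          # explicit no-training / no-improvement clause
    subcontractor_flow_down: bool      # BAA obligations imposed on sub-processors


def phi_blockers(vendor: VendorRecord) -> list[str]:
    """Return the reasons this vendor must not receive PHI (empty if none)."""
    blockers = []
    if not vendor.baa_signed:
        blockers.append("no signed BAA")
    if not vendor.training_prohibited:
        blockers.append("no model-training carve-out")
    if not vendor.subcontractor_flow_down:
        blockers.append("no subcontractor flow-down")
    return blockers


def may_receive_phi(vendor: VendorRecord) -> bool:
    return not phi_blockers(vendor)


if __name__ == "__main__":
    wrapper = VendorRecord("example-llm-wrapper", True, False, True)
    print(may_receive_phi(wrapper), phi_blockers(wrapper))
```

Returning the list of blockers rather than a bare boolean gives procurement and compliance teams something concrete to negotiate against.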
PHI in retrieval-augmented generation
Retrieval-augmented generation (RAG) is the dominant pattern for clinical LLM applications, and it is the pattern most likely to leak PHI inappropriately. The HIPAA-aligned RAG architecture has six controls that Areebi customers wire by default:
- Authorised retrieval. The retrieval query carries the authenticated user's identity, and the retrieval system enforces access controls equivalent to those on the source system. A clinician cannot retrieve a record their EHR access policy would not permit.
- Minimum necessary. Per 45 CFR 164.502(b), only the minimum PHI necessary for the purpose is retrieved. The retrieval system limits the number of chunks, the size of each chunk, and the fields exposed.
- De-identification where feasible. Where the use case allows, retrieve de-identified data and only re-identify on demonstrated need.
- Prompt instrumentation. The PHI in the prompt is tagged, the retrieval source is recorded, and the audit log captures the full prompt context.
- Completion filtering. The completion is scanned for PHI that should not have been disclosed, with the configured handling (block, redact, escalate).
- Retention limits. Prompts and completions are retained only as long as necessary for operational purposes, with automated purging on the configured cadence.
OCR's guidance on online tracking technologies (issued December 2022, updated March 2024) sharpened the focus on inadvertent PHI disclosure via analytics. The Areebi policy engine enforces a default-deny on third-party analytics that would collect PHI from clinical interfaces.
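Three of the controls above (authorised retrieval, minimum necessary, and prompt instrumentation) can be sketched as a thin wrapper around whatever vector search is in use. Everything named here is an assumption for illustration: `GuardedRetriever`, the in-memory `acl` mapping standing in for the source system's access policy, and the chunk and character limits.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    patient_id: str
    text: str
    score: float  # relevance score from the (unspecified) vector search


@dataclass
class GuardedRetriever:
    # Hypothetical stand-in for the EHR access policy:
    # user id -> patient ids that user is permitted to view.
    acl: dict[str, set[str]]
    max_chunks: int = 3      # minimum necessary: cap the number of chunks
    max_chars: int = 500     # minimum necessary: cap the size of each chunk
    audit_log: list[dict] = field(default_factory=list)

    def retrieve(self, user_id: str, candidates: list[Chunk]) -> list[Chunk]:
        allowed = self.acl.get(user_id, set())
        # Authorised retrieval: drop anything the user could not open at source.
        permitted = [c for c in candidates if c.patient_id in allowed]
        top = sorted(permitted, key=lambda c: c.score, reverse=True)[: self.max_chunks]
        trimmed = [Chunk(c.patient_id, c.text[: self.max_chars], c.score) for c in top]
        # Instrumentation: record who retrieved what, and how much was refused.
        self.audit_log.append({
            "user_id": user_id,
            "returned_patients": sorted({c.patient_id for c in trimmed}),
            "denied": len(candidates) - len(permitted),
        })
        return trimmed


if __name__ == "__main__":
    retriever = GuardedRetriever(acl={"dr_a": {"p1"}})
    hits = [Chunk("p1", "BP stable on review", 0.8), Chunk("p2", "other patient", 0.9)]
    print(retriever.retrieve("dr_a", hits), retriever.audit_log)
```

Filtering before ranking matters: a higher-scoring chunk for a patient the user cannot access never reaches the prompt, and the refusal is still counted in the audit entry.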
De-identification under Safe Harbor and Expert Determination
HIPAA provides two de-identification methods under 45 CFR 164.514(b): Safe Harbor and Expert Determination. Both apply to LLM systems, but they apply differently to embeddings than to source documents.
- Safe Harbor (45 CFR 164.514(b)(2)). Removes the 18 listed identifiers (names, geographic subdivisions smaller than state, dates more specific than year, telephone numbers, fax numbers, email addresses, SSNs, MRNs, health plan beneficiary numbers, account numbers, certificate/licence numbers, vehicle identifiers, device identifiers, URLs, IP addresses, biometric identifiers, full-face photos, and any other unique identifying number, characteristic, or code) and requires that the covered entity has no actual knowledge that the remaining information could be used to identify the individual.
- Expert Determination (45 CFR 164.514(b)(1)). A statistical or scientific expert determines that the risk of re-identification is very small, applying generally accepted statistical and scientific principles and methods. The expert must document the analysis.
The hard question: are embeddings de-identified? The default answer in 2026 is no. Embeddings of PHI-bearing text preserve identifiability through model inversion and nearest-neighbour retrieval (see the model inversion literature on text embeddings, including Morris et al., "Text Embeddings Reveal (Almost) As Much As Text", 2023). Areebi's compliance team treats embedding stores as PHI repositories by default and applies Security Rule controls accordingly, unless an Expert Determination has been performed on the specific embedding pipeline.
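For structured records, part of the Safe Harbor method can be sketched as dropping direct-identifier fields and generalising dates to the year. The field names below are hypothetical, and the sketch is deliberately incomplete: it does not scrub free text, does not apply the rule's special handling of ages over 89 or of ZIP codes, and cannot satisfy the "no actual knowledge" condition, which is an organisational judgement rather than a transformation.

```python
from datetime import date

# Hypothetical field names for direct identifiers in a structured record.
DIRECT_IDENTIFIER_FIELDS = {"name", "phone", "email", "ssn", "mrn", "ip_address"}


def generalise(record: dict) -> dict:
    """Drop direct-identifier fields and reduce date values to the year."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIER_FIELDS:
            continue  # removed entirely
        if isinstance(value, date):
            out[f"{key}_year"] = value.year  # dates no more specific than year
        else:
            out[key] = value  # passed through unchanged, including free text
    return out


if __name__ == "__main__":
    record = {
        "name": "Jane Doe",
        "mrn": "12345",
        "state": "OH",
        "admitted": date(2025, 3, 14),
        "diagnosis": "J45.909",
    }
    print(generalise(record))
```

Applying a transformation like this to source documents says nothing about embeddings computed from the original text, which is why the embedding pipeline needs its own analysis.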
Section 1557 and clinical decision support
HHS finalised the Section 1557 nondiscrimination rule in May 2024 (89 FR 37522), and for the first time it explicitly addresses patient-care decision support tools, including AI. Per the final rule (45 CFR 92.210), covered entities must not discriminate on the basis of race, color, national origin, sex, age, or disability through the use of patient-care decision support tools. The covered entity must make reasonable efforts to identify tools that use these characteristics as inputs and to mitigate the discrimination risk arising from any such tool used in care delivery.
The operational implications for clinical AI:
- Inventory of decision support tools. The covered entity must inventory the AI tools used in care decisions. The healthcare AI governance guide walks through the inventory pattern.
- Identification of discrimination risk. For each tool, an analysis of whether and how the tool could produce discriminatory outcomes. This is not optional and not delegable to the vendor.
- Mitigation. Documented mitigations - training data audits, fairness testing, human review, escalation triggers.
- Documentation. Evidence the covered entity can produce on OCR examination demonstrating the inventory, the risk analysis, and the mitigations.
Section 1557 enforcement does not require a finding of intentional discrimination. The disparate impact standard applies. Areebi's policy engine ships a Section 1557 evaluation pack that scores deployed clinical AI tools against the rule's expectations.
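One small piece of the fairness testing mentioned above is comparing how often a tool flags patients in each group. The sketch below computes per-group rates and the ratio between the lowest and highest rate. The function names and the choice of metric are assumptions for illustration; the source does not name a metric, and a ratio like this is a screening signal for further review, not a legal conclusion.

```python
from collections import defaultdict
from typing import Iterable


def selection_rates(records: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Share of records flagged by the tool, per group label."""
    totals: dict[str, int] = defaultdict(int)
    flagged: dict[str, int] = defaultdict(int)
    for group, was_flagged in records:
        totals[group] += 1
        flagged[group] += int(was_flagged)
    return {group: flagged[group] / totals[group] for group in totals}


def disparity_ratio(rates: dict[str, float]) -> float:
    """Lowest group rate divided by highest; 1.0 means identical rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


if __name__ == "__main__":
    # Synthetic data: group A flagged 4 of 10 times, group B 2 of 10 times.
    data = [("A", i < 4) for i in range(10)] + [("B", i < 2) for i in range(10)]
    rates = selection_rates(data)
    print(rates, disparity_ratio(rates))
```

Keeping the per-group rates alongside the ratio gives the documentation bullet something concrete to retain as evidence.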
FDA SaMD classification for clinical LLMs
A clinical LLM may meet the FDA definition of a medical device (Section 201(h) of the Food, Drug, and Cosmetic Act). The Software as a Medical Device (SaMD) risk framework from IMDRF/SaMD WG/N12 (2014), which FDA guidance references, categorises SaMD by the seriousness of the healthcare situation and the significance of the information the SaMD provides:
- Category I. Inform clinical management for non-serious or serious situations, or drive clinical management for non-serious situations.
- Category II. Inform clinical management for critical situations, drive clinical management for serious situations, or treat or diagnose non-serious situations.
- Category III. Drive clinical management for critical situations, or treat or diagnose serious situations.
- Category IV. Treat or diagnose critical situations.
A clinical LLM that summarises a patient chart for a clinician to review may sit in Category I. An LLM that prioritises ED triage may sit in Category II or higher. An LLM that recommends specific diagnostic codes or treatment plans may be Category III. These IMDRF categories are distinct from FDA's own device classes (Class I, II, and III), which determine the premarket pathway - 510(k) clearance, De Novo, or PMA. Planned post-market model changes are addressed by FDA's Predetermined Change Control Plan guidance for AI-enabled device software functions (final guidance, December 2024).
The pre-emptive question: does the tool qualify for the 21st Century Cures Act Clinical Decision Support (CDS) exclusion? Per FDA's CDS final guidance (September 2022), software qualifies as non-device CDS only if it meets all four criteria: (1) it is not intended to acquire, process, or analyse a medical image or a signal from an in vitro diagnostic device or a signal-acquisition system, (2) it is intended to display, analyse, or print medical information about a patient or other medical information, (3) it is intended to support or provide recommendations to a clinician about prevention, diagnosis, or treatment, and (4) it is intended to enable the clinician to independently review the basis for the recommendations, so that the clinician does not rely primarily on them. Most clinical LLMs fail criterion 4 because the basis for the recommendation is opaque - the model's reasoning cannot be independently reviewed in the FDA sense.
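Because the exclusion requires all four criteria to be met, it helps to record the assessment per tool and surface exactly which criteria fail. The sketch below is a record-keeping aid under that assumption; the class and field names are hypothetical, and the judgement behind each boolean still has to be made and documented by a qualified reviewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdsAssessment:
    # One boolean per criterion, True meaning the criterion is met.
    input_restriction_met: bool               # criterion 1
    displays_medical_information: bool        # criterion 2
    supports_clinician_recommendation: bool   # criterion 3
    basis_independently_reviewable: bool      # criterion 4


def failed_criteria(assessment: CdsAssessment) -> list[int]:
    """Return the numbers of the criteria that are not met."""
    checks = {
        1: assessment.input_restriction_met,
        2: assessment.displays_medical_information,
        3: assessment.supports_clinician_recommendation,
        4: assessment.basis_independently_reviewable,
    }
    return [number for number, met in checks.items() if not met]


def qualifies_as_non_device_cds(assessment: CdsAssessment) -> bool:
    return not failed_criteria(assessment)


if __name__ == "__main__":
    opaque_llm = CdsAssessment(True, True, True, False)
    print(qualifies_as_non_device_cds(opaque_llm), failed_criteria(opaque_llm))
```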
HIPAA Security Rule applied to LLM systems
The HIPAA Security Rule (45 CFR 164.302-318) sets administrative, physical, and technical safeguards for electronic PHI. The 2024 NPRM and the HHS 2025 update to the Security Rule (the first material update in over 20 years) tighten technical safeguards on encryption, MFA, and audit logging. The LLM-specific controls:
- 164.308(a)(1) Security management process. Risk analysis must include AI-specific risks: prompt injection, training-data exfiltration, model inversion, hallucination causing patient harm.
- 164.308(a)(3) Workforce security. Provisioning and de-provisioning for AI tools follows the same identity lifecycle as the EHR; shadow AI is a Security Rule failure (see our 90-minute shadow AI hunt).
- 164.308(a)(4) Information access management. Per-user, per-collection, per-prompt-class access control. Minimum necessary applies.
- 164.308(a)(6) Security incident procedures. Incident response includes AI-specific incidents.
- 164.312(a) Access control. Authenticated access to LLM endpoints; the Areebi policy engine refuses unauthenticated requests.
- 164.312(b) Audit controls. The audit log must capture access to PHI in prompts, completions, and embeddings.
- 164.312(c) Integrity. PHI in prompts and completions must not be altered or destroyed in an unauthorised manner.
- 164.312(d) Person or entity authentication. Strong authentication including MFA.
- 164.312(e) Transmission security. Encryption in transit; key management.
The Areebi HIPAA compliance hub walks through the full mapping.
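The audit-control and integrity safeguards above can be illustrated with a tamper-evident log in which each entry carries a hash of its content plus the previous entry's hash. Hash chaining is a technique chosen for this sketch, not something the Security Rule prescribes or the source describes, and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: dict, prev_hash: str) -> str:
    body = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)}
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["event"], entry["prev_hash"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log: list[dict] = []
    append_event(log, {"user": "dr_a", "action": "prompt", "collection": "cardiology"})
    append_event(log, {"user": "dr_a", "action": "completion", "collection": "cardiology"})
    print(verify(log))
```

In this sketch the events hold identifiers and references rather than raw prompt text, which keeps the log itself from becoming another unmanaged copy of PHI.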
OCR online tracking technologies guidance
The HHS Office for Civil Rights (OCR) issued guidance on online tracking technologies in December 2022 and updated it in March 2024. The guidance addresses how covered entities' use of third-party tracking technologies (cookies, pixels, analytics SDKs) on user-authenticated and unauthenticated webpages can result in impermissible disclosures of PHI.
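The 2024 update revised some of the original guidance while litigation was pending but retained the core position: tracking technologies on authenticated patient portals and on unauthenticated pages that capture identifiable health information are covered, and disclosures to third parties without a BAA are impermissible. In June 2024 a federal district court (American Hospital Association v. Becerra, N.D. Tex.) vacated the portion of the guidance that applied to unauthenticated public webpages; the position on authenticated pages was not part of that ruling. For clinical AI interfaces, this means: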
- No third-party analytics on patient-facing AI tools without a BAA. If you use a tracking pixel from a vendor that does not sign a BAA, the deployment is non-compliant.
- Server-side analytics where possible. The Areebi platform supports server-side telemetry that does not transmit PHI to third-party analytics services.
- Patient notice. Where authorised tracking is in use, the privacy notice must address it.
Areebi's policy engine ships a default-deny on third-party analytics in clinical contexts, with explicit allow-list for BAA-covered vendors.
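A default-deny rule with an explicit allow-list can be sketched as a host check applied before any telemetry leaves a clinical interface. The host name and function below are hypothetical placeholders, not Areebi's policy engine; the point is only that anything not listed is refused, including malformed destinations.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of analytics hosts covered by a signed BAA.
BAA_COVERED_HOSTS = frozenset({"telemetry.internal.example"})


def egress_allowed(url: str, allowlist: frozenset[str] = BAA_COVERED_HOSTS) -> bool:
    """Default-deny: permit a destination only if its host is allow-listed."""
    host = urlparse(url).hostname or ""  # empty when the URL has no host
    return host in allowlist


if __name__ == "__main__":
    for destination in (
        "https://telemetry.internal.example/v1/events",
        "https://pixel.tracker.example/collect",
    ):
        print(destination, egress_allowed(destination))
```

Matching on the exact host, rather than a suffix or substring, avoids accidentally admitting look-alike domains.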
The clinical-AI operational playbook
The translation from regulation to control is what most healthcare CISOs struggle with. The pattern Areebi's compliance team uses with healthcare customers:
| Phase | Activity | Areebi platform support |
|---|---|---|
| Inventory | Identify every AI use case touching PHI, including shadow AI. Map to BAA status, retrieval sources, model versions. | Shadow AI discovery, vendor registry, policy engine. |
| BAA | For each vendor in scope, confirm BAA in place, training carve-out language, sub-processor flow-down. | Vendor registry with BAA status tracking, refusal to enable vendor without BAA. |
| Architecture | Wire prompts, retrievals, completions, embeddings into the Areebi control plane. Apply minimum-necessary retrieval, output filtering, retention limits. | Policy engine enforces controls at runtime; audit log records every decision. |
| Section 1557 | For each clinical decision support tool, perform the discrimination risk analysis. Document mitigations. | Section 1557 evaluation pack, fairness testing harness. |
| FDA | For each clinical AI tool, classify against SaMD. If a device, plan regulatory pathway. | Areebi customers maintain the classification record alongside the vendor record. |
| Audit readiness | Prepare evidence for OCR examination, internal audit, SOC 2 (see our SOC 2 + AI workloads mapping). | Audit log exports, evidence pack templates. |
Areebi's point of view
Clinical AI is the highest-stakes deployment context for an LLM, and the regulation reflects that. Areebi's policy engine treats PHI as a first-class data class and enforces the HIPAA control set by default - no PHI to a vendor without a BAA, no PHI to a third-party analytics service, no embedding store without Security Rule controls, no clinical decision support without a Section 1557 analysis on file. The shortcut most teams take - "we will figure out compliance once we have a clinical pilot live" - is the wrong order of operations in any context, and especially this one. OCR examinations are increasing in frequency, and AI is now part of their examination pattern. Build the control set before you need it.
Frequently Asked Questions
Do I need a BAA with my LLM vendor?
Yes, if the vendor will create, receive, maintain, or transmit PHI on your behalf. Most LLM vendor standard terms include training-on-customer-data clauses that are incompatible with HIPAA, so a BAA with an explicit training carve-out is mandatory. The Areebi vendor registry refuses to enable an LLM vendor for PHI workloads without a signed BAA on file.
Are embeddings of PHI considered de-identified?
No, not by default. Embeddings preserve identifiability through model inversion and nearest-neighbour retrieval. Treat embedding stores as PHI repositories under the Security Rule, unless an Expert Determination has been performed on the specific embedding pipeline showing the re-identification risk is very small.
Does Section 1557 apply to a clinical LLM I purchased from a vendor?
Yes. Section 1557's nondiscrimination obligations on covered entities are not delegable to vendors. The covered entity is responsible for identifying and mitigating discrimination risks in any decision support tool used in care delivery, whether built in-house or purchased. The vendor's compliance posture is relevant but does not discharge the covered entity's duty.
When does a clinical LLM become an FDA device?
When it meets the Section 201(h) definition of a medical device. Most clinical LLMs that drive clinical management for serious situations or that fail the 21st Century Cures Act CDS exclusion (the independent-review criterion is the common failure point) will be devices. The IMDRF SaMD risk category (I-IV) informs the risk assessment, and FDA's device class (I-III) determines the premarket pathway.
Can I use a non-BAA LLM for de-identified data?
If the data is properly de-identified under Safe Harbor or Expert Determination, it is no longer PHI and HIPAA does not apply. But three caveats: (1) the de-identification must be properly done, (2) any re-identification creates PHI immediately, and (3) state privacy laws may still apply to de-identified data. The Areebi platform supports separate environments for PHI and non-PHI workloads.
What is the most common HIPAA AI control failure?
Sending PHI to a model provider without a BAA, or under a BAA that does not include a training carve-out. The second most common: inadvertent PHI disclosure to third-party analytics via tracking pixels on clinical UIs, addressed by the OCR online tracking technologies guidance.
About the Author
Areebi Research
The Areebi research team combines hands-on enterprise security work with deep AI governance research. Our analysis is informed by primary sources (NIST, ISO, OECD, federal registers, IAPP) and the operational realities of CISOs running AI programs in regulated industries today.