TL;DR for the time-pressed
Fine-tuning, retrieval-augmented generation (RAG), and prompt engineering are three different ways to make a foundation model do useful work in your domain - and they have profoundly different compliance profiles. Fine-tuning embeds your data in the weights and inherits years of regulator scepticism about training-data lawful basis, retention, and right-to-erasure. RAG embeds your data in a retrieval store accessed at inference time, with cleaner audit, residency, and erasure properties but more runtime control burden. Prompt engineering keeps your data in the prompt itself, with the lowest setup cost but the most fragile guarantees. This post is the trade-off matrix Areebi's compliance team builds with customers - including the contexts (healthcare, financial services, EU, US public sector) in which each approach wins. Updated 2026-05-20.
What each approach actually is (and why the difference matters)
The three approaches are often conflated in vendor marketing; the regulatory consequences are different in every dimension that matters.
- Prompt engineering. The model is unchanged. Your data and instructions enter the model only at the time of the request, in the prompt. The model sees them and generates a completion. Under most enterprise vendor terms, the vendor does not train on them. It retains them only briefly (for example, for abuse monitoring) or, under zero-data-retention terms, not at all. Examples: system prompts, few-shot examples, structured prompt templates, in-context examples.
- Retrieval-augmented generation (RAG). The model is unchanged. Your data is indexed into a retrieval store (typically a vector database plus a metadata index). At inference time, relevant chunks are retrieved and inserted into the prompt. The model sees the retrieved context; the retrieval store is yours to control. Examples: enterprise document search, customer-support copilots, clinical-question RAG over EHR documents.
- Fine-tuning. The model itself is updated using your data. The weights now encode learned patterns from your training corpus. Examples: domain-specific tone fine-tuning, instruction-following fine-tuning, capability extension via SFT or RLHF / RLAIF, parameter-efficient fine-tuning (LoRA, QLoRA, prefix tuning) where only adapter weights change.
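The practical difference between the first two approaches is what gets sent to the unchanged model. The sketch below is illustrative only. `Chunk`, `build_prompt`, and `retrieve` are hypothetical names, and a toy keyword-overlap retriever stands in for a real vector search.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical types and names for illustration; not any vendor's SDK.
@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str

SYSTEM_PROMPT = "You are a support assistant. Answer only from the context provided."

def _terms(text: str) -> set[str]:
    """Lower-cased whitespace tokens; a crude stand-in for embeddings."""
    return set(text.lower().split())

def retrieve(question: str, store: list[Chunk], k: int = 2) -> list[Chunk]:
    """Toy retriever: rank chunks by term overlap, keep the top k with any overlap."""
    q = _terms(question)
    ranked = sorted(store, key=lambda c: len(q & _terms(c.text)), reverse=True)
    return [c for c in ranked[:k] if q & _terms(c.text)]

def build_prompt(question: str, chunks: Optional[list[Chunk]] = None) -> str:
    """Assemble what the unchanged model sees.

    chunks=None is plain prompt engineering; passing retrieved chunks is RAG.
    In both cases the model weights are untouched, and the retrieved text
    enters only this request.
    """
    parts = [SYSTEM_PROMPT]
    if chunks:
        parts.append("Context:\n" + "\n".join(f"[{c.doc_id}] {c.text}" for c in chunks))
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```

The `[doc_id]` tags travel with the context, so the same identifiers can be written to the audit log. That is part of why RAG audits well.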
The compliance consequences map directly to where your data sits:
| Approach | Where the data lives | Erasure cost | Audit clarity | Drift surface |
|---|---|---|---|---|
| Prompt engineering | Session-bounded prompt + logs you control | Lowest (purge logs) | Highest (every prompt and completion is visible) | Lowest (model unchanged) |
| RAG | Retrieval store (your infrastructure) | Low to medium (delete from store; logs) | High (retrieval store and prompts both auditable) | Low (model unchanged; retrieval quality drifts) |
| Fine-tuning | Model weights (yours if open, vendor's if proprietary) | High (retraining or unlearning) | Lower (data is embedded; harder to trace) | High (model changes; behaviour can shift) |
The Areebi AI control plane primer and the policy engine primer describe how the control plane abstracts across these.
Data residency exposure
Where your data ends up - geographically and contractually - differs dramatically by approach.
- Prompt engineering. Prompts and completions traverse the model vendor's inference infrastructure. Residency is a property of the vendor's region selection and the SKU. OpenAI Enterprise, Anthropic Claude for Work, and Google Vertex AI all support regional pinning in 2026; the consumer / developer tiers are more limited.
- RAG. The retrieval store is in your infrastructure. The model still sees the retrieved context, so the prompt's residency story is the same as prompt engineering. But you can apply pre-retrieval redaction, pseudonymisation, and access control - the data the model sees can be the minimum necessary for the task.
- Fine-tuning. The training corpus is processed by the training infrastructure. For proprietary vendors that train on your behalf (OpenAI fine-tuning, Anthropic fine-tuning programmes, Google Vertex AI tuning), the training corpus is sent to the vendor for the duration of training. The resulting weights live in the vendor's infrastructure for proprietary models, or in yours for open-weight self-hosted fine-tunes.
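The pre-retrieval controls in the RAG item above can be as simple as swapping identifiers for keyed tokens before retrieved chunks are placed in a prompt. A minimal sketch follows. It assumes email addresses are the only identifier in scope and uses a hypothetical `pseudonymise` helper. Pseudonymised data is still personal data under GDPR, so this minimises what the vendor sees; it does not take the data out of scope.

```python
import hashlib
import hmac
import re

# Deliberately simple email pattern for illustration; real pipelines need a
# proper PII detector (names, identifiers, addresses, free-text clues).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def pseudonymise(text: str, key: bytes) -> str:
    """Replace each email address with a keyed, stable token.

    The HMAC key stays inside your boundary, so the model vendor sees only
    tokens. The same address (case-insensitively) always yields the same
    token, so the model can still tell that two chunks refer to one person.
    """
    def to_token(match: re.Match) -> str:
        normalised = match.group(0).lower().encode()
        digest = hmac.new(key, normalised, hashlib.sha256).hexdigest()
        return f"<person:{digest[:12]}>"

    return EMAIL_RE.sub(to_token, text)
```

A keyed HMAC is used here rather than a plain hash. Without the key, an attacker cannot confirm a guessed address by hashing it themselves.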
The regulator pressure point: the EU AI Act, the UK GDPR and Data Protection Act 2018, and the various US state laws all care about where personal data is processed and stored. EDPB Opinion 28/2024 covers certain data protection aspects related to the processing of personal data in AI models. It addresses when an AI model can be considered anonymous and when legitimate interest can serve as the lawful basis for development and deployment. It also addresses what unlawful processing during development means for the deployed model. The Areebi EDPB Opinion 28/2024 playbook walks the controls.
Retraining liability under the EU AI Act
This is the clause most enterprise teams under-weight in 2026. The EU AI Act distinguishes between providers and deployers. The provider develops an AI system (or has one developed) and places it on the market or puts it into service under its own name or trademark. The deployer uses an AI system under its authority. Different obligations apply.
Article 25 (responsibilities along the AI value chain) and the general-purpose AI provisions in Articles 51-55 address what happens when a deployer substantially modifies a system or a model:
- Prompt engineering. Does not modify the model. The deployer remains a deployer. Provider obligations stay with the foundation model provider.
- RAG. Does not modify the model. The deployer remains a deployer for the underlying model. The RAG application itself, however, is an AI system of which the deployer is, in most cases, also the provider. It therefore carries the corresponding high-risk or limited-risk (transparency) obligations.
- Fine-tuning. May constitute substantial modification. Per the European Commission's GPAI guidance and the implementing acts, a substantial fine-tune of a general-purpose AI model can make the fine-tuner a GPAI provider. That brings the corresponding Article 53 obligations (technical documentation, a training-data summary, and a copyright policy). Where systemic-risk thresholds apply, the additional Article 55 obligations follow (model evaluation, risk assessment and mitigation, and serious-incident reporting). The threshold for "substantial" is unclear at the edges: light task-specific fine-tuning typically does not cross it, but heavier retraining or capability expansion likely does.
The practical implication: fine-tuning an open-weight model on enterprise data may quietly transfer provider obligations to your organisation. Most compliance teams discover this in audit. The EU AI Act for mid-market guide explains the deployer-provider boundary.
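One way to make the "substantial" question concrete is a compute comparison. The sketch below rests on an assumption drawn from our reading of the Commission's GPAI guidelines: an indicative criterion that a modification using more than one third of the original model's training compute makes the modifier a provider. Verify that threshold against the current guidance before relying on it. The sketch also uses the common 6 × parameters × tokens approximation for dense-transformer training FLOPs. The model sizes in the example are hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: indicative criterion from our reading of the Commission's GPAI
# guidelines (modification compute above one third of the original model's
# training compute). Verify against the current guidance before use.
INDICATIVE_FRACTION = 1 / 3

def training_flops(params: float, tokens: float) -> float:
    """Common rough estimate for dense-transformer training: ~6 FLOPs per parameter per token.

    For adapter methods such as LoRA this overstates the true cost, so it errs
    on the side of flagging.
    """
    return 6.0 * params * tokens

@dataclass(frozen=True)
class ModificationAssessment:
    modification_flops: float
    original_flops: float

    @property
    def ratio(self) -> float:
        return self.modification_flops / self.original_flops

    @property
    def crosses_indicative_threshold(self) -> bool:
        return self.ratio > INDICATIVE_FRACTION

# Hypothetical figures: an 8B-parameter base trained on 15T tokens,
# fine-tuned on 100M tokens of enterprise data.
base = training_flops(8e9, 15e12)
fine_tune = training_flops(8e9, 100e6)
assessment = ModificationAssessment(fine_tune, base)
print(f"ratio={assessment.ratio:.2e}, crosses={assessment.crosses_indicative_threshold}")
```

Compute is only one signal. A low-compute fine-tune that materially changes capability still warrants a documented provider/deployer decision.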
GDPR right to erasure and the unlearning problem
GDPR Article 17 gives data subjects a right to erasure ("right to be forgotten") in defined circumstances. Implementing it differs by approach:
- Prompt engineering. Erasure means deleting the prompt and completion logs containing the data subject's personal data. Low cost if your retention policy and audit log support per-subject queries.
- RAG. Erasure means deleting from the retrieval store and the logs. Vector databases vary in their support for per-vector deletion and re-indexing; modern enterprise vector stores (pgvector, Qdrant, Weaviate, Pinecone) all support deletion, but propagating the deletion across all replicas and search indexes is a non-trivial operational pattern. The Areebi platform automates the per-subject deletion path across RAG components.
- Fine-tuning. Erasure is the hard problem. Once data is in the model's weights, removing it requires one of three things. You can retrain without the data, or apply an unlearning technique (an active research area, not yet reliable at scale). Otherwise you must accept that the data is retained until the next retrain. The UK ICO guidance on AI and data protection, the EDPB Opinion 28/2024, and the Article 29 Working Party's historical opinions all acknowledge this is a hard technical problem, but they do not waive the right. The cost falls on you.
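The RAG erasure path described above has four steps. Delete the subject's chunks from every replica, purge the tagged log entries, verify nothing remains, and record a receipt as evidence. This sketch assumes chunks are tagged with data-subject IDs at ingestion, which is a design choice rather than a given in every pipeline. The in-memory `VectorReplica` is hypothetical and stands in for calls to your actual vector store.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class VectorReplica:
    """Stand-in for one replica or index of a vector store (hypothetical)."""
    name: str
    # chunk_id -> IDs of the data subjects that chunk mentions
    chunks: dict[str, set[str]] = field(default_factory=dict)

    def delete_subject(self, subject_id: str) -> int:
        doomed = [cid for cid, subjects in self.chunks.items() if subject_id in subjects]
        for cid in doomed:
            del self.chunks[cid]
        return len(doomed)

@dataclass
class ErasureReceipt:
    subject_id: str
    counts: dict[str, int]
    completed_at: str

def erase_subject(subject_id: str, replicas: list[VectorReplica], prompt_logs: list[dict]) -> ErasureReceipt:
    """Delete a subject's chunks from every replica and purge matching log entries."""
    counts = {r.name: r.delete_subject(subject_id) for r in replicas}
    before = len(prompt_logs)
    prompt_logs[:] = [e for e in prompt_logs if subject_id not in e.get("subject_ids", ())]
    counts["prompt_logs"] = before - len(prompt_logs)
    # Real replicas often delete asynchronously, so re-check before issuing a receipt.
    residual = [r.name for r in replicas if any(subject_id in s for s in r.chunks.values())]
    if residual:
        raise RuntimeError(f"erasure incomplete on: {residual}")
    return ErasureReceipt(subject_id, counts, datetime.now(timezone.utc).isoformat())
```

Tagging chunks with subject IDs at ingestion turns per-subject deletion into a lookup rather than a search. Deleting a shared chunk also removes other subjects' content, so some teams redact shared chunks instead.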
The Areebi compliance team's rule of thumb for healthcare and EU public-sector customers: only put personal data into a fine-tuning corpus if it meets one of three conditions. It must be (a) properly pseudonymised under EDPB guidance, (b) processed under a lawful basis other than consent (because consent can be withdrawn), or (c) covered by a documented retention pattern with a scheduled retrain.
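The rule of thumb above can be enforced as a gate in the corpus build. A minimal sketch follows, assuming each record carries the hypothetical metadata fields shown. The clause comments map to (a), (b), and (c).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CorpusRecord:
    """Hypothetical metadata attached to each candidate fine-tuning record."""
    record_id: str
    contains_personal_data: bool
    pseudonymised: bool = False
    lawful_basis: Optional[str] = None           # e.g. "consent", "contract", "legitimate_interest"
    retrain_schedule_days: Optional[int] = None  # documented retrain cadence, if any

def eligible_for_fine_tuning(rec: CorpusRecord) -> bool:
    """Apply the rule of thumb: personal data needs (a), (b), or (c)."""
    if not rec.contains_personal_data:
        return True
    return (
        rec.pseudonymised                                                    # (a)
        or (rec.lawful_basis is not None and rec.lawful_basis != "consent")  # (b)
        or rec.retrain_schedule_days is not None                             # (c)
    )
```

Run the gate at corpus build time and log which clause admitted each record. That turns the rule of thumb into audit evidence.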
Audit trail completeness
ISO/IEC 42001 (notably its Annex A controls on data provenance and event logging), SOC 2 CC7.2 and CC8.1, and the NIST AI RMF Manage function all expect an evidentiary trail from training data through model decisions to outputs. The completeness of that trail varies sharply by approach.
- Prompt engineering. The audit trail is the prompt, the system prompt, the completion, the policy engine decisions, and the model version. Highest completeness; lowest reconstruction cost.
- RAG. Same as prompt engineering, plus the retrieval chunks, the retrieval source documents, the embedding model version, and the retrieval ranking. The Areebi audit log captures all of these in a single record.
- Fine-tuning. The audit trail must include the training corpus manifest (which documents, which versions, which fields, which transformations), the training run metadata (model version, hyperparameters, hardware, time, environment), the evaluation suite output, and the deployment record. The challenge: the model's behaviour at runtime is partly a function of the training corpus - but the corpus is not visible at inference time. The evidentiary chain must be reconstructed from training records, which often live in MLOps tooling separate from the runtime audit log. The AIBOM playbook describes the lineage pattern.
The auditor's question is the same in each case: "show me how the model arrived at this output, and what data influenced it." For a fine-tuned model, the answer is "the training corpus, indirectly, plus the prompt, directly." Prompt engineering and RAG make the answer easier to give.
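The evidence listed above can live in one record per request, with fine-tuned models carrying a pointer back to their training lineage. The field names in this sketch are assumptions, not the Areebi audit-log schema. Hashing a canonical JSON form gives each record a digest that can later show it has not been altered.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass(frozen=True)
class RetrievedChunk:
    source_doc: str
    doc_version: str
    rank: int
    score: float

@dataclass(frozen=True)
class AuditRecord:
    request_id: str
    model_version: str
    system_prompt: str
    prompt: str
    completion: str
    policy_decisions: tuple[str, ...] = ()
    embedding_model_version: Optional[str] = None  # RAG only
    retrieved: tuple[RetrievedChunk, ...] = ()     # RAG only
    training_manifest_id: Optional[str] = None     # fine-tunes only: link to MLOps lineage

    def digest(self) -> str:
        """SHA-256 over a canonical (sorted-key) JSON form of the whole record."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()
```

The `training_manifest_id` field bridges the gap between the runtime log and the separate MLOps records that fine-tuning depends on.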
Model drift surface area
Drift is the silent compliance failure. Three flavours, with different relationships to the approach:
- Concept drift. The world changes, and the data the model was trained on becomes less representative. RAG lets you keep knowledge fresh by updating the retrieval store, and prompt engineering by updating the prompt. Fine-tuning needs retraining.
- Data drift. Inputs the model sees in production differ from training distribution. All three approaches are exposed; RAG and prompt engineering can be retuned cheaply, fine-tuning needs retraining or new adapters.
- Model drift. The model itself changes - vendor pushes a new version, fine-tuning shifts behaviour, parameter changes. Most acute with vendor-managed models and fine-tunes; mitigated by version pinning. The Areebi vendor registry tracks model version and the change-management workflow.
The 2024 NIST AI 600-1 Generative AI Profile lists confabulation (hallucination) and information integrity among its generative-AI risks. The suggested actions under the Measure (MS) and Manage (MG) functions include post-deployment monitoring, which is where drift detection belongs. Areebi's AI agent monitoring and observability guide goes deeper on metric design.
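Continuous monitoring for drift often starts with control limits on a production metric. This is a minimal Shewhart-style sketch that assumes a daily refusal rate as the monitored metric and a 3-sigma band. The metric, the baseline values, and the function names are illustrative.

```python
from statistics import mean, stdev

def control_limits(baseline: list[float], k: float = 3.0) -> tuple[float, float]:
    """Shewhart-style limits: baseline mean plus or minus k sample standard deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    mu, sigma = mean(baseline), stdev(baseline)
    return mu - k * sigma, mu + k * sigma

def drift_alerts(baseline: list[float], observed: list[float], k: float = 3.0) -> list[int]:
    """Indices of observations that fall outside the control limits."""
    lo, hi = control_limits(baseline, k)
    return [i for i, x in enumerate(observed) if not lo <= x <= hi]

# Illustrative daily refusal rates from a stable period, then new production days.
baseline = [0.020, 0.022, 0.019, 0.021, 0.020, 0.018, 0.021]
observed = [0.021, 0.019, 0.035]
print(drift_alerts(baseline, observed))
```

Recalibrate the baseline after any deliberate change, such as a new pinned model version. Otherwise an intended shift will keep firing alerts.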
Cost-per-inference and total-cost-of-ownership
The unit economics differ in ways that affect the compliance choice.
- Prompt engineering. Lowest setup cost (just write better prompts). Per-inference cost is the model's standard rate; longer prompts (more context) cost more.
- RAG. Setup cost is the retrieval store, the embedding pipeline, and the chunking strategy. Per-inference cost adds embedding generation for the query (small) and longer prompts to the model (the retrieved context). Operational cost is the retrieval store running 24x7.
- Fine-tuning. Setup cost is the training run (compute + engineering time). Per-inference cost can be lower, because the prompt needs less context when the knowledge is in the weights. However, some proprietary vendors charge a higher per-token rate to serve fine-tuned models. Operational cost includes retraining cadence and AIBOM lineage maintenance.
The compliance angle: cheaper inference does not mean cheaper compliance. Consider a fine-tune that saves $0.001 per inference but adds $250,000 a year in erasure responses, audit reconstruction, and EU AI Act provider documentation. It only breaks even at 250 million inferences a year; below that volume, it has not saved money.
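The break-even point is simple arithmetic: the annual extra compliance cost divided by the per-inference saving. This minimal sketch uses the illustrative figures from the paragraph above and `Decimal` to avoid floating-point rounding on currency values. The function names are hypothetical.

```python
from decimal import Decimal

def break_even_volume(saving_per_inference: Decimal, extra_compliance_cost: Decimal) -> Decimal:
    """Annual inference volume at which per-inference savings cover the extra compliance cost."""
    if saving_per_inference <= 0:
        raise ValueError("saving_per_inference must be positive")
    return extra_compliance_cost / saving_per_inference

def net_annual_saving(volume: int, saving_per_inference: Decimal, extra_compliance_cost: Decimal) -> Decimal:
    """Positive means the fine-tune saves money overall; negative means it costs more."""
    return volume * saving_per_inference - extra_compliance_cost

saving = Decimal("0.001")
compliance = Decimal("250000")
print(break_even_volume(saving, compliance))               # break-even annual volume
print(net_annual_saving(10_000_000, saving, compliance))   # a 10M-inference workload
```

At 10 million inferences a year, the fine-tune in this example is $240,000 worse off than the alternative.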
Which approach wins in which regulatory context
The Areebi framework: pick the approach that meets the regulatory expectation in your sector, then add capability via the others where the trade-off is acceptable.
| Context | Default winner | Why |
|---|---|---|
| Healthcare (HIPAA, OCR, FDA SaMD) | RAG with strong access control | PHI retrieval can be controlled per-user; embeddings of PHI are themselves PHI but live in your store; erasure path is implementable; FDA SaMD validation simpler when knowledge lives in the retrieval store rather than the weights. Fine-tuning on PHI is rarely worth the BAA / audit cost. |
| EU public sector (EU AI Act, GDPR, EDPB Opinion 28/2024) | Prompt engineering + RAG with EU-hosted retrieval | Avoids provider re-attribution under EU AI Act Article 25 and the GPAI provisions. Erasure path is implementable. EU sovereign hosting of retrieval store. Self-hosted open-weight model if data class requires. |
| Financial services (DORA, MiFID II, sector regulator) | RAG with strict audit + version-pinned model | Auditability and reproducibility are non-negotiable; RAG makes the data lineage explicit. DORA major-ICT-incident reporting needs reconstructable behaviour. Fine-tuning can be used for tone / language but rarely for substantive decision logic. |
| US federal / DoD (FedRAMP, FISMA, NIST SP 800-53) | RAG inside the authorised boundary | The authorisation boundary contains the retrieval store. Model can be a FedRAMP-authorised managed service or a self-hosted open weight. Fine-tuning requires explicit assessment of training-data residency. |
| Customer-support / sales tooling (low-risk, high-volume) | Prompt engineering + RAG hybrid | Cost-effective; auditable; safety controls via policy engine. |
| Code generation / developer tooling | Prompt engineering with code-context RAG | IP exposure is the dominant risk; vendor-side training carve-outs and code-context retrieval suffice. |
| Domain-specific style / format (legal drafting, marketing tone) | Fine-tuning is reasonable | Style does not raise erasure or residency concerns; fine-tuning a small open-weight model gives reproducibility. |
| Manufacturing / IP-sensitive domain knowledge | RAG with strict access control | Trade-secret data should not enter vendor training pipelines. RAG keeps it in your store. The Areebi manufacturing AI trade-secret protection guide covers the pattern. |
The Areebi decision rule (and the questions to ask)
The default sequence Areebi customers run.
- Start with prompt engineering. If a well-engineered prompt template plus a good system prompt accomplishes the goal, that is the lowest-compliance-burden path. Most "we need fine-tuning" requests are solvable with prompt engineering.
- Add RAG when knowledge is needed. When the model needs domain knowledge it does not have (your product catalogue, your policy library, the patient's EHR), RAG is the right answer. The retrieval store is yours; the model stays unchanged.
- Fine-tune when style, format, or behavioural shaping is needed and prompt engineering does not suffice. Even then, prefer parameter-efficient methods (LoRA, QLoRA, adapter tuning) over full fine-tuning, and prefer self-hosted open-weight base models so the audit trail and lineage stay in your control. Avoid fine-tuning on personal data unless you have a documented lawful basis, a retention plan, and a retraining cadence that supports erasure.
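The three-step sequence above can be encoded as a first-pass triage. This is a sketch only. The boolean inputs are hypothetical simplifications of what is really a review conversation, and the output is a suggested stack, not an approval.

```python
def choose_approach(
    *,
    needs_domain_knowledge: bool = False,
    needs_style_shaping: bool = False,
    prompt_suffices: bool = True,
    personal_data_in_corpus: bool = False,
    documented_basis_and_retrain_plan: bool = False,
) -> list[str]:
    """Return the suggested approach stack, lowest compliance burden first."""
    stack = ["prompt_engineering"]  # step 1: always start here
    if needs_domain_knowledge:
        stack.append("rag")  # step 2: knowledge lives in a store you control
    if needs_style_shaping and not prompt_suffices:
        if personal_data_in_corpus and not documented_basis_and_retrain_plan:
            # Step 3 is blocked: personal data without a lawful basis,
            # retention plan, and retrain cadence goes to Privacy / Legal.
            stack.append("escalate_privacy_legal_review")
        else:
            # Step 3: prefer LoRA / QLoRA on a self-hosted open-weight base.
            stack.append("peft_fine_tune")
    return stack
```

The function never returns fine-tuning on its own; every stack starts with prompt engineering, matching the sequence above.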
The questions every approach review should ask:
- What data classes will be in the training corpus / retrieval store / prompts? PHI? PII? Trade secrets? Regulated financial data?
- What lawful basis applies (GDPR Article 6, where applicable)? If consent, how will withdrawal be honoured? Is contract performance or legitimate interest plausible?
- What is the erasure path? Documented and tested?
- Does this approach make us a provider under the EU AI Act, either of a high-risk system under Article 25 or of a modified general-purpose AI model?
- Where does the data physically reside, at rest and in transit?
- What is the audit trail completeness for an OCR / DPA / SOC 2 auditor question 12 months from now?
- What is the drift detection and remediation plan?
- What is the cost of getting it wrong - regulator, customer, board?
The AI vendor risk score tool captures this trade-off across vendors; the AI framework comparison tool maps it to compliance frameworks.
Vendor fine-tuning policies you should read carefully
If you are fine-tuning with a proprietary vendor, the policy fine print determines what you can build and what evidence you can produce.
- OpenAI. Fine-tuning programmes for selected GPT-4o-class models. Usage Policies apply; training-on-customer-data is opt-in for fine-tuning. The fine-tuned model is yours under your account; the underlying weights are not. Standard DPA applies for personal data processing.
- Anthropic. Fine-tuning historically more conservative; Acceptable Use Policy applies. Where fine-tuning is offered, the data residency, retention, and training carve-outs follow the Claude for Work and API DPA.
- Google Vertex AI. Tuning available for Gemini and PaLM-family models; data residency follows Google Cloud regions. Customer-managed encryption keys available. The Generative AI Additional Terms govern.
- AWS Bedrock. Tuning for selected supported models; data stays in the AWS region of the customer's choice; AWS does not use customer data to train base models per the Bedrock service terms.
- Azure OpenAI. Fine-tuning available; data stays in the customer's Azure region; Microsoft's Responsible AI standard applies in addition to OpenAI policies.
- Open-weight self-hosted. No vendor policy; your own engineering, security, and compliance teams own the trade-off in full. The open vs proprietary LLM governance comparison covers the topology choice.
Areebi's point of view
Fine-tuning is the highest-compliance-burden tool in the kit, and it is rarely the right first choice. Areebi's research team's view: start with the lowest-burden approach that meets the requirement, and escalate only when the requirement justifies the cost. Never let a model engineer choose fine-tuning over RAG without a Privacy / Legal review that has read EDPB Opinion 28/2024 and the EU AI Act's Article 25 and general-purpose AI provisions. RAG is the workhorse of 2026 enterprise AI; prompt engineering is the underrated lightweight. Fine-tuning is the specialist tool, used sparingly.
Frequently Asked Questions
Is RAG always more compliant than fine-tuning?
Almost always, yes - for personal data, regulated health data, and trade-secret data. RAG keeps the data in a store you control with a deletion path; fine-tuning embeds it in weights with a difficult erasure path. The exception is non-personal style or format shaping, where fine-tuning a small open-weight model is reasonable.
Does fine-tuning a model on personal data violate GDPR?
Not automatically, but the lawful basis, the retention pattern, and the erasure path all have to be documented. The EDPB Opinion 28/2024 sets out the data-protection expectations. Suppose the fine-tuning corpus contained personal data processed on the basis of consent, and the data subject withdraws consent. The erasure request must then be honoured without undue delay (GDPR Article 17), with a response normally due within one month (Article 12(3)). Many fine-tuning programmes cannot do this without a planned retrain.
If I fine-tune Llama 3 on enterprise data, do I become an EU AI Act provider?
Possibly. Article 25 of the EU AI Act addresses substantial modification of high-risk AI systems. The European Commission's GPAI guidance and Code of Practice operationalise the threshold for modified general-purpose AI models. A heavy retrain or capability expansion likely transfers provider obligations to you; a light task-specific fine-tune likely does not. The Areebi compliance team's rule is to assume provider obligations for any fine-tune that materially changes capability, and to document the decision either way.
Are embeddings in a RAG store personal data?
Often yes. Embeddings of personal data preserve identifiability via nearest-neighbour retrieval and model inversion, as the Morris et al. text-embedding inversion work and the broader literature establish. The Areebi compliance team treats embedding stores as personal-data repositories by default, unless an Expert Determination or equivalent analysis has been performed on the specific pipeline.
Can I use prompt engineering to handle highly sensitive data without DPA / BAA?
No. The data still leaves your boundary into the model vendor's inference infrastructure. You still need the contractual and technical controls. Prompt engineering reduces the surface area for data exposure but does not eliminate the need for vendor-side contracts.
How do I detect drift in a fine-tuned model?
Continuous evaluation against a held-out evaluation set with calibrated control limits, plus production telemetry on user feedback, refusal rates, output classifier scores, and downstream task metrics. The Areebi AI agent monitoring and observability guide goes deeper. Frequency: at least daily for production workloads; per-deploy for any change.
Related Resources
- AI Control Plane (definition)
- AI Policy Engine (definition)
- AI Audit (definition)
- AI Observability (definition)
- Data Residency for AI
- EU AI Act Compliance Hub
- GDPR Compliance Hub
- HIPAA Compliance Hub
- Areebi Platform
- Trust Center
- AI Vendor Risk Score (tool)
- AI Framework Comparison (tool)
- EDPB Opinion 28/2024 Playbook
- AIBOM Playbook
- EU AI Act for Mid-Market
- Manufacturing AI Trade Secret Protection
- Open Source vs Proprietary LLM Governance
- AI Agent Monitoring and Observability
Stay ahead of AI governance
Weekly insights on enterprise AI security, compliance updates, and governance best practices.
About the Author
Areebi Research
The Areebi research team combines hands-on enterprise security work with deep AI governance research. Our analysis is informed by primary sources (NIST, ISO, OECD, federal registers, IAPP) and the operational realities of CISOs running AI programs in regulated industries today.
Ready to govern your AI?
See how Areebi can help your organization adopt AI securely and compliantly.