TL;DR for the time-pressed
On 17 December 2024 the European Data Protection Board adopted Opinion 28/2024 on the processing of personal data in the context of AI models. It does three load-bearing things: (1) sets a strict, case-by-case anonymity test for trained AI models (most large language models will not pass it by default), (2) anchors any legitimate interest claim for training on a three-stage test (purpose, necessity, balancing) that demands documented mitigations, and (3) splits the GDPR analysis cleanly between the development (training) phase and the deployment (inference) phase, treating them as distinct processing operations with distinct lawful bases. If a model is found to have been trained unlawfully, that unlawfulness can contaminate downstream deployment unless specific remediation steps are taken. This playbook is what Areebi's compliance team uses to operationalise the Opinion across the LLM lifecycle. Updated 2026-05-20.
What Opinion 28/2024 actually says (and what it does not)
The Irish Data Protection Commission asked the EDPB four questions under Article 64(2) GDPR. The Opinion answers them, but its real force is in the reasoning. CISOs reading the Opinion for the first time tend to extract two things and miss the third. The two that get extracted: the anonymity test and the three-stage legitimate interest test. The third, which is the most operationally consequential, is the doctrine of cross-phase contamination - if training is unlawful, deployment can be too.
Source: EDPB Opinion 28/2024, adopted 17 December 2024. The Opinion is not binding on national supervisory authorities in the strict sense, but Article 64(2) opinions are routinely treated as the operative GDPR interpretation in enforcement and audit contexts. France's CNIL, Germany's BfDI, Italy's Garante, and the UK's ICO (post-Brexit, with the Data Protection Act 2018 substantially mirroring GDPR) have all aligned guidance with the Opinion within the first quarter of 2026.
The anonymity test for trained AI models
The EDPB's starting position is that a trained AI model is rarely anonymous by default. A model trained on personal data may still process personal data through extraction (memorised training examples), regurgitation (verbatim or near-verbatim reproduction), or inference (re-identification through prompt engineering, vector similarity, or model inversion). The Opinion is explicit: claims of anonymity must be assessed on a model-by-model basis against the Recital 26 GDPR standard of "means reasonably likely to be used" by the controller or any third party, read together with the Article 29 Working Party's Opinion 05/2014 on anonymisation techniques (WP216).
Practically, this means LLM developers cannot rely on a blanket "the model is anonymous" defence. They must document an anonymity assessment that demonstrates the likelihood of extraction and inference is negligible. The Opinion provides a non-exhaustive list of evidentiary elements: training-data composition, de-duplication, differential privacy, output filtering, red-team testing for memorisation, and ongoing monitoring. Areebi's policy engine ships pre-built rules that align with each of these elements, so deployment-stage controllers can attach the developer's anonymity evidence to their own deployment DPIA.
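One of the evidentiary elements listed above, red-team testing for memorisation, can be illustrated with a verbatim-overlap check between model outputs and known training records. This is a minimal sketch, not a method prescribed by the Opinion or taken from Areebi's product: the function names `longest_shared_run` and `flag_regurgitation` and the 8-token threshold are assumptions chosen for illustration, and a real assessment would also need to cover near-verbatim reproduction and inference attacks.

```python
def longest_shared_run(output: str, record: str) -> int:
    """Length, in whitespace tokens, of the longest contiguous run shared by both texts."""
    a, b = output.lower().split(), record.lower().split()
    best = 0
    # prev[j] holds the length of the shared run ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def flag_regurgitation(outputs, training_records, min_tokens=8):
    """Return (output_index, record_index, run_length) for each suspicious pair.

    min_tokens is a hypothetical threshold; tune it to the dataset.
    """
    hits = []
    for oi, out in enumerate(outputs):
        for ri, rec in enumerate(training_records):
            run = longest_shared_run(out, rec)
            if run >= min_tokens:
                hits.append((oi, ri, run))
    return hits
```

Run against a sample of red-team prompts, the list of hits (or its absence) is the kind of artefact a deploying controller could attach to the anonymity assessment. The quadratic comparison is only suitable for small samples.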
The three-stage legitimate interest test
Opinion 28/2024 confirms that legitimate interest under GDPR Article 6(1)(f) can, in principle, be a lawful basis for training AI models on personal data. But it tightens the test substantially. The familiar three-stage analysis (purpose test, necessity test, balancing test) gets AI-specific bite:
- Purpose test: the interest must be lawful, clearly articulated, and real and present (not speculative). "Building a foundation model" is too abstract; "training a customer-service triage model for our regulated business" is concrete enough.
- Necessity test: the training must be necessary - not merely useful. The Opinion is explicit that data minimisation under GDPR Article 5(1)(c) applies to training-data composition. Controllers must show why less data, synthetic data, or licensed data could not have achieved the same purpose.
- Balancing test: the interests of the controller must not be overridden by the interests, rights, and freedoms of data subjects. The Opinion lists factors that tilt the balance against the controller, including web-scraped data, sensitive categories, vulnerable populations, and the irreversibility of training (you cannot un-train a model on demand).
The Areebi platform stores a structured legitimate interest assessment (LIA) per AI use case, so each model-training run carries a documented, version-controlled LIA that survives examination.
Cross-phase contamination: the operational risk most teams miss
The Opinion's most under-discussed paragraph addresses what happens when a model is trained unlawfully and then deployed. The EDPB's answer: the unlawfulness of the development phase can contaminate the deployment phase, in particular where the deploying controller knew or should have known that training was unlawful. This is functionally a downstream due-diligence duty.
The mitigations the Opinion lists - retraining with lawful data, applying differential privacy or output filtering to the deployed model, demonstrating that the model no longer reproduces personal data - are not free. Deploying controllers should treat this as a contractual obligation on AI vendors: model providers must warrant the training process and provide evidence sufficient to support the deploying controller's accountability obligations under GDPR Article 5(2). Areebi's AI vendor due diligence framework includes the specific clauses CISOs should require.
The model-training vs deployment Article 6 split
Opinion 28/2024 treats training and deployment as separate processing operations with distinct lawful bases. This is the single most important conceptual shift for compliance teams that have so far treated "the AI system" as one undifferentiated thing. Different bases apply to different phases, and the Opinion's discussion of which bases are plausible is the framework CISOs should anchor to. The table below is the version Areebi's compliance team uses with regulated customers.
| Processing phase | Plausible Article 6 bases | Notes from Opinion 28/2024 |
|---|---|---|
| Model training (curating, cleaning, training, evaluating) | (a) consent, (b) contract, (c) legal obligation, (f) legitimate interest | Consent is rarely feasible at training scale; legitimate interest is the practical default but subject to the three-stage test. |
| Fine-tuning on first-party data | Typically (b) contract or (f) legitimate interest; (a) consent where data subjects expect it | Necessity test is tighter when first-party data is available - controllers must minimise training data to what is needed. |
| Deployment - inference on user prompt data | (b) contract, (a) consent, (f) legitimate interest depending on use case | Where inference involves human-impacting decisions, Article 22 GDPR safeguards must be added on top of the Article 6 basis. |
| Deployment - training feedback loop (logging prompts and completions for retraining) | Re-evaluate as a distinct purpose under (a) or (f) | Opinion 28/2024 notes purpose limitation under Article 5(1)(b): training reuse of deployment data is a new purpose and needs a fresh basis. |
The trap to avoid: treating the deployment Article 6 basis as inherited from the training basis. They are independent and must be independently documented. Areebi's audit log records the basis claimed for each processing operation, with the policy engine refusing to process when the basis is missing or expired.
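The independence of the per-phase bases can be made concrete with a small registry keyed by use case and phase. The sketch below is an assumption-laden illustration, not Areebi's implementation: `BasisRecord`, `BasisRegistry`, the phase labels, and the `review_due` expiry field are all hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Article6Basis(Enum):
    CONSENT = "6(1)(a)"
    CONTRACT = "6(1)(b)"
    LEGAL_OBLIGATION = "6(1)(c)"
    VITAL_INTERESTS = "6(1)(d)"
    PUBLIC_TASK = "6(1)(e)"
    LEGITIMATE_INTEREST = "6(1)(f)"


@dataclass(frozen=True)
class BasisRecord:
    use_case: str
    phase: str          # hypothetical labels: "training", "fine_tuning", "inference", "feedback_loop"
    basis: Article6Basis
    review_due: date    # assumption: a record past this date counts as expired


class BasisRegistry:
    """Stores one documented basis per (use case, phase); nothing is inherited across phases."""

    def __init__(self):
        self._records = {}

    def register(self, record: BasisRecord) -> None:
        self._records[(record.use_case, record.phase)] = record

    def check(self, use_case: str, phase: str, today: date) -> BasisRecord:
        record = self._records.get((use_case, phase))
        if record is None:
            raise PermissionError(f"no documented Article 6 basis for {use_case}/{phase}")
        if today > record.review_due:
            raise PermissionError(f"Article 6 basis for {use_case}/{phase} expired on {record.review_due}")
        return record
```

Because the lookup key includes the phase, registering a basis for "training" does nothing for "inference": a call to `check` for the deployment phase fails until that phase is documented separately.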
The legitimate interest assessment template for AI
An LIA is a written document; if you cannot produce it on demand you do not have a legitimate interest basis. The template below is what Areebi's compliance team recommends as the baseline; many regulated customers extend it with sector overlays. It maps to the EDPB Opinion's three-stage test, the ICO's AI and data protection guidance (updated 2025), and CNIL's AI How-To Sheets (Series 2, 2024-2025).
- Section 1 - Purpose test. What is the specific business purpose? Why is the AI system the right tool? Is the purpose lawful, articulated, and real and present? Cite the business case and the named accountable executive.
- Section 2 - Necessity test. Why is processing personal data necessary? Have less invasive alternatives (synthetic data, licensed datasets, federated learning) been considered? Cite the data minimisation assessment and the volume of personal data used.
- Section 3 - Balancing test. Identify the rights and freedoms of data subjects, including the right to object under Article 21 GDPR. List sensitive categories (Article 9), vulnerable populations, public-vs-private data, web-scraped data, and the irreversibility of training. State the mitigations - differential privacy, output filtering, retention limits, opt-out mechanisms.
- Section 4 - Conclusion and review cadence. State the conclusion of the LIA, the next review date, and the triggers for re-assessment (material change to the model, new training data, new deployment context).
The Areebi platform's policy engine refuses to allow AI processing where the LIA is missing, expired, or unsigned by the named executive.
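The four template sections and the "missing, expired, or unsigned" rule can be sketched as a structured record plus a gap check. This is an illustrative data model under stated assumptions, not Areebi's schema: the class `LegitimateInterestAssessment`, its field names, and the functions `lia_gaps` and `processing_allowed` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class LegitimateInterestAssessment:
    use_case: str
    purpose: str                                                    # Section 1
    necessity_rationale: str                                        # Section 2
    alternatives_considered: List[str] = field(default_factory=list)
    mitigations: List[str] = field(default_factory=list)            # Section 3
    conclusion: str = ""                                            # Section 4
    next_review: Optional[date] = None
    signed_by: Optional[str] = None                                 # named accountable executive


def lia_gaps(lia: Optional[LegitimateInterestAssessment], today: date) -> List[str]:
    """List the reasons an LIA cannot support processing; an empty list means none found."""
    if lia is None:
        return ["missing"]
    gaps = []
    if not lia.purpose.strip():
        gaps.append("purpose test empty")
    if not lia.necessity_rationale.strip():
        gaps.append("necessity test empty")
    if not lia.mitigations:
        gaps.append("no mitigations recorded")
    if lia.next_review is None or lia.next_review < today:
        gaps.append("expired")
    if not lia.signed_by:
        gaps.append("unsigned")
    return gaps


def processing_allowed(lia: Optional[LegitimateInterestAssessment], today: date) -> bool:
    return not lia_gaps(lia, today)
```

Returning the list of gaps rather than a bare boolean keeps the refusal explainable: the same output can be written to an audit trail and shown to the owner of the use case.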
DPIA template for generative AI
Under GDPR Article 35, a DPIA is required where processing is likely to result in a high risk to rights and freedoms. The EDPB and most national supervisory authorities treat large-scale processing of personal data for AI training, and most consequential AI-driven decisions, as high-risk processing that requires a DPIA. The template below extends the standard ICO and CNIL DPIA structures with AI-specific sections.
- Description of the processing. Include the model architecture, training-data composition, deployment context, and the data flows. Reference the Areebi architecture walkthrough for the standard reference diagram.
- Necessity and proportionality assessment. Map to the Article 6 basis, the Article 9 condition if special category data is in scope, and the data minimisation analysis.
- Risk assessment. Identify the AI-specific risks: memorisation, prompt injection that leaks personal data, model inversion, hallucinated personal data, bias against protected classes, automated decision-making impacts under Article 22.
- Mitigations. Document the controls. Areebi customers typically reference the policy engine, audit log, prompt filtering, output redaction, and the DPO sign-off workflow.
- Consultation. Document the consultation of the DPO and, where required under Article 36, the prior consultation of the supervisory authority.
- Review. State the review cadence. CNIL's AI How-To Sheets recommend annual review at minimum, plus event-driven reviews.
Areebi's compliance team designs the DPIA workflow so each AI use case carries a versioned DPIA artefact that can be exported to a supervisory authority on request.
Article 9 special category data and AI training
If any training data includes special category data under Article 9 GDPR (racial or ethnic origin, political opinions, religious beliefs, trade union membership, genetic and biometric data, health, sex life, or sexual orientation), an additional Article 9(2) condition is required. The Opinion does not formally address this, but the EDPB's Guidelines 3/2019 on the processing of personal data through video devices and CNIL's How-To Sheets have made clear that web-scraped training data routinely captures Article 9 data, and that controllers cannot ignore it just because they did not intend to collect it.
The practical implications:
- Web-scraping a public dataset does not cleanse Article 9 data. The Opinion's anonymity test must be evaluated on the trained model with explicit attention to Article 9 categories.
- The "manifestly made public by the data subject" condition (Article 9(2)(e)) is narrow. Posting a photo to a social-media profile is not the same as manifestly making one's biometric template public.
- Explicit consent (Article 9(2)(a)) is rarely feasible at training scale. Substantial public interest (Article 9(2)(g)) is the most common alternative for regulated controllers, but it requires a basis in Member State law.
Areebi's policy engine ships a pre-built Article 9 filter that detects suspected special category content in prompts and completions and triggers the configured handling (block, redact, escalate).
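The block, redact, and escalate handling options can be illustrated with a deliberately crude pattern-based detector. Everything in this sketch is an assumption made for illustration: the tiny term lists, the `Handling` enum, and the functions `detect_categories` and `apply_handling` are hypothetical and do not describe how Areebi's filter works. Keyword matching will miss most real Article 9 content and will produce false positives, so a production filter would need far stronger detection.

```python
import re
from enum import Enum


class Handling(Enum):
    BLOCK = "block"
    REDACT = "redact"
    ESCALATE = "escalate"


# Hypothetical, intentionally small term lists; not a usable taxonomy.
ARTICLE9_PATTERNS = {
    "health": re.compile(r"\b(diagnos\w+|diabetes|hiv|chemotherapy)\b", re.IGNORECASE),
    "religion": re.compile(r"\b(catholic|muslim|jewish|hindu|atheist)\b", re.IGNORECASE),
    "trade_union": re.compile(r"\b(trade union|union member\w*)\b", re.IGNORECASE),
}


def detect_categories(text: str) -> list:
    """Return the sorted names of suspected Article 9 categories found in the text."""
    return sorted(name for name, pat in ARTICLE9_PATTERNS.items() if pat.search(text))


def apply_handling(text: str, handling: Handling) -> dict:
    """Apply the configured handling to a prompt or completion."""
    categories = detect_categories(text)
    if not categories:
        return {"action": "allow", "categories": [], "text": text}
    if handling is Handling.BLOCK:
        return {"action": "block", "categories": categories, "text": None}
    if handling is Handling.REDACT:
        redacted = text
        for name in categories:
            redacted = ARTICLE9_PATTERNS[name].sub("[REDACTED]", redacted)
        return {"action": "redact", "categories": categories, "text": redacted}
    # ESCALATE: pass the text through unchanged but mark it for human review
    return {"action": "escalate", "categories": categories, "text": text}
```

For example, `apply_handling("Patient was diagnosed with diabetes", Handling.REDACT)` returns the text `"Patient was [REDACTED] with [REDACTED]"` with the category list `["health"]`.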
Article 22 automated decision-making and AI systems
Most generative AI systems are configured for human review, but the moment a decision becomes consequential and automated, Article 22 GDPR applies. A decision is "based solely on automated processing" if there is no meaningful human review - and the CJEU's December 2023 SCHUFA judgment (Case C-634/21) made clear that "meaningful" sets a higher bar than rubber-stamping. AI-generated outputs that drive credit, employment, insurance, or access decisions are squarely in scope.
The required safeguards under Article 22(3) and Recital 71 include: the right to obtain human intervention, the right to express a point of view, the right to contest the decision, and the right to an explanation. Areebi's audit log captures every human-in-the-loop intervention, the reasoning chain (where the model exposes it), and the decision outcome - so the controller can demonstrate Article 22 compliance to a supervisory authority on request.
International transfers, US AI vendors, and Schrems II
If your AI vendor processes personal data outside the EEA, GDPR Chapter V applies. The CJEU's Schrems II judgment (Case C-311/18, July 2020) invalidated the EU-US Privacy Shield and tightened the Standard Contractual Clauses (SCCs) regime. The EU-US Data Privacy Framework (adopted July 2023) restored an adequacy decision for certified US recipients, but the EDPB has continued to scrutinise transfers tightly.
For LLM systems, the practical implications:
- Verify the vendor's DPF certification at dataprivacyframework.gov before relying on adequacy.
- For non-DPF transfers, use the European Commission's 2021 SCCs (controller-to-processor or processor-to-processor modules) with a Transfer Impact Assessment (TIA) following EDPB Recommendations 01/2020.
- Document supplementary measures where the TIA finds the destination jurisdiction inadequate. For US transfers, this typically includes encryption with EEA-held keys, contractual restrictions on US government access, and audit rights.
Areebi's platform supports residency configuration so EU customers can pin training, inference, and audit data to EEA regions, with EEA-controlled key management for transit and at-rest encryption.
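The decision path for LLM vendor transfers described in this section (DPF adequacy first, otherwise SCCs with a TIA and, where needed, supplementary measures) can be sketched as a small function. This is a simplified illustration under stated assumptions: the `Transfer` fields and `assess_transfer` are hypothetical names, and the sketch ignores adequacy decisions for countries other than the US, binding corporate rules, and Article 49 derogations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Transfer:
    recipient: str
    destination: str                    # assumption: "EEA", "US", or another country label
    dpf_certified: bool = False         # verified against the DPF list, not self-declared
    sccs_signed: bool = False
    tia_completed: bool = False
    tia_found_gaps: bool = False        # TIA concluded that destination law undermines the SCCs
    supplementary_measures: List[str] = field(default_factory=list)


def assess_transfer(t: Transfer) -> Tuple[str, List[str]]:
    """Return (transfer mechanism, outstanding actions) for one vendor transfer."""
    if t.destination == "EEA":
        return "no Chapter V transfer", []
    if t.destination == "US" and t.dpf_certified:
        return "adequacy (DPF)", []
    todo = []
    if not t.sccs_signed:
        todo.append("sign SCCs")
    if not t.tia_completed:
        todo.append("complete TIA")
    elif t.tia_found_gaps and not t.supplementary_measures:
        todo.append("document supplementary measures")
    return "SCCs + TIA", todo
```

An empty action list means the sketch found nothing outstanding for that mechanism; it does not replace the legal judgement inside the TIA itself.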
Data subject rights against AI models
The right of access (Article 15), right to erasure (Article 17), and right to object (Article 21) are particularly difficult to satisfy in trained models. The Opinion acknowledges the difficulty without resolving it. The practical pattern that has emerged across national supervisory authorities:
- Right of access: the controller must provide a meaningful answer about whether the data subject's data was used in training, the categories of recipients (downstream deployers), and the safeguards. Areebi customers maintain a training-data manifest that supports access requests.
- Right to erasure: the EDPB accepts that erasure from a trained model is technically difficult; the practical answer is a combination of (a) erasing the source data from the training pipeline so it does not appear in future retraining, (b) demonstrating that the trained model does not reproduce the data subject's data via memorisation testing, and (c) where neither is feasible, retraining.
- Right to object: where legitimate interest is the basis, the data subject can object at any time. Controllers must have a workflow that handles objections and either (a) ceases processing or (b) demonstrates compelling legitimate grounds that override the data subject's interests.
Areebi's compliance team designs this control so each data subject request is logged, routed, and answered within the statutory one-month window under Article 12(3).
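The one-month window is easy to get wrong at month ends, so a request-tracking workflow usually computes the due date rather than leaving it to manual counting. The sketch below is an assumption for illustration: `add_months` and `response_deadline` are hypothetical helpers, the month-end clamping rule is a simplification, and the treatment of weekends, public holidays, and the day of receipt is deliberately left out. The `extended` flag models the extension by two further months that Article 12(3) allows for complex or numerous requests.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def response_deadline(received: date, extended: bool = False) -> date:
    """Due date for a data subject request: one month, or three if extended."""
    return add_months(received, 3 if extended else 1)
```

A request received on 31 January 2025 gets a due date of 28 February 2025 under this clamping rule, and one received on 15 December 2025 is due on 15 January 2026.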
The 90-day operational playbook
The translation from Opinion 28/2024 to operational control is what most CISOs struggle with. The 90-day plan below is what Areebi's compliance team uses with regulated customers entering the EDPB framework for the first time.
| Phase | Days | Activities | Areebi platform support |
|---|---|---|---|
| Discover | 0-30 | Inventory all AI use cases. Identify training data sources, deployment contexts, and downstream recipients. Map to Article 6 bases. | Shadow AI discovery (see 90-minute shadow AI hunt), policy engine, audit log. |
| Assess | 30-60 | For each use case: build the LIA, the DPIA, the Article 9 assessment, the Article 22 safeguards, and the transfer impact assessment. | Pre-built LIA, DPIA, and TIA templates with version control. |
| Operationalise | 60-90 | Wire mitigations into the platform: prompt filtering, output redaction, data subject rights workflow, retention limits. | Policy engine enforces mitigations at runtime; audit log records every decision. |
The Areebi GDPR compliance hub walks through each step with concrete control mappings.
Areebi's point of view
Most GDPR-for-AI guidance stops at the conceptual level. Opinion 28/2024 demands operational artefacts: a documented LIA per use case, a DPIA per high-risk system, an Article 22 human-review workflow, a data subject rights pipeline, a TIA per international transfer, and an anonymity assessment per trained model. Areebi's policy engine enforces this control set by default: any AI use case that is missing one of these artefacts is blocked from production until the gap is closed.
This is not because we believe the EDPB will audit every controller next month. It is because the gap between "we have a policy" and "we have a control" is where regulatory findings actually live. The Areebi platform closes that gap with auditable artefacts the supervisory authority can read.
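The "blocked until the gap is closed" idea reduces to comparing the artefacts a use case needs with the artefacts it has. The sketch below is illustrative only: the `UseCase` fields, the artefact labels, and the conditions under which a DPIA, TIA, or Article 22 workflow is required are assumptions drawn from the list above, not a description of Areebi's policy engine.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class UseCase:
    name: str
    high_risk: bool
    international_transfer: bool
    solely_automated_decisions: bool
    artefacts: Set[str] = field(default_factory=set)   # labels of artefacts already on file


def required_artefacts(uc: UseCase) -> Set[str]:
    """Artefacts this use case needs, under the assumptions stated in the lead-in."""
    required = {"lia", "dsr_pipeline", "anonymity_assessment"}
    if uc.high_risk:
        required.add("dpia")
    if uc.international_transfer:
        required.add("tia")
    if uc.solely_automated_decisions:
        required.add("article22_workflow")
    return required


def missing_artefacts(uc: UseCase) -> Set[str]:
    return required_artefacts(uc) - uc.artefacts


def production_allowed(uc: UseCase) -> bool:
    """Default-deny: any missing artefact blocks production."""
    return not missing_artefacts(uc)
```

Keeping `missing_artefacts` separate from the boolean gate means the blocked team sees exactly which document to produce, which is the difference between a policy and a control.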
Frequently Asked Questions
Is EDPB Opinion 28/2024 legally binding?
Opinion 28/2024 was adopted under Article 64(2) GDPR, which makes it formally an opinion rather than a binding decision on national supervisory authorities. In practice, Article 64(2) opinions are treated as the operative interpretation of the GDPR in enforcement, audit, and litigation, and CNIL, ICO, BfDI, and the Italian Garante have aligned their AI guidance with the Opinion in the first quarter of 2026.
Can legitimate interest support training an LLM on web-scraped personal data?
Possibly, but the bar is high. The Opinion confirms legitimate interest can in principle support training, but the three-stage test bites hard for web-scraped data. The controller must show the purpose is real and present, the data is necessary (synthetic or licensed alternatives were considered), and the balancing test does not tilt against the data subject. Web-scraping public data often fails the balancing test where vulnerable populations or sensitive categories are involved.
What is the cross-phase contamination doctrine?
Opinion 28/2024 holds that if a model is trained unlawfully, the unlawfulness can contaminate the deployment phase, especially where the deploying controller knew or should have known. The mitigations include retraining with lawful data, applying differential privacy or output filtering to the deployed model, and demonstrating that the model no longer reproduces personal data. Deploying controllers should require AI vendors to warrant the training process contractually.
Does deployment automatically inherit the training Article 6 basis?
No. Opinion 28/2024 treats training and deployment as separate processing operations with distinct lawful bases. Each phase must be independently assessed and documented. A common error is to claim legitimate interest for training and assume it carries through deployment; controllers must instead document the deployment basis on its own merits.
How does Article 22 apply to generative AI outputs?
Article 22 applies where an AI-generated output drives a consequential decision with no meaningful human review. The CJEU's SCHUFA judgment (Case C-634/21, December 2023) set the bar: the human review must be meaningful, not rubber-stamping. Where Article 22 applies, the controller must provide the right to human intervention, the right to express a point of view, the right to contest, and the right to an explanation.
Where does the right to erasure leave models that have memorised training data?
The EDPB accepts that erasure from a trained model is technically difficult. The practical pattern is (a) erase the source data from the training pipeline so it does not appear in future retraining, (b) demonstrate via memorisation testing that the deployed model does not reproduce the data subject's data, or (c) where neither is feasible, retrain. Areebi's platform supports each option with auditable artefacts.
About the Author
Areebi Research
The Areebi research team combines hands-on enterprise security work with deep AI governance research. Our analysis is informed by primary sources (NIST, ISO, OECD, federal registers, IAPP) and the operational realities of CISOs running AI programs in regulated industries today.