TL;DR for the time-pressed
By mid-2026 the open-source vs proprietary LLM debate has moved past leaderboard rankings and into procurement and governance review. The capability gap between top open-weight models (Meta Llama 3.x and 4, Mistral Large 2, DeepSeek V3 and R1, Alibaba Qwen 2.5 and 3, Google Gemma 2 and 3) and proprietary frontier models (OpenAI GPT-4o and GPT-5, Anthropic Claude 3.5 Sonnet and Claude 4 family, Google Gemini 2.x and 3.x) has narrowed enough that the deciding factors are now legal, structural, and operational - data residency, fine-tuning rights, audit access, transparency for ISO 42001 and SOC 2, and how each licence interacts with EU AI Act Articles 50 and 53. This post is the framework Areebi's research team uses with enterprise customers to make the call, including the side of the trade-off that the open-vs-proprietary framing tends to bury. Updated 2026-05-20.
The real procurement question in 2026
Most "open vs proprietary" debates start at the wrong level. Enterprise procurement is not really choosing between Llama and GPT in the abstract - it is choosing between a deployment topology (self-hosted weights on enterprise infrastructure) and a managed service (a model accessed via API with the provider operating the inference). The licence and weight access matter only because they constrain the topology.
Frame the choice this way:
- Self-hosted open weights. You download the weights, run inference in your own VPC or on-prem, and accept the operational burden in exchange for full control over data, retention, model lineage, and audit access.
- Self-hosted closed weights. Rare but emerging - some proprietary vendors will deploy under enterprise contracts into customer VPCs (e.g. Azure OpenAI in a private network, AWS Bedrock with model invocation logging off, Anthropic on-prem programmes for specific verticals). You get most of the residency benefit and some of the auditability benefit, but not weight access.
- Managed API with enterprise terms. OpenAI Enterprise, Anthropic Claude for Work, Google Gemini for Workspace, etc. The vendor operates the model. You sign a DPA and an AUP. Data residency depends on the region and the SKU. Fine-tuning is sometimes available with strict carve-outs.
- Managed API with consumer terms. The default consumer or developer API. Training on customer data is sometimes enabled by default unless you opt out. This is the tier that produces shadow AI incidents.
Open vs proprietary separates option 1 from options 2-4; self-hosted vs managed separates options 1-2 from options 3-4. The right framing for a governance review is "what topology do we need, and which models support it under acceptable licence and contract terms?" Areebi's policy engine assumes the topology decision is made per workload and per data class, not at the company level.
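A per-workload, per-data-class topology decision can be written down as policy-as-code. The sketch below is illustrative only: the data-class names (`internal`, `regulated`) and the allow-list are assumptions made for the example, not Areebi's policy engine or a recommended mapping.

```python
from enum import Enum


class Topology(Enum):
    """The four deployment options listed above."""
    SELF_HOSTED_OPEN = "self-hosted open weights"
    SELF_HOSTED_CLOSED = "self-hosted closed weights"
    MANAGED_ENTERPRISE = "managed API, enterprise terms"
    MANAGED_CONSUMER = "managed API, consumer terms"


# Hypothetical data classes; each maps to the topologies it may use.
# Consumer-terms APIs appear in no set, so they are always denied.
ALLOWED_TOPOLOGIES = {
    "internal": {
        Topology.SELF_HOSTED_OPEN,
        Topology.SELF_HOSTED_CLOSED,
        Topology.MANAGED_ENTERPRISE,
    },
    "regulated": {
        Topology.SELF_HOSTED_OPEN,
        Topology.SELF_HOSTED_CLOSED,
    },
}


def is_allowed(data_class: str, topology: Topology) -> bool:
    """Return True if a workload with this data class may use the topology."""
    # Unknown data classes resolve to an empty set, i.e. deny by default.
    return topology in ALLOWED_TOPOLOGIES.get(data_class, set())
```

Keeping the mapping in data rather than in branching logic means a governance review can change the policy without touching the enforcement code.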
The 2026 model landscape (governance lens)
A short tour of the models the governance review most commonly evaluates, with the property that matters for procurement.
Open-weight models.
- Meta Llama 3.x and Llama 4 family. Released under the Llama 3 Community License (and the Llama 4 evolution). Permits commercial use with a 700M monthly active users carve-out (above which a separate Meta licence is required), retains restrictions on use to improve other LLMs and on certain content uses, requires attribution. Weights downloadable from Hugging Face after acceptance.
- Mistral. Open-weight models (Mistral 7B, Mixtral, Mistral Nemo) under Apache 2.0 and a research licence, alongside hosted enterprise tiers (Mistral Large, Mistral Medium). Mistral publishes a model card and a technical report for each release. EU-headquartered and GDPR-aligned by default for the hosted service.
- DeepSeek V3 and R1. Released under the DeepSeek Model License permitting commercial use with restrictions on military, surveillance, and other listed uses. Weights publicly available. Provenance and training data documentation is less complete than Meta or Mistral; many enterprise buyers add this as a transparency gap on the risk register.
- Alibaba Qwen 2.5 and Qwen 3. Released under the Tongyi Qianwen Licence (Qwen 2.5 7B and below under Apache 2.0; larger variants under the Tongyi licence with use restrictions and a 100M MAU threshold). Strong multilingual capability. The EU sovereignty conversation is the dominant procurement question.
- Google Gemma 2 and Gemma 3. Released under the Gemma Terms of Use. Open weights for the small and medium variants. Tighter use restrictions than Llama or Mistral in some clauses; Google retains the right to update the Prohibited Use Policy.
Proprietary / closed-weight models.
- OpenAI GPT-4o, GPT-5. API-only access; weights not released. OpenAI Enterprise Privacy page and DPA are the contracting baseline; no training on customer API data by default for the Enterprise and Team tiers; data residency available in specific regions; Usage Policies constrain permissible uses. The Areebi OpenAI enterprise governance guide walks through the CISO checklist.
- Anthropic Claude 3.5 Sonnet, Claude 4 family. API-only access; weights not released. Anthropic Acceptable Use Policy defines prohibited uses; Anthropic Claude for Work commercial DPA and the API DPA define data handling; constitutional AI training discipline documented in Anthropic's published Constitutional AI paper; no training on customer data by default under Claude for Work. The Areebi Claude architecture walkthrough covers the integration pattern.
- Google Gemini 2.x, 3.x. API and Workspace access. Google Cloud Vertex AI provides region selection and customer-managed encryption keys. The Generative AI Additional Terms govern customer use; Workspace Gemini operates under Workspace contractual terms which are stricter than the consumer terms.
Licence and use-restriction comparison
The licence is not the property procurement is buying - the deployment topology is - but the licence sets the outer bound of what is legal. The clauses that most commonly trip up enterprises in 2026:
| Model | Licence | Commercial use | Use-to-train-other-LLMs | Audit / attribution |
|---|---|---|---|---|
| Llama 3.x / 4 | Llama Community License | Permitted with 700M MAU carve-out | Restricted | Attribution required ("Built with Llama") |
| Mistral 7B / Mixtral | Apache 2.0 | Permitted | Permitted | Permissive |
| Mistral Large (hosted) | Enterprise contract | Per contract | Per contract | Per contract |
| DeepSeek V3 / R1 | DeepSeek Model License | Permitted with restrictions on listed uses | Permitted within licence terms | Use-case restrictions on military, surveillance, etc. |
| Qwen 2.5 / 3 (large) | Tongyi Qianwen Licence | Permitted with 100M MAU threshold | Restricted | Use restrictions enumerated |
| Gemma 2 / 3 | Gemma Terms of Use | Permitted subject to Prohibited Use Policy | Restricted | Google may update Prohibited Use Policy |
| OpenAI GPT family | OpenAI Service Terms + Enterprise DPA | Per service tier | Prohibited under Usage Policies | No weight access |
| Anthropic Claude family | Anthropic Commercial / Consumer terms + AUP | Per service tier | Prohibited under AUP | No weight access |
| Google Gemini | Generative AI Additional Terms / Workspace | Per service tier | Prohibited | No weight access (Gemini Pro+); Gemma is the open alternative |
The clauses most enterprises miss on first read: the MAU thresholds (Llama 700M, Qwen 100M; very few enterprises hit them, but compliance teams should still document the threshold), the use-to-train-other-models prohibitions (which interact with synthetic data pipelines and distillation programmes), and the prohibited use lists (which change unilaterally for the closed-weight providers and are a moving target).
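One way to document the thresholds and the use-to-train clauses is a small licence register that a compliance check can query. This is a minimal sketch: the three entries copy values from the table above, while the `LicenceEntry` type, the `review_flags` function, and the flag wording are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LicenceEntry:
    model: str
    licence: str
    mau_threshold: Optional[int]  # None when the table lists no MAU threshold
    train_other_llms: str         # wording taken from the comparison table


# A subset of the comparison table, transcribed as data.
REGISTER = {
    "llama": LicenceEntry("Llama 3.x / 4", "Llama Community License", 700_000_000, "Restricted"),
    "mistral-7b": LicenceEntry("Mistral 7B / Mixtral", "Apache 2.0", None, "Permitted"),
    "qwen-large": LicenceEntry("Qwen 2.5 / 3 (large)", "Tongyi Qianwen Licence", 100_000_000, "Restricted"),
}


def review_flags(entry: LicenceEntry, monthly_active_users: int,
                 distillation_planned: bool) -> List[str]:
    """Return the licence clauses that need legal review for this deployment."""
    flags = []
    if entry.mau_threshold is not None and monthly_active_users > entry.mau_threshold:
        flags.append("MAU threshold exceeded")
    # Synthetic-data and distillation programmes touch the use-to-train clause.
    if distillation_planned and entry.train_other_llms != "Permitted":
        flags.append("use-to-train-other-LLMs clause applies")
    return flags
```

A register like this does not replace reading the licence text; it only makes sure the threshold and the distillation question are asked for every model.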
Transparency, audit access, and ISO 42001 / SOC 2
Open weights do not automatically mean transparent. The Stanford CRFM Foundation Model Transparency Index (2023 and 2024 editions) scored major model developers (ten in the 2023 edition, fourteen in 2024) across 100 indicators covering upstream resources, model attributes, and downstream use. The 2024 results showed median scores in the 50-60% range, with significant gaps in training-data disclosure, labour practices, and downstream impact - and open-weight models did not always outscore closed-weight ones.
For ISO/IEC 42001:2023 and SOC 2, the property that matters is the evidence the auditor can ask for:
- Model card. A documented model card (intended use, evaluation results, limitations, ethical considerations) per Mitchell et al's Model Cards for Model Reporting framework. Llama, Mistral, Gemma, DeepSeek, Qwen all publish model cards. OpenAI publishes system cards for major releases (GPT-4 system card, GPT-4o system card). Anthropic publishes detailed responsible-scaling and model card documents.
- Training data summary. ISO 42001 control 9.4 and the EU AI Act Article 53 documentation expectations both push toward training-data summaries. Mistral, Llama and Gemma publish data composition information; DeepSeek and Qwen disclose less. OpenAI, Anthropic, and Google publish high-level descriptions but not enumerated lists.
- Evaluation results. Public benchmark performance plus internal red team results. Stanford HELM, the Hugging Face Open LLM Leaderboard, MMLU, GPQA, SWE-bench, and similar benchmarks are the baseline references. Anthropic and OpenAI publish detailed system-card evaluation suites.
- Change control. Per ISO 42001 control 9.5 and SOC 2 CC8.1, the evidence pack must include version history and change-management of the model. Open weights you self-host give you total control here. Managed APIs vary; OpenAI and Anthropic versioning policies are reasonable but model deprecations and updates still happen on the provider's cadence.
- Audit log of inference. The vendor cannot give you a unified runtime log - you build that yourself with a control plane. The Areebi AI audit primer and the AI observability primer explain the architecture.
The Areebi position: for ISO 42001 and SOC 2 evidence, the control plane is the source of audit truth regardless of which model you choose. Open weights help with the change-control and data-flow evidence; the model-card and training-data evidence is a function of the developer's disclosure discipline, not of openness.
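The five evidence types listed above lend themselves to a simple completeness check per model. The sketch below assumes an evidence pack is a dictionary mapping each item to a document reference; the key names and the `missing_evidence` helper are illustrative, not terms from ISO 42001 or SOC 2.

```python
from typing import Dict, List

# The five evidence types an auditor can ask for, as listed above.
EVIDENCE_ITEMS = (
    "model_card",
    "training_data_summary",
    "evaluation_results",
    "change_control",
    "inference_audit_log",
)


def missing_evidence(pack: Dict[str, str]) -> List[str]:
    """Return evidence items that are absent or have an empty reference."""
    return [item for item in EVIDENCE_ITEMS if not pack.get(item)]
```

For example, a pack holding only a model card and a change-control record would report the other three items as gaps, regardless of whether the model is open-weight or proprietary.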
EU AI Act Articles 50 and 53: licence-level implications
The EU AI Act entered into force in August 2024, with the General Purpose AI (GPAI) provisions phasing in from August 2025. Two articles drive the open-vs-proprietary procurement decision:
- Article 50 (transparency obligations for providers and deployers of certain AI systems). Deployers of generative AI systems must disclose AI-generated content where appropriate, label deepfakes, and inform users they are interacting with an AI system. The deployer is on the hook regardless of whether the underlying model is open or closed.
- Article 53 (obligations for providers of general-purpose AI models). GPAI providers must produce technical documentation, training-data summaries, copyright policy, and (for systemic-risk GPAI) additional risk-management and incident-reporting obligations. The European Commission AI Act page tracks the implementing acts and the GPAI Code of Practice. The Code of Practice (published in mid-2025) operationalises Article 53 for signatories.
The open-source carve-out: Article 53 has a partial exemption for GPAI models released under a free and open-source licence (with publicly available weights, architecture information, and usage information), except where the model is a systemic-risk GPAI (presumed under Article 51 when cumulative training compute exceeds 10^25 FLOPs, with Annex XIII setting the designation criteria). For most enterprise scenarios this means:
- Self-hosting an open-weight Llama or Mistral or Gemma model puts the obligation primarily on you as the deployer under Article 50 and the relevant high-risk system articles, with reduced provider-side obligations from the model developer.
- Self-hosting an open-weight DeepSeek model puts the obligation on you as the deployer, with attention to the GPAI Code of Practice for systemic-risk variants if applicable.
- Calling a proprietary API puts you as the deployer with the model provider also subject to Article 53 in full.
- Either way, fine-tuning or substantially modifying a model can make you a GPAI provider under Article 25 / 28 reattribution analysis - the European Commission GPAI guidance addresses this.
This is the clause most enterprise procurement teams underestimate: fine-tune an open-weight model substantially and the regulatory obligations shift to you. The Areebi EU AI Act compliance for mid-market guide and EU AI Act compliance hub document the deployer-provider boundary.
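Whether a model falls in the systemic-risk tier turns on training compute: Article 51 presumes systemic risk above 10^25 FLOPs. When a developer does not publish a compute figure, a rough first-pass estimate uses the common rule of thumb of about 6 FLOPs per parameter per training token. That rule is an approximation, not part of the Act, and the parameter and token counts in the usage note are hypothetical.

```python
SYSTEMIC_RISK_FLOPS = 10**25  # presumption threshold in EU AI Act Article 51


def estimate_training_flops(parameters: int, training_tokens: int) -> int:
    """Approximate training compute as 6 * parameters * tokens (rule of thumb)."""
    return 6 * parameters * training_tokens


def presumed_systemic_risk(parameters: int, training_tokens: int) -> bool:
    """True when the estimate exceeds the Article 51 presumption threshold."""
    return estimate_training_flops(parameters, training_tokens) > SYSTEMIC_RISK_FLOPS
```

A hypothetical 70-billion-parameter model trained on 15 trillion tokens comes out at 6 × 70×10^9 × 15×10^12 = 6.3×10^24 FLOPs, below the threshold; a 400-billion-parameter model on the same data comes out at 3.6×10^25, above it. Treat the result as a prompt to request the developer's documented figure, not as a legal determination.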
Data residency and sovereignty
The data-residency advantage of open weights is real but not absolute. Self-hosting in your own region puts the inference traffic in your control. Managed APIs vary by provider and region:
- OpenAI. Data residency for stored data is available in the EU, Japan, and select other regions for Enterprise customers; inference happens in the provider's infrastructure with regional pinning. OpenAI Enterprise Privacy documents the current set.
- Anthropic. AWS Bedrock provides Claude in specific AWS regions including EU regions; Google Cloud and Anthropic's own infrastructure also offer regional options. The Anthropic Trust Center documents current regions.
- Google Gemini. Vertex AI supports regional pinning across many regions; Workspace data follows Workspace data residency where the customer has opted in.
- EU sovereignty. The EuroStack and EU sovereign cloud initiatives raise the bar above region selection - some sectors (especially European public sector and finance under DORA) prefer EU-headquartered providers. Mistral and Aleph Alpha have a structural advantage here. The Areebi DORA + AI guide explains the residency questions for financial services.
- China-headquartered open weights. DeepSeek and Qwen are open-weight and downloadable, but enterprise procurement frequently flags supply-chain provenance and adversarial-data risks separately. The weights are open; the build chain and training data are not auditable to the same degree as Mistral or Llama.
The pattern Areebi sees: enterprises increasingly run a portfolio - a proprietary frontier model for the highest-capability workloads where capability outweighs residency cost, and an open-weight model for the data-residency-sensitive workloads. The control plane is the same in both cases.
Fine-tuning rights and retraining liability
The right to fine-tune is the single sharpest difference between open and proprietary models in practice.
- Open-weight models. You fine-tune freely on your own infrastructure, with full data control and audit access. The trade-off is that your fine-tuned model is now a derived work under the base licence (Llama community licence terms flow through; Mistral Apache 2.0 is fully permissive; DeepSeek and Qwen flow through their respective licences). Under EU AI Act Article 25 / 28, if your fine-tune is substantial you may become a GPAI provider with the corresponding obligations.
- Proprietary models. OpenAI and Anthropic both offer fine-tuning programmes with strict data-use carve-outs. The fine-tuned model lives in the provider's infrastructure; you do not get the weights. Anthropic has historically been the more conservative on fine-tuning availability; OpenAI's fine-tuning is more mature and broadly available for GPT-4o-class models. Google Vertex AI supports tuning of Gemini and PaLM-family models with similar managed-service patterns.
The compliance impacts to wire through:
- GDPR right to erasure (Article 17). If a data subject's personal data was in your fine-tuning corpus and they request erasure, removing it from the corpus is necessary but not sufficient - you have to consider whether the model has memorised the data. The EDPB Opinion 28/2024 on AI training data is the current European reference. Open-weight self-hosted fine-tunes give you the ability to retrain or unlearn; proprietary fine-tunes depend on the provider's process.
- EU AI Act provider obligations. Substantial fine-tuning of an open-weight model can re-attribute provider obligations to you. Light fine-tuning typically does not, but the line is unclear.
- SOC 2 change management. Every fine-tune is a change. The audit log must show the data lineage, the change approval, and the post-deployment validation.
The Areebi fine-tuning vs RAG compliance trade-offs guide goes deeper on this dimension.
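The SOC 2 point above (every fine-tune is a change with data lineage, approval, and post-deployment validation) can be captured in a change record. This is a minimal sketch: the `FineTuneChange` fields and the idea of fingerprinting the corpus with SHA-256 are illustrative assumptions, not a prescribed SOC 2 schema.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def corpus_fingerprint(corpus: bytes) -> str:
    """SHA-256 of the fine-tuning corpus, used as a data-lineage reference."""
    return hashlib.sha256(corpus).hexdigest()


@dataclass
class FineTuneChange:
    base_model: str
    corpus_sha256: str                 # data lineage
    approved_by: Optional[str] = None  # change approval
    validation_passed: bool = False    # post-deployment validation

    def audit_gaps(self) -> list:
        """Return which of the three evidence elements are still missing."""
        gaps = []
        if not self.corpus_sha256:
            gaps.append("data lineage")
        if not self.approved_by:
            gaps.append("change approval")
        if not self.validation_passed:
            gaps.append("post-deployment validation")
        return gaps
```

Recording the corpus hash also helps with erasure requests: if a record is removed from the corpus, the fingerprint changes, which shows whether a given fine-tune was trained before or after the removal.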
The decision matrix
The framework Areebi's research team uses with enterprise customers. The right answer is almost never "all open" or "all proprietary" - it is a portfolio mapped to data classes, latency requirements, and capability ceilings.
| Workload | Typical winner | Why |
|---|---|---|
| High-capability reasoning, low-residency-sensitivity (sales, marketing, research) | Proprietary frontier (GPT-5, Claude 4 Opus, Gemini 3 Ultra) | Capability and managed-service economics |
| Regulated PHI / PII, high-residency-sensitivity (clinical, financial customer data, EU public sector) | Self-hosted open-weight (Llama 4, Mistral Large self-hosted) | Data residency and audit access |
| Latency-critical inference, high volume (in-product completions, search ranking, summarisation) | Self-hosted open-weight or regional proprietary (Llama, Mistral small, GPT-4o-mini, Claude 3.5 Haiku) | Cost-per-inference and latency control |
| Sovereign / national security / defence-adjacent | Self-hosted open-weight from trusted supply chain (Mistral, Llama) | Provenance and supply-chain audit |
| Multilingual or non-English-dominant locales (EU minor languages, MENA, APAC) | Often a portfolio (Mistral or Qwen for non-English coverage, paired with a frontier for English) | Capability per language |
| Agent / tool-use workloads requiring function calling, vision, browsing | Proprietary frontier today; open-weight catching up via Llama 4 and Qwen 3 | Multimodal and tool-use capability gap is narrowing but not closed |
The Areebi AI vendor risk score and AI framework comparison tools let teams score and document this trade-off.
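The matrix can be read as an ordered set of rules. The sketch below encodes four of its rows; the boolean inputs and, in particular, the precedence order (sovereignty first, then regulated data, then latency) are assumptions made for illustration, since the table itself does not rank the rows.

```python
def typical_winner(*, sovereign: bool = False, regulated_data: bool = False,
                   latency_critical: bool = False) -> str:
    """Map workload attributes to the 'typical winner' column of the matrix."""
    # Assumed precedence: the most restrictive requirement wins.
    if sovereign:
        return "self-hosted open-weight from trusted supply chain"
    if regulated_data:
        return "self-hosted open-weight"
    if latency_critical:
        return "self-hosted open-weight or regional proprietary"
    # High-capability, low-residency-sensitivity workloads.
    return "proprietary frontier"
```

Called per workload, a function like this produces the portfolio the section describes: different answers for different data classes rather than one company-wide choice.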
Why the control plane matters more than the model
The lesson Areebi customers consistently report is that the model is the wrong unit of decision. Whichever model you pick, the operational reality is:
- The audit log of prompts, completions, and policy decisions has to live in your infrastructure regardless.
- The vendor registry, BAA / DPA tracking, and Article 53 evidence pack have to live in your infrastructure regardless.
- The shadow AI discovery surface (see the 90-minute shadow AI hunt) covers every model.
- The policy engine enforcing data classification, retention, output filtering, and access control sits in front of every model regardless of openness.
- The incident response runbook (the AI incident response runbook goes deep) covers every model.
That is the AI control plane argument. The right governance investment is in the control plane that makes the model choice swappable, because in 2026 it will be swappable.
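The swappability argument can be sketched in a few lines: if every model sits behind the same policy check and the same audit log, changing the model is a configuration change. Everything here is hypothetical, including the `ControlPlane` class, the substring-based policy check, and the stand-in backends; a real control plane would call actual inference endpoints and a real classifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Backend = Callable[[str], str]  # any model: prompt in, completion out


@dataclass
class ControlPlane:
    backends: Dict[str, Backend]
    blocked_terms: tuple = ("ssn",)  # placeholder for a real data classifier
    audit_log: List[dict] = field(default_factory=list)

    def complete(self, backend_name: str, prompt: str) -> str:
        """Apply policy, call the chosen backend, and log the decision."""
        allowed = not any(term in prompt.lower() for term in self.blocked_terms)
        completion = self.backends[backend_name](prompt) if allowed else ""
        # The log entry has the same shape whichever model served the request.
        self.audit_log.append({
            "backend": backend_name,
            "prompt": prompt,
            "completion": completion,
            "decision": "allowed" if allowed else "blocked",
        })
        return completion
```

With two stand-in backends registered (say, one self-hosted and one API-based), the same `complete` call serves either, and both produce identical audit records. That uniformity is what lets the evidence pack stay stable when the model changes.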
Areebi's point of view
Open-weight vs proprietary is a topology decision, not a values decision. Areebi's research team's view: the enterprises that win in 2026 are the ones who treat models as commoditised inputs, build the control plane that makes them swappable, and stop framing the choice as ideological. The hard work is in the policy engine, the audit log, the vendor registry, and the incident response runbook - not in the Hugging Face download. Both Llama and Claude are good answers; the question is whether your control plane lets you use either.
Frequently Asked Questions
Is an open-weight LLM automatically more compliant than a proprietary one?
No. Compliance is a property of the deployment topology, the contracts, and the control plane, not of the weight access. Open weights give you better data-residency and change-control evidence; proprietary models often have more polished system cards and incident-disclosure programmes. The right answer is a portfolio with a unified control plane.
Does fine-tuning an open-weight model make me a GPAI provider under the EU AI Act?
It can. Article 25 / 28 of the EU AI Act addresses substantial modification, and a substantial fine-tune of an open-weight model can re-attribute provider obligations to you. The European Commission GPAI Code of Practice and implementing guidance set out the test. Light task-specific fine-tuning typically does not trigger this; heavy retraining or capability expansion can.
Can I use DeepSeek or Qwen in regulated workloads?
The licences permit commercial use within the listed restrictions. The harder questions are supply-chain provenance, training-data transparency, and sectoral regulator stance (especially in EU public sector, US federal, and some financial services). Most Areebi customers in highly regulated sectors deploy DeepSeek and Qwen only in self-hosted topologies, with the same control-plane evidence as any other open-weight model.
Does OpenAI or Anthropic train on my API data?
Under the Enterprise and Team tiers (OpenAI) and Claude for Work and the API DPA (Anthropic), no - training on customer API content is not the default. Read the current DPA carefully for opt-in flags, retention windows, and abuse-monitoring carve-outs. Consumer-tier products are different. The Areebi vendor registry tracks the current contract status per vendor and per SKU.
What is the Llama 700M MAU threshold?
The Llama 3 Community License permits commercial use except by entities whose products or services had more than 700 million monthly active users in the calendar month preceding the model version's release date, in which case a separate licence must be requested from Meta. Very few enterprises hit this, but the compliance team should still record the threshold on the vendor registry.
Is the model-transparency gap closing?
Slowly. The 2024 Stanford Foundation Model Transparency Index showed median scores rising from 2023 but still well below where ISO 42001 auditors would prefer. Open-weight providers (Mistral, Meta, Google for Gemma) generally publish more on data composition than closed-weight providers, but the gap on labour, downstream impact, and incident disclosure remains significant.
Related Resources
- AI Control Plane (definition)
- AI Audit (definition)
- AI Observability (definition)
- EU AI Act Compliance Hub
- GDPR Compliance Hub
- Areebi Platform
- Trust Center
- AI Vendor Risk Score (tool)
- AI Framework Comparison (tool)
- OpenAI Enterprise Governance Guide
- Anthropic Claude Architecture Walkthrough
- EU AI Act for Mid-Market
- DORA + AI for Financial Institutions
- Fine-tuning vs RAG compliance trade-offs
- AI incident response runbook
- 90-minute Shadow AI Hunt
Stay ahead of AI governance
Weekly insights on enterprise AI security, compliance updates, and governance best practices.
About the Author
Areebi Research
The Areebi research team combines hands-on enterprise security work with deep AI governance research. Our analysis is informed by primary sources (NIST, ISO, OECD, federal registers, IAPP) and the operational realities of CISOs running AI programs in regulated industries today.