TL;DR
Choosing an on-premise or private AI chatbot is a governance procurement, not a chatbot procurement. The chat interface is commoditised; the difference between vendors - and the entire liability - lives in security, deployment flexibility, model support, and governance. Score every option against four criteria: can it run inside your boundary (on-premise, VPC, or air-gapped), does it inspect data in real time (DLP with PII and PHI redaction), is it governed (SSO, RBAC, immutable audit, policy engine), and is it model-agnostic (many providers, not one). Ask vendors the hard questions about data path, audit, and incident history, and walk away on the red flags - vague residency answers, "logging" that is not tamper-evident, no inline DLP, and a single-model lock-in. Build a TCO model that captures the operational and governance costs vendors omit, not just the licence. Areebi is built to satisfy this checklist on your own infrastructure; use the framework below to evaluate it and everything else on the same terms.
Updated 2026-06-10.
What 'on-premise AI chatbot' actually means - and what it does not
Before evaluating vendors, pin down the term, because "on-premise," "private," and "secure" are used loosely and the gaps between them are where bad procurements happen. An on-premise AI chatbot, strictly, runs inside infrastructure your organisation controls - your data centre, your private cloud tenancy, or an air-gapped environment - so that prompts, responses, uploaded documents, and logs never transit a third-party provider.
Three distinctions matter for a buyer:
- On-premise is about the data path, not the user experience. The chatbot can look identical to a public tool; what makes it on-premise is where inference happens and where data rests. See what is a private LLM for the four deployment models (on-premise, VPC, air-gapped, local-only).
- On-premise is necessary but not sufficient for "secure." Running inside your boundary closes the external data path, but it does not deliver access control, data loss prevention, or audit. A poorly governed on-premise chatbot can leak data internally to the wrong user just as easily as a public tool can leak it externally. Privacy of hosting and governance of usage are different problems - see what is LLM security.
- "Private" is sometimes marketing for a hosted SaaS with a no-training clause. Some vendors describe a multi-tenant cloud product as "private AI" because they contractually promise not to train on your data. That is not the same as on-premise - your data still leaves your boundary and rests on the vendor's infrastructure. For organisations with residency obligations, the distinction is decisive.
The buyer's job is to translate the requirement into capability terms and hold every vendor to the same definition. The rest of this guide gives you the framework to do exactly that. For the foundational definitions, read what is an enterprise LLM.
The requirements checklist
Start by deciding which requirements are mandatory for your organisation and which are optional, because the same capability is non-negotiable for one buyer and irrelevant for another. Work through this checklist with your security, compliance, and IT stakeholders before you talk to a single vendor - it prevents a slick demo from setting your criteria for you.
Data and deployment.
- Can it deploy on-premise, in our VPC, and air-gapped if required?
- Can we choose the country and data centre where data is processed and stored - data residency?
- Does any data - prompts, documents, telemetry, licensing checks - leave our boundary at any point?
Security and data protection.
- Real-time DLP inspecting prompts and responses for PII, PHI, secrets, and source code, with block, redact, and warn actions?
- Defences against prompt injection, including inspection of retrieved RAG content?
- Can we block employees from reaching ungoverned external AI tools (browser-extension control)?
Governance and identity.
- SSO, SAML, and MFA integration with our identity provider; no shared accounts?
- Role-based access control mapped to our org structure?
- Immutable, tamper-evident, per-user audit logs that we can export for an auditor?
- A policy engine - ideally no-code - to define which models, data classes, and teams are permitted?
Model and knowledge.
- Is it model-agnostic across many providers (open-weight and commercial), or locked to one?
- Does it support RAG over our documents with retrieval-time access control and workspace isolation?
Compliance.
Mark each item mandatory or optional, and require every vendor to respond to the same list. A capability missing from a demo is not the same as a capability that does not exist - and a capability a salesperson asserts is not the same as one you have verified. The next sections turn this list into scored criteria and verification questions.
Evaluation criteria table
Score each shortlisted vendor on the four criteria below, weighting them by your own mandatory requirements. The "what good looks like" column is the bar a serious enterprise option clears; the "warning sign" column is where an evaluation should slow down and dig.
| Criterion | What to assess | What good looks like | Warning sign |
|---|---|---|---|
| Security | DLP, injection defence, encryption, hardening | Inline DLP on prompts and responses; retrieved-content inspection; encryption at rest and in transit | No inline DLP; "the model is safe" hand-waving |
| Deployment | Topologies supported; data path; residency | On-premise, VPC, and air-gapped; data never leaves the boundary; you choose the jurisdiction | Hosted SaaS only, marketed as "private"; vague on data path |
| Model support | Provider breadth; open-weight and commercial; routing | Many providers; route by task and sensitivity; no re-platforming to switch | Single model or single provider lock-in |
| Governance | SSO, RBAC, audit, policy engine | SSO, SAML, MFA, RBAC; immutable per-user audit; no-code policy engine | Shared logins; editable "logs"; policy only via code or support tickets |
A practical scoring method: rate each criterion from 0 to 3 against the "what good looks like" bar, multiply each rating by a weight reflecting how mandatory that criterion is for you, and sum the results. The exercise forces a like-for-like comparison and surfaces the vendor that demos beautifully but scores poorly on the criteria that actually carry your risk. The four criteria correspond to the five enterprise controls in what is an enterprise LLM, so a vendor that scores well here should satisfy that checklist too.
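As a minimal sketch of that arithmetic, the snippet below rates two made-up vendors from 0 to 3 on each of the four criteria and applies a weight per criterion. The weights, vendor names, and ratings are illustrative assumptions, not recommendations; substitute the weights your own mandatory requirements imply.

```python
CRITERIA = ("security", "deployment", "model_support", "governance")

# Hypothetical weights: a higher weight means the criterion is more mandatory for this buyer.
WEIGHTS = {"security": 3, "deployment": 3, "model_support": 1, "governance": 2}


def weighted_score(ratings: dict[str, int], weights: dict[str, int] = WEIGHTS) -> int:
    """Sum of rating (0-3) multiplied by weight across the four criteria."""
    total = 0
    for criterion in CRITERIA:
        rating = ratings[criterion]
        if not 0 <= rating <= 3:
            raise ValueError(f"{criterion}: rating must be 0-3, got {rating}")
        total += rating * weights[criterion]
    return total


# Illustrative ratings for two made-up vendors.
vendor_a = {"security": 3, "deployment": 3, "model_support": 2, "governance": 2}
vendor_b = {"security": 1, "deployment": 1, "model_support": 3, "governance": 3}

score_a = weighted_score(vendor_a)      # 9 + 9 + 2 + 4 = 24
score_b = weighted_score(vendor_b)      # 3 + 3 + 3 + 6 = 15
max_score = 3 * sum(WEIGHTS.values())   # 27
print(score_a, score_b, max_score)
```

With these assumed weights, the vendor that is stronger on model breadth and governance tooling still scores lower overall, because the buyer weighted security and deployment more heavily.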
The questions to ask every vendor
The right questions are the ones that are uncomfortable to answer evasively. Each of these forces a specific, verifiable answer rather than a marketing claim, and the quality of the answer tells you as much as its content.
On data path and residency:
- "Walk me through exactly where a prompt goes from the moment a user submits it. Does it ever leave our boundary - for inference, telemetry, licensing, or model updates?"
- "In an air-gapped deployment, what breaks? What features depend on outbound connectivity?"
- "Can we choose the specific country and data centre, and can you put that in the contract?"
On data protection:
- "Show me DLP redacting a credit card number and a medical record number in a live prompt, in both the prompt and the model's response."
- "How do you defend against indirect prompt injection through a poisoned document in a RAG knowledge base?"
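To make the DLP demonstration concrete, here is a minimal sketch of what inline redaction does, under stated assumptions. The card pattern is confirmed with a Luhn checksum, and the medical record number format ("MRN" followed by 6 to 10 digits) is hypothetical. Production DLP uses far richer detectors; this does not describe any vendor's implementation.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Hypothetical medical record number format: "MRN" followed by 6-10 digits.
MRN_RE = re.compile(r"\bMRN[ :#-]*\d{6,10}\b", re.IGNORECASE)


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    """Redact Luhn-valid card numbers and MRN-like identifiers."""
    def card_sub(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "[REDACTED:CARD]" if luhn_valid(digits) else match.group()

    text = CARD_RE.sub(card_sub, text)
    return MRN_RE.sub("[REDACTED:MRN]", text)


# The same function is applied to the prompt before inference and to the response after it.
prompt = "Charge card 4111 1111 1111 1111 for patient MRN 00123456."
print(redact(prompt))
```

The Luhn check is there to reduce false positives: a 16-digit order number that fails the checksum is left untouched, which matters when the action is block rather than warn.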
On governance and audit:
- "Generate an audit report showing every prompt a specific named user sent last month, and prove the log cannot be edited after the fact."
- "Can a non-engineer in our compliance team author and change a policy, or does every rule change require your support team or our developers?"
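One common way to make a log tamper-evident is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below illustrates that general technique only; the record fields and function names are assumptions, and it is not a description of how any particular vendor stores audit data.

```python
import hashlib
import json

GENESIS = "0" * 64  # Placeholder "previous hash" for the first entry.


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[dict], record: dict) -> None:
    """Append a record, chaining it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append(log, {"user": "alice@example.com", "action": "prompt", "ts": "2026-05-01T09:00:00Z"})
append(log, {"user": "bob@example.com", "action": "prompt", "ts": "2026-05-01T09:05:00Z"})

intact = verify(log)                              # chain verifies before any edit
log[0]["record"]["user"] = "mallory@example.com"  # simulate an after-the-fact edit
tamper_detected = not verify(log)                 # the edit is detected
print(intact, tamper_detected)
```

A chain like this only proves integrity if the latest hash is anchored somewhere the log's administrator cannot rewrite, so ask the vendor where that anchor lives.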
On the model layer:
- "If a better open-weight model ships next quarter, what does switching to it cost us - configuration, or re-platforming?"
- "Can we route sensitive workloads to a local model and lower-sensitivity tasks to a commercial API, enforced by policy?"
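Policy-enforced routing of this kind can be sketched as a lookup from data sensitivity to permitted targets. The sensitivity levels, target names, and preference order below are hypothetical, chosen only to show the shape of the rule a vendor should be able to demonstrate.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


# Hypothetical policy: the highest sensitivity each target is allowed to receive.
POLICY = {
    "local-open-weight": Sensitivity.RESTRICTED,
    "commercial-api": Sensitivity.INTERNAL,
}

# Hypothetical preference order, tried first to last.
PREFERENCE = ["commercial-api", "local-open-weight"]


def route(sensitivity: Sensitivity) -> str:
    """Return the first preferred target that policy permits for this sensitivity."""
    for target in PREFERENCE:
        if sensitivity <= POLICY[target]:
            return target
    raise PermissionError(f"no target permitted for {sensitivity.name}")


print(route(Sensitivity.PUBLIC))      # commercial-api
print(route(Sensitivity.RESTRICTED))  # local-open-weight
```

The point to verify in a demo is that the rule is enforced by the platform rather than left to user choice: a restricted prompt should be unable to reach the external API even if the user selects it.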
On track record:
- "Have you had a security incident? Walk me through one and how you handled it." (A vendor who claims none, ever, is either very new or not candid - see AI incident response.)
- "What compliance evidence do you provide out of the box for SOC 2 or HIPAA, and can I see a sample?"
Insist on seeing capabilities demonstrated against realistic data, not described on a slide. "Show me" beats "we support" every time, and the request itself filters out vendors whose capabilities exist only on the roadmap.
Red flags that should end an evaluation
Some answers are not weaknesses to weigh - they are disqualifiers. If you hear these, the evaluation is effectively over for any organisation with regulated data or audit obligations.
- Vague or shifting answers on the data path. If a vendor cannot state plainly whether your data leaves your boundary, assume it does. Residency is a yes-or-no question with a contractual answer; equivocation is the answer.
- "Private" that turns out to be multi-tenant SaaS. A no-training clause on a shared cloud product is not on-premise and does not satisfy residency. If "private" means "we promise not to look," it is not private in the sense you need.
- No inline DLP. If sensitive data is not inspected before it reaches the model, the chatbot is a faster way to mishandle data, not a safer one. "Users should not paste sensitive data" is a policy, not a control - see AI DLP.
- "Logs" that are not tamper-evident or not attributable to a named user. Application logs are not an audit trail. If logs can be edited, or only show an API key rather than a person, they will not satisfy an auditor.
- Single-model or single-provider lock-in. Tying your deployment to one model ties you to that vendor's roadmap, pricing, and outages, in a market where the frontier moves monthly. Model freedom is a structural requirement, not a luxury - see what is an LLM gateway.
- Policy changes that require engineering or support tickets. If your compliance team cannot author a rule without filing a ticket, governance will lag reality permanently.
- Hand-waving on prompt injection and RAG access control. A vendor who treats RAG security as solved by "we use a good model" has not thought about retrieval-time access control or indirect injection.
- No references in your regulatory context. A vendor with no customers facing your specific obligations is asking you to be their pilot for that use case.
Any one of the first five is, on its own, sufficient reason to drop a vendor for a regulated deployment. The point of naming them is to give your evaluation team permission to end a process early rather than sinking months into a vendor that was never going to clear the bar.
The TCO worksheet: capturing the costs vendors omit
The licence or subscription fee is the visible cost. Total cost of ownership for an on-premise AI chatbot is dominated by the costs around it - infrastructure, integration, operations, and governance - and a buyer who compares only headline prices will choose wrong. Build a three-year TCO model with the following line items, and require each vendor's quote to map onto it.
Infrastructure.
- GPU and server capex, or VPC GPU instance hours over three years.
- Power, cooling, and data-centre space for on-premise.
- Storage for the vector index and audit logs, including retention requirements.
Software and licensing.
- Platform licence or subscription, per the vendor's model (per-seat, per-node, flat).
- Commercial model API costs for any workloads routed externally, metered per token.
- Any add-on modules - DLP, SSO, audit - that are priced separately (a red flag in itself; these should be core).
Implementation and integration.
- Initial deployment and configuration effort, in internal time or professional services.
- Identity integration (SSO, SAML, RBAC mapping) and any custom connectors.
- Knowledge-base ingestion, document classification, and RAG setup.
Ongoing operations (the line DIY estimates omit).
- Operations and on-call: patching, model upgrades, monitoring, backups, capacity planning - a permanent fractional-to-full headcount for a self-managed stack.
- Security hardening and incident response over time.
- Knowledge-base maintenance and re-embedding as content changes.
The decisive comparison is between a managed platform that includes governance and operations, and a do-it-yourself stack where those line items become your team's permanent burden. A DIY open-source stack often wins on the software line and loses badly on the operations and integration lines - which is the central finding of our Areebi versus DIY open source comparison and the self-hosted LLM guide. Model the full three years, not the first invoice. The crossover where the cheapest-looking option becomes the most expensive usually arrives in year one once operations are counted, as we quantify in the ChatGPT Enterprise pricing breakdown and the broader build-versus-buy analysis.
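The worksheet reduces to simple arithmetic: one-off costs plus recurring costs multiplied by the number of years. The sketch below shows that calculation for two options. Every figure is a placeholder invented for illustration and carries no empirical weight; replace each with your own quotes and internal estimates.

```python
def three_year_tco(one_off: dict[str, float], annual: dict[str, float], years: int = 3) -> float:
    """One-off costs plus recurring costs over the modelling horizon."""
    return sum(one_off.values()) + years * sum(annual.values())


# Placeholder figures only: replace every number with your own quotes and estimates.
managed = {
    "one_off": {"infrastructure": 100_000, "implementation": 20_000},
    "annual": {"licence": 60_000, "model_api": 10_000, "operations": 15_000},
}
diy = {
    "one_off": {"infrastructure": 100_000, "implementation": 60_000},
    "annual": {"licence": 0, "model_api": 10_000, "operations": 120_000},
}

managed_total = three_year_tco(managed["one_off"], managed["annual"])
diy_total = three_year_tco(diy["one_off"], diy["annual"])
print(f"managed: {managed_total:,}  diy: {diy_total:,}")
```

Keeping each line item as a named entry, rather than a single total, makes it easy to see which group drives the difference and to require each vendor's quote to map onto the same keys.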
How Areebi maps to this checklist
This guide is deliberately vendor-neutral - run it against every option. For completeness, here is how Areebi maps to the four evaluation criteria, so you can score it on the same terms as everything else.
- Security: real-time DLP with PII and PHI detection and redaction on every prompt and response, inspection of retrieved RAG content for prompt injection, and a browser extension that blocks ungoverned external AI tools.
- Deployment: on-premise via Docker or Kubernetes, VM images, fully air-gapped, or local-only inference via Ollama or LM Studio - data never leaves your boundary, and you choose the jurisdiction. See data residency for AI.
- Model support: model-agnostic across 30+ LLM providers, open-weight and commercial, with routing by task and sensitivity, so the model is a swappable component rather than a lock-in.
- Governance: SSO, SAML, MFA, and RBAC; immutable, per-user audit logs; a no-code policy engine your compliance team operates directly; and workspace-isolated RAG with retrieval-time access control.
On the TCO question specifically, Areebi ships the governance layer - DLP, identity, audit, policy - integrated rather than as separate modules or as engineering you must build, which is where the operational line items in the worksheet above either land on a vendor or land on your team. Compliance evidence is aligned to SOC 2, HIPAA, and GDPR.
Take this checklist to a demo and make us prove every row against your own data and your auditors' questions - that is exactly the "show me, do not tell me" standard the questions section recommends. Pricing is on the pricing page.
Next steps
Turn this guide into action: convene your security, compliance, and IT stakeholders, agree which checklist items are mandatory, and hold every vendor to the same scored criteria and the same "show me" standard.
- What is an enterprise LLM? - the five controls and platform checklist this guide operationalises.
- What is a private LLM? - the deployment models behind "on-premise."
- Self-hosted LLM for business - the build-it-yourself path and its hidden costs.
- Areebi vs DIY open source - the cost and capability comparison for the build-versus-buy decision.
- What is LLM security? and RAG security - the controls behind the security criterion.
The single most valuable thing this framework does is stop a polished chat demo from setting your evaluation criteria. The chatbot is the easy part; the governance is the procurement. Book a demo to evaluate Areebi against the checklist, or review pricing.
External sources
- IBM, Cost of a Data Breach Report 2025: ibm.com/reports/data-breach.
- Regulation (EU) 2016/679 (GDPR): eur-lex.europa.eu/eli/reg/2016/679/oj.
- Office of the Australian Information Commissioner, Australian Privacy Principles: oaic.gov.au/privacy/australian-privacy-principles.
- OWASP Top 10 for Large Language Model Applications: owasp.org/www-project-top-10-for-large-language-model-applications.
Frequently Asked Questions
What is an on-premise AI chatbot?
An on-premise AI chatbot is a conversational AI assistant that runs inside infrastructure your organisation controls - your data centre, private cloud tenancy, or an air-gapped environment - so that prompts, responses, uploaded documents, and logs never transit a third-party provider. It is defined by where inference happens and where data rests, not by the user experience, which can be identical to a public tool. Crucially, on-premise hosting closes the external data path but does not by itself provide access control, DLP, or audit - those are governance controls layered on top.
What is the difference between a private AI chatbot and an on-premise one?
They overlap but are not always the same. On-premise specifically means running inside your own infrastructure. 'Private' is sometimes used for a multi-tenant hosted product where the vendor contractually promises not to train on your data - in that case your data still leaves your boundary and rests on the vendor's infrastructure, which does not satisfy data residency obligations. When evaluating vendors, insist on a plain answer to whether data leaves your boundary, because 'private' is a marketing term while on-premise is an architectural fact.
What are the most important criteria for choosing a private AI chatbot?
Four criteria carry the decision: security (real-time DLP with PII and PHI redaction and prompt-injection defence), deployment (can it run on-premise, in your VPC, or air-gapped, with data never leaving your boundary), model support (is it model-agnostic across many providers or locked to one), and governance (SSO, RBAC, immutable per-user audit, and a no-code policy engine). The chat interface itself is commoditised, so the difference between vendors lives entirely in these four areas. Weight them by which requirements are mandatory for your organisation.
What are the biggest red flags when evaluating an on-premise AI chatbot vendor?
Vague or shifting answers about whether your data leaves your boundary; a product marketed as 'private' that turns out to be multi-tenant SaaS with a no-training clause; no inline DLP, so sensitive data is not inspected before reaching the model; logs that are editable or not attributable to a named user; single-model or single-provider lock-in; and policy changes that require engineering or vendor support tickets. Any one of the first five is sufficient reason to drop a vendor for a deployment involving regulated data or audit obligations.
What questions should I ask an on-premise AI chatbot vendor?
Ask them to walk through exactly where a prompt goes and whether it ever leaves your boundary; what breaks in an air-gapped deployment; to demonstrate DLP redacting a credit card and a medical record number live in both prompt and response; to generate a tamper-evident audit report for a named user; whether a non-engineer can author a policy; what switching to a new model costs; and to describe a past security incident and how they handled it. Insist on 'show me' over 'we support' - demonstrations against realistic data filter out roadmap-only capabilities.
How should I calculate the total cost of ownership of an on-premise AI chatbot?
Build a three-year model with four cost groups: infrastructure (GPU capex or VPC instance hours, power, storage for the vector index and audit logs), software and licensing (platform licence, commercial model API costs, any separately priced modules), implementation and integration (deployment, identity integration, knowledge-base ingestion), and ongoing operations (patching, model upgrades, monitoring, security hardening, knowledge-base maintenance). The operations group is the line DIY estimates omit and frequently the largest over three years. Compare a managed platform that includes governance and operations against a DIY stack where those become your team's permanent burden.
About the Author
Areebi Research
The Areebi research team combines hands-on enterprise security work with deep AI governance research. Our analysis is informed by primary sources (NIST, ISO, OECD, federal registers, IAPP) and the operational realities of CISOs running AI programs in regulated industries today.