The accountability gap in production AI

Deploying an AI model on-premises does not automatically satisfy regulatory requirements around explainability. Many organizations discover this the hard way: they have kept data inside their perimeter, satisfied data residency rules, and passed an initial compliance review—only to find that their audit team cannot answer a basic question like "why did the model recommend rejecting this loan application?" or "what evidence drove this anomaly alert?"

The EU AI Act, GDPR's automated decision-making provisions, and sector-specific frameworks such as banking model risk guidance (for example, the US Federal Reserve's SR 11-7) all push in the same direction: consequential automated decisions must be accompanied by meaningful, human-interpretable explanations. On-premises deployments add a further constraint: any explainability tooling must itself run inside your perimeter, because you cannot pipe model outputs to a cloud-based interpretation service when the inputs are sensitive.

This post examines the practical building blocks for on-premises explainability, organized by the level of intervention they require and the types of AI systems they fit.

Intrinsic vs. post-hoc explainability

The first architectural decision is whether to build explainability into the model itself or to layer it on afterward. These are not equivalent in either cost or auditability.

Intrinsically interpretable models (decision trees, linear models, rule sets, and gradient-boosted trees with shallow depth constraints) produce explanations as a byproduct of inference. For tree ensembles, for example, the TreeSHAP algorithm computes exact SHAP values from the model's own structure rather than approximating them by sampling. When regulators ask for an audit trail, these models produce it natively. The constraint is that intrinsic interpretability typically means accepting capability limits: complex language understanding, vision tasks, and long-horizon reasoning remain out of reach for such models.
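For a linear model, the explanation really does fall out of inference. The sketch below is a minimal illustration, assuming independent features and using hypothetical feature names, weights, and background means: each feature's SHAP value is its weight times its deviation from the background mean, and the contributions sum exactly to the gap between the prediction and the baseline prediction.

```python
from dataclasses import dataclass


@dataclass
class LinearScorer:
    """A linear scoring model whose explanation is a byproduct of inference."""
    weights: dict[str, float]           # hypothetical feature weights
    intercept: float
    background_means: dict[str, float]  # feature means over a reference dataset

    def predict(self, x: dict[str, float]) -> float:
        return self.intercept + sum(w * x[name] for name, w in self.weights.items())

    def base_value(self) -> float:
        # Prediction for the "average" input; every explanation starts here.
        return self.predict(self.background_means)

    def explain(self, x: dict[str, float]) -> dict[str, float]:
        # Exact SHAP values for a linear model with independent features:
        # phi_i = w_i * (x_i - E[x_i]). No sampling or surrogate model involved.
        return {
            name: w * (x[name] - self.background_means[name])
            for name, w in self.weights.items()
        }


scorer = LinearScorer(
    weights={"debt_to_income": -2.0, "years_employed": 0.5},
    intercept=1.0,
    background_means={"debt_to_income": 0.3, "years_employed": 4.0},
)
applicant = {"debt_to_income": 0.6, "years_employed": 2.0}
contributions = scorer.explain(applicant)
print(contributions)
```

Because the contributions add up to `predict(x) - base_value()`, the same numbers can be written straight into the audit record with no separate explanation service.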

Post-hoc explanation methods approximate why a black-box model produced a particular output. LIME (Local Interpretable Model-Agnostic Explanations) perturbs a specific input and fits a simple, typically linear, surrogate model to the black box's outputs in that neighborhood. SHAP in its model-agnostic form (KernelSHAP) samples feature coalitions to estimate each feature's contribution. Integrated Gradients and similar gradient-based methods work directly from a neural network's gradients, so they need white-box access to the model. These methods give you coverage for larger models but introduce approximation error and computational overhead, both of which require careful management.
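To make the mechanics of a gradient-based method concrete, here is a minimal Integrated Gradients sketch for a toy logistic model. It assumes a zero baseline and uses an analytic gradient; a real network would rely on autograd, for example through Captum's `IntegratedGradients`. The weights and input are hypothetical. The method averages gradients along the straight path from baseline to input, and the resulting attributions should approximately sum to f(x) - f(baseline), a useful sanity check to store alongside each explanation.

```python
import numpy as np


def model(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Toy differentiable model: logistic regression, p = sigmoid(w.x + b)."""
    return float(1.0 / (1.0 + np.exp(-(x @ w + b))))


def model_grad(x: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Analytic gradient of p with respect to x: p * (1 - p) * w."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=64):
    """Midpoint Riemann approximation of Integrated Gradients.

    Each step needs one forward and one backward evaluation of the model,
    which is why IG is expensive at production scale.
    """
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += model_grad(baseline + a * (x - baseline), w, b)
    return (x - baseline) * total / steps


w = np.array([1.5, -2.0, 0.5])
b = -0.2
x = np.array([1.0, 0.5, 2.0])
baseline = np.zeros(3)

attributions = integrated_gradients(x, baseline, w, b)
gap = model(x, w, b) - model(baseline, w, b)  # completeness target
print(attributions, attributions.sum(), gap)
```

A large mismatch between `attributions.sum()` and `gap` usually means too few steps; Captum can report this discrepancy as a convergence delta.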

For high-stakes decisions in regulated workflows, a hybrid architecture often works well: use a large language model or complex classifier as a first-pass screener, but pass borderline or flagged cases to an interpretable model or a human review queue. This keeps the powerful model in the loop while ensuring that borderline and flagged decisions, the ones most likely to be challenged, are finalized by a traceable model or a named reviewer.
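The hybrid pattern reduces to a routing rule. A minimal sketch, assuming the screener emits a probability and that a separate mechanism sets a `flagged` bit; the thresholds and route names are hypothetical placeholders for values your model risk policy would define.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SCREENER = "screener"                  # clear-cut: screener output stands
    INTERPRETABLE_MODEL = "interpretable"  # borderline: re-scored by a traceable model
    HUMAN_REVIEW = "human_review"          # flagged: a named reviewer decides


@dataclass(frozen=True)
class RoutingPolicy:
    # Hypothetical thresholds; calibrate them on your own validation data.
    low: float = 0.2
    high: float = 0.8


def route(screen_score: float, flagged: bool, policy: RoutingPolicy = RoutingPolicy()) -> Route:
    """Decide which component makes the final decision for one case."""
    if flagged:
        return Route.HUMAN_REVIEW
    if policy.low <= screen_score <= policy.high:
        return Route.INTERPRETABLE_MODEL
    return Route.SCREENER
```

Logging the returned `Route` with each decision also tells auditors which component was responsible for the outcome.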

Self-hosted explanation infrastructure

Running post-hoc explanation methods at production scale requires its own infrastructure. Model-agnostic SHAP can add tens to hundreds of milliseconds per model call, and sometimes more, depending on model size and the number of background samples. Integrated Gradients requires a forward and a backward pass for every interpolation step along its path. Applied to every inference call, explanations can cost several times the compute of inference itself, so most organizations apply them selectively.
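A rough way to see why selective application matters is to count model evaluations per explanation. This back-of-the-envelope sketch assumes each evaluation costs about one inference call (batching improves throughput, not total compute) and treats a backward pass as costing about the same as a forward pass, which if anything undercounts. The parameter values are illustrative, not recommendations.

```python
def kernel_shap_evals(nsamples: int, background_rows: int) -> int:
    """Approximate model evaluations for one KernelSHAP explanation.

    Each sampled feature coalition is scored against every background row.
    """
    return nsamples * background_rows


def integrated_gradients_passes(steps: int) -> int:
    """One forward and one backward pass per interpolation step."""
    return 2 * steps


def relative_compute(explained_fraction: float, evals_per_explanation: int) -> float:
    """Total compute as a multiple of inference alone (1.0 means no explanations)."""
    return 1.0 + explained_fraction * evals_per_explanation


# Explain every call with 50-step Integrated Gradients.
every_call_ig = relative_compute(1.0, integrated_gradients_passes(50))
# Explain 5% of calls with KernelSHAP: 100 samples over 10 background rows.
selective_shap = relative_compute(0.05, kernel_shap_evals(100, 10))
print(every_call_ig, selective_shap)
```

Even under these crude assumptions, the background set size and the fraction of calls explained dominate the budget, which is why summarizing the background data and explaining selectively are the usual levers.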

Practical patterns for on-premises explanation serving include:

Asynchronous explanation queues: Log the raw model input and output at inference time, then process explanations asynchronously in a background job. Store results in an internal audit database keyed by decision ID. This decouples explanation latency from user-facing response time. The downside is that explanations are not immediately available, which is acceptable for audit purposes but may not satisfy applications that need real-time justifications.

Selective synchronous explanation: Compute explanations in real time only for high-confidence rejections, high-impact outputs, or cases that exceed a risk threshold. Routine low-risk outputs are logged and explained in an overnight batch. This concentrates compute where explainability matters most.

Cached explanation templates: For models that operate over bounded input spaces, such as document classification or structured data scoring, pre-compute explanations for representative input clusters and map each new input to the explanation of its nearest cluster. This is an approximation: the coarser the clusters relative to the variation in your inputs, the less any cached explanation reflects the specific decision it is attached to.
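The first two patterns can be combined in one dispatcher. A minimal standard-library sketch, assuming a hypothetical `explain` function standing in for a SHAP or Captum call, an in-memory dict standing in for the audit database, and a hypothetical risk threshold. High-risk decisions are explained inline. Everything else is queued for a background worker, and both paths store results keyed by decision ID.

```python
import queue
import threading
from typing import Optional

RISK_THRESHOLD = 0.7  # hypothetical; set from your risk policy


def explain(features: dict) -> dict:
    """Hypothetical stand-in for an expensive SHAP/Captum explanation call."""
    total = sum(features.values()) or 1.0
    return {name: value / total for name, value in features.items()}


class ExplanationDispatcher:
    def __init__(self) -> None:
        self.audit_store: dict[str, dict] = {}  # stands in for the audit database
        self._pending: queue.Queue = queue.Queue()
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, decision_id: str, features: dict, risk: float) -> Optional[dict]:
        """Called at inference time; returns an explanation only if computed inline."""
        if risk >= RISK_THRESHOLD:
            result = explain(features)  # synchronous: on the request path
            self._store(decision_id, result, mode="sync")
            return result
        self._pending.put((decision_id, features))  # asynchronous: off the request path
        return None

    def _store(self, decision_id: str, explanation: dict, mode: str) -> None:
        with self._lock:
            self.audit_store[decision_id] = {"explanation": explanation, "mode": mode}

    def _drain(self) -> None:
        while True:
            decision_id, features = self._pending.get()
            try:
                self._store(decision_id, explain(features), mode="async")
            finally:
                self._pending.task_done()

    def wait_for_backlog(self) -> None:
        """Block until every queued explanation has been stored."""
        self._pending.join()


dispatcher = ExplanationDispatcher()
inline = dispatcher.record("dec-001", {"a": 3.0, "b": 1.0}, risk=0.9)
deferred = dispatcher.record("dec-002", {"a": 1.0, "b": 1.0}, risk=0.1)
dispatcher.wait_for_backlog()
```

In production the queue would more likely be durable (a message broker or a database table) so pending work survives restarts; an overnight batch job is the same idea with a longer drain interval.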

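The open-source SHAP library and Captum (for PyTorch models) both run entirely on local infrastructure. Both are mature, well documented, and integrate with standard Python serving stacks. For organizations serving models with ONNX Runtime, SHAP's model-agnostic explainers can wrap an ONNX Runtime session with a thin prediction function; Captum, by contrast, needs the original PyTorch model because it computes gradients through PyTorch.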
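SHAP's `KernelExplainer` accepts any function that maps a 2-D array of rows to predictions, so wrapping an ONNX Runtime session is mostly a matter of supplying the right input name and dtype. A minimal sketch, assuming a single-input model that takes float32 features and whose first output holds the scores you want to explain.

```python
import numpy as np


def make_predict_fn(session):
    """Wrap an onnxruntime.InferenceSession as a batch prediction function.

    Assumes one input tensor of shape (batch, n_features) expecting float32,
    and that the first model output contains the scores to explain.
    """
    input_name = session.get_inputs()[0].name

    def predict(x) -> np.ndarray:
        batch = np.asarray(x, dtype=np.float32)
        outputs = session.run(None, {input_name: batch})  # None = all outputs
        return np.asarray(outputs[0])

    return predict
```

With the real runtime this becomes `predict = make_predict_fn(onnxruntime.InferenceSession("model.onnx", providers=["CPUExecutionProvider"]))` followed by `shap.KernelExplainer(predict, background).shap_values(rows, nsamples=200)`, where the model path, `background`, and `rows` are placeholders. Summarizing the background set first, for example with `shap.kmeans`, keeps the number of model evaluations manageable.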

Audit trail design

Explainability output is only useful if it is preserved in a way that survives model upgrades, personnel changes, and regulatory inquiries years after a decision was made. Explanation storage requires explicit design, not an afterthought.

Each explanation record should contain: a unique decision identifier traceable back to the source transaction; the model name and version that produced the output; a snapshot of the input features or a cryptographic hash of the input document; the explanation values themselves (SHAP values, feature importances, or natural language summaries); a timestamp; and the identity of the human reviewer if a human was in the loop.
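A record with those fields can be captured in a small immutable structure. A minimal sketch, assuming JSON-serializable feature dictionaries; the decision ID, model name, and version are hypothetical. Hashing a canonical JSON encoding (sorted keys, fixed separators) means the same input always produces the same digest, so an auditor can later verify a stored input snapshot against the record.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ExplanationRecord:
    decision_id: str                   # traceable back to the source transaction
    model_name: str
    model_version: str
    input_sha256: str                  # hash of the canonicalized input
    explanation: dict                  # e.g. SHAP values keyed by feature name
    created_at: str                    # ISO 8601 timestamp, UTC
    reviewer_id: Optional[str] = None  # set when a human was in the loop


def hash_input(features: dict) -> str:
    """Hash a canonical JSON encoding so identical inputs always hash identically."""
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_record(decision_id, model_name, model_version, features, explanation,
                 reviewer_id=None) -> ExplanationRecord:
    return ExplanationRecord(
        decision_id=decision_id,
        model_name=model_name,
        model_version=model_version,
        input_sha256=hash_input(features),
        explanation=explanation,
        created_at=datetime.now(timezone.utc).isoformat(),
        reviewer_id=reviewer_id,
    )


record = build_record(
    "loan-2024-000123", "credit_risk_gbm", "3.2.0",
    {"debt_to_income": 0.6, "years_employed": 2},
    {"debt_to_income": -0.6, "years_employed": -1.0},
)
print(json.dumps(asdict(record), indent=2))
```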

Store explanation records in a write-once, append-only log. Object storage with object-lock policies, such as an on-premises MinIO deployment or an equivalent S3-compatible store, prevents retroactive modification. Retain records for at least your regulatory framework's minimum retention period, typically five to seven years in financial services and longer in healthcare.
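MinIO supports the S3 object-lock API, so an S3 client can write each record with a compliance-mode retention date. A minimal sketch, assuming a boto3 S3 client pointed at your MinIO endpoint and a bucket created with object locking enabled. The bucket name, key layout, and seven-year default are hypothetical; confirm how your MinIO version handles compliance mode before relying on it.

```python
import json
from datetime import datetime, timezone


def retain_until(decided_at: datetime, years: int) -> datetime:
    """Add whole calendar years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return decided_at.replace(year=decided_at.year + years)
    except ValueError:
        return decided_at.replace(year=decided_at.year + years, day=28)


def store_locked(s3_client, bucket: str, record: dict, decided_at: datetime,
                 years: int = 7) -> str:
    """Write one explanation record as a write-once object.

    In COMPLIANCE mode, no user can delete or alter this object version
    before the retain-until date.
    """
    key = f"explanations/{record['decision_id']}.json"
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(record, sort_keys=True).encode("utf-8"),
        ContentType="application/json",
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=retain_until(decided_at, years),
    )
    return key
```

In use, `s3_client` would be something like `boto3.client("s3", endpoint_url=...)` with your internal MinIO address. Keeping the retention arithmetic in one function makes it easy to change per regulatory framework.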

Define an internal API for explanation retrieval by decision ID. Audit teams should be able to pull a complete decision record without requiring access to production model infrastructure. This separation protects production systems from audit traffic spikes and allows you to archive older records without disrupting active model serving.
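A read-only retrieval endpoint can be very small. A minimal WSGI sketch using only the standard library, assuming a hypothetical `lookup` callable backed by the audit store rather than the production model servers; the `/decisions/<decision_id>` path is likewise an illustrative choice.

```python
import json


def make_audit_app(lookup):
    """Read-only WSGI app serving GET /decisions/<decision_id>.

    `lookup` is any callable returning a record dict or None. In practice it
    would query the audit store or archive, never the production model servers.
    """

    def app(environ, start_response):
        def respond(status, payload, extra_headers=()):
            body = json.dumps(payload).encode("utf-8")
            headers = [("Content-Type", "application/json"),
                       ("Content-Length", str(len(body))), *extra_headers]
            start_response(status, headers)
            return [body]

        if environ.get("REQUEST_METHOD") != "GET":
            return respond("405 Method Not Allowed", {"error": "read-only endpoint"},
                           [("Allow", "GET")])
        parts = environ.get("PATH_INFO", "").strip("/").split("/")
        if len(parts) != 2 or parts[0] != "decisions" or not parts[1]:
            return respond("404 Not Found", {"error": "unknown path"})
        record = lookup(parts[1])
        if record is None:
            return respond("404 Not Found", {"error": f"no record for {parts[1]}"})
        return respond("200 OK", record)

    return app
```

For a local trial, `wsgiref.simple_server.make_server("127.0.0.1", 8080, app).serve_forever()` is enough; any production WSGI server can host the same app behind your internal authentication.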

Explainability for large language models

LLMs present a harder problem than classical ML models. Attention weights, once widely cited as explanations, are now widely regarded as unreliable, because they often correlate poorly with other measures of feature importance. The most robust on-premises approaches for LLM explainability combine several techniques.

Attribution to source documents: For RAG-based systems, the retrieval layer provides a natural explanation anchor. Log which documents were retrieved, their similarity scores, and how much of the answer was grounded in each source. This does not explain the model's internal reasoning, but it does answer the practically useful question of "where did this answer come from?"
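A retrieval log entry can capture both the retrieval evidence and a rough grounding signal. A minimal sketch with hypothetical document IDs. The grounding score here is a crude lexical proxy (the share of the answer's distinct words that also appear in each source), not a measure of what the model actually relied on; a stronger check, such as an entailment model, could replace it.

```python
import re
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    doc_id: str
    text: str
    similarity: float  # score reported by the retriever


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def attribution_log(answer: str, chunks: list[RetrievedChunk]) -> list[dict]:
    """Per-source entries: retrieval score plus lexical overlap with the answer."""
    answer_words = _words(answer)
    entries = []
    for chunk in chunks:
        overlap = answer_words & _words(chunk.text)
        entries.append({
            "doc_id": chunk.doc_id,
            "similarity": chunk.similarity,
            # Fraction of distinct answer words found in this source (0.0 to 1.0).
            "lexical_grounding": len(overlap) / len(answer_words) if answer_words else 0.0,
        })
    return sorted(entries, key=lambda e: e["lexical_grounding"], reverse=True)


chunks = [
    RetrievedChunk("policy-17", "Applicants with debt to income above 0.5 require manual review", 0.82),
    RetrievedChunk("faq-03", "Branch opening hours are nine to five", 0.41),
]
log = attribution_log("Debt to income above 0.5 requires manual review", chunks)
```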

Chain-of-thought elicitation: Prompting models to reason step-by-step before producing a final answer generates an explicit intermediate representation that can be stored as part of the audit record. The reasoning trace is not a formal proof, but it allows human reviewers to identify when the model followed an incorrect chain of logic.
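To store the reasoning trace, it helps to request it in a predictable shape and split it from the final answer before logging. A minimal sketch; the prompt wording and the `REASONING:`/`ANSWER:` markers are hypothetical conventions. When the model ignores the format, the sketch records the whole output as unparsed rather than guessing.

```python
COT_TEMPLATE = (
    "Review the case below. First write REASONING: followed by numbered steps.\n"
    "Then write ANSWER: followed by your final recommendation on one line.\n\n"
    "Case:\n{case}"
)


def build_prompt(case: str) -> str:
    return COT_TEMPLATE.format(case=case)


def parse_trace(output: str) -> dict:
    """Split model output into reasoning and answer for the audit record."""
    reasoning_idx = output.find("REASONING:")
    answer_idx = output.rfind("ANSWER:")
    if reasoning_idx == -1 or answer_idx == -1 or answer_idx < reasoning_idx:
        return {"parsed": False, "raw_output": output}
    return {
        "parsed": True,
        "reasoning": output[reasoning_idx + len("REASONING:"):answer_idx].strip(),
        "answer": output[answer_idx + len("ANSWER:"):].strip(),
        "raw_output": output,  # keep the original text alongside the split
    }
```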

Contrastive explanations: Ask the model to explain why it did not recommend alternative X, or what evidence would have changed its recommendation. Contrastive formats often produce more actionable explanations for human reviewers than positive attributions alone.
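Contrastive questions can be generated systematically for each alternative the model did not choose, so reviewers see the same probes for every case. A minimal sketch with hypothetical prompt wording and option names:

```python
def contrastive_prompts(case: str, chosen: str, alternatives: list[str]) -> list[str]:
    """One 'why not X?' probe per rejected alternative, plus a counterfactual probe."""
    prompts = [
        f"Case:\n{case}\n\nYou recommended '{chosen}'. "
        f"Explain specifically why you did not recommend '{alt}'."
        for alt in alternatives
        if alt != chosen
    ]
    prompts.append(
        f"Case:\n{case}\n\nYou recommended '{chosen}'. "
        "What evidence, if present in the case, would have changed your recommendation?"
    )
    return prompts


probes = contrastive_prompts("DTI 0.6, 2 years employed", "refer", ["approve", "refer", "decline"])
```

Storing the probes and the model's answers together with the original decision lets a reviewer compare them later without re-running the model.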

For models used in high-stakes decisions, consider implementing a model card that documents known failure modes, the populations or input domains where accuracy degrades, and the conditions under which human override is mandatory. A model card is not an automatic explanation, but it provides regulators with evidence that the organization understands its model's limitations.
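Keeping the model card machine-readable lets the serving path check the override conditions it documents instead of leaving them in a static document. A minimal sketch; the field names, tags, and example values are hypothetical, and a real card would carry much more context, such as training data, evaluation results, and intended use.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    model_name: str
    model_version: str
    known_failure_modes: list[str] = field(default_factory=list)
    # Input domains where validation showed degraded accuracy.
    degraded_domains: set[str] = field(default_factory=set)
    # Conditions (as tags) under which a human must make the final decision.
    mandatory_override_tags: set[str] = field(default_factory=set)

    def requires_human(self, case_tags: set[str]) -> bool:
        """True if the case falls in a documented weak spot or override condition."""
        return bool(case_tags & (self.degraded_domains | self.mandatory_override_tags))


card = ModelCard(
    model_name="credit_risk_gbm",
    model_version="3.2.0",
    known_failure_modes=["thin credit files produce unstable scores"],
    degraded_domains={"thin_file", "self_employed"},
    mandatory_override_tags={"adverse_action"},
)
```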

Practical steps to get started

Organizations beginning this work should resist the temptation to implement a comprehensive explainability platform before understanding their actual regulatory obligations. Different frameworks require different levels of explanation granularity. Start by mapping each AI use case to its specific regulatory requirement, then select the minimum explainability mechanism that satisfies that requirement for production.

A pragmatic sequence: First, instrument existing models with input/output logging and basic SHAP explanations for the highest-risk use cases. Second, build the audit storage layer with appropriate retention and access controls. Third, conduct an internal red-team exercise in which an auditor attempts to reconstruct the reasoning behind a historical decision using only what the audit log contains. The gaps that exercise reveals will drive your next iteration more effectively than any framework document.
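Before a full red-team exercise, a cheap automated pass can flag records that could never support a reconstruction. A minimal sketch that checks sampled audit records against the fields listed in the audit trail section. The field names are hypothetical and should match your own schema; passing this check says nothing about whether the explanation itself is adequate.

```python
REQUIRED_FIELDS = (
    "decision_id", "model_name", "model_version",
    "input_sha256", "explanation", "created_at",
)


def audit_gaps(record: dict) -> list[str]:
    """Return missing or empty required fields for one audit record."""
    gaps = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "", {}, [])]
    # A human-in-the-loop decision must name its reviewer.
    if record.get("human_in_loop") and not record.get("reviewer_id"):
        gaps.append("reviewer_id")
    return gaps


def gap_report(records: list[dict]) -> dict[str, int]:
    """Count how often each field is missing across a sample of records."""
    counts: dict[str, int] = {}
    for record in records:
        for gap in audit_gaps(record):
            counts[gap] = counts.get(gap, 0) + 1
    return counts
```

Running `gap_report` over a random sample of historical records gives the red team a concrete starting list of weaknesses to probe.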

Featured image by Zach M on Unsplash.


Model Explainability Frameworks for On-Premises AI in Regulated Industries