May 2026

Applying an Information Security Lens to Harness Engineering

Architecting Trust in AI Agents for Production at Scale

The Pattern We Recognise

General-purpose digital infrastructure platforms follow the same arc into the world. They arrive crude, prove enough value to generate momentum, and cross into mainstream adoption before the governance infrastructure needed to manage them has been built. The commercial internet did this from the mid-90s. Cloud computing did it a decade later. Mobile enterprise followed. In each case the governance gap was real, the costs of closing it were significant, and the organisations that moved earliest to address it earned advantages that proved durable.

Artificial intelligence has arrived at that same threshold. Everett Rogers documented the structure of technology diffusion in 1962, and the pattern holds. Numerous surveys conducted between 2025 and 2026 place AI agent deployment at the early adopter stage, with the early majority wave projected within two years. The governance window sits ahead of that wave.

Rogers technology diffusion bell curve showing AI agent deployment currently at the early adopter stage, with a marker labelled Now on the early adopter segment and a second marker at the early majority peak labelled plus two years. Sources cited: McKinsey State of AI 2025, Gartner Hype Cycle for Agentic AI April 2026, Deloitte State of AI in the Enterprise 2025.

Previous platforms were governed late because the timing and the tools were not simultaneously available. With AI agents, both conditions are met. Thirty years of enterprise security practice have already produced the architectural frameworks, control taxonomies, and governance patterns the task requires. Practitioners at this inflection point have a substantial inheritance to draw on.

The Architecture That Already Exists

Enterprise information security developed under conditions that should feel familiar to anyone now thinking about AI in production. The systems were complex, the threat actors were adaptive, and no single control was sufficient to hold the line. The organising principle that emerged from that experience, defence-in-depth, codified through the late 1990s and widely adopted across regulated industries through the 2000s, rests on a deceptively simple premise: no layer of protection should assume the layer before it has held. Every control is designed with the expectation that it may be the last one standing with resilience through redundancy as the payoff.

Defence-in-depth gained its practical expression through the deployment of distinct control types at each layer, calibrated to the risk appetite of the organisation. Frameworks such as SABSA, developed in the mid-90s, formalised this discipline by establishing that every security control must trace back through a chain of business requirements to a defined risk context, giving practitioners a methodology capable of answering what controls to deploy, why each one existed, and what business obligation each one served.

The operating system illustrates how this translated into engineering practice. Platform vendors progressively formalised the integration points through which security tooling could attach to both kernel and user space, maturing from early improvised approaches into the stable, governed interfaces that gave endpoint protection, data loss prevention, and intrusion detection the reliable foundations needed to function at enterprise scale, and the same governed integration logic extended to the full suite of preventative, detective, and corrective controls. Network architecture extended the same logic across the seven layers of the OSI model.

Each organisation calibrates the volume and combination of control types deployed at each layer to its own risk appetite and regulatory obligations. The result is a principled architecture that accommodates both the lean deployment of an early-stage organisation and the layered defence stack of a regulated financial institution. The lesson carried forward from both disciplines is that comprehensive governance emerges from deliberately combining control types across multiple layers, each calibrated to a defined risk context and each traceable to a business obligation.

Enterprise integration endeavours start with proof-of-concept deployments. However, POC deployments and controlled pilots don't always surface the entire security architecture questions that production at scale forces into view. When AI systems are ring-fenced, accessed by limited users, and operating against restricted data, general controls provide adequate coverage for the purpose. The transition to full production is where the picture shifts. AI agents embedded in live enterprise systems, customer-facing processes, and sensitive data flows bring with them an evolved threat landscape, one that is now documented in considerable depth across AI security standards and regulatory guidance, and one that calls for architecture engineered to address it.

Harness Engineering: The Operationalisation Layer

Harness engineering is the discipline that steps into this space. Articulated by Mitchell Hashimoto in February this year, it defines the full environment within which an AI agent operates. The formula is precise and important; an agent equals a model plus a harness. The model is a stateless probabilistic reasoning engine. The harness is the Runtime Software Infrastructure surrounding it, governing every interaction the model has with the real world. That is the tools it can call, the data it can access, the outputs it can produce, and the boundaries within which all of those interactions occur.

Where attention and investment flow determines what gets engineered, a model optimised without a governed harness is a capable reasoner operating without constraints. The harness transforms probabilistic capability into dependable operational behaviour – a necessary condition for business operations.

The threat terrain the harness must address is well documented. MITRE ATLAS catalogs adversarial techniques across the full AI lifecycle, covering attack surfaces in data pipelines, model architectures, inference APIs, and training processes. The OWASP Top 10 for Large Language Model Applications identifies the principal vulnerability classes in production AI systems; Excessive Agency (LLM06), the condition in which an AI system is granted more permission, capability, or autonomy than the task actually requires, is the vulnerability class that the harness directly addresses. The flaw is an assembly defect. How the agent was put together determines the vulnerability in this class of risk. A model's reasoning capability is a distinct and separate consideration.

Regulatory frameworks have arrived alongside the threat taxonomy. The EU AI Act brought General-Purpose AI obligations into force from August 2025 and high-risk AI system requirements due from August 2026, with financial exposure reaching 35 million euros or 7% of global annual turnover. The UK Government's Code of Practice for the Cyber Security of AI, published in January 2025, establishes four lifecycle pillars grounded in secure-by-design principles and explicitly recognises adversarial machine learning as a threat class requiring dedicated controls. NIST's AI Risk Management Framework provides the operating model through its Govern, Map, Measure, and Manage cycle. The harness is the technical architecture through which each of these regulatory obligations finds its fulfilment in practice.

Trust Harness Engineering: Five Layer Architecture

The trust harness translates the information security lens directly into AI-specific architecture, applying the same control taxonomy established across decades of enterprise security practice. Five layers, each addressing a distinct threat class through a calibrated combination of preventative, detective, and corrective controls, each mapping to established security analogues, and each carrying a regulatory reference that makes the implementation obligation explicit.

Five-layer trust harness architecture diagram rendered as stacked isometric blocks with graduated blue colouring. From bottom to top: Layer 1 Identity and Privilege Boundary, Layer 2 Input Inspection, Layer 3 Context and State Governance, Layer 4 Output Verification, Layer 5 Observability and Response.

Identity and Privilege Boundary. Least-privilege access control per agent role and task, tool permission scoping to the minimum required, and workload identity authentication with agent credential binding. No agent inherits access from context; every permission is earned and bounded. The control mix is preventative at its foundation, with detective reach through agent identity assertion and privilege event logging capturing tool access attempts, permission evaluations, and escalation events, and corrective action through privilege de-escalation and agent session termination on confirmed identity anomaly. Established Parallel: OS privilege ring controls and Zero Trust Architecture. Reference: OWASP LLM06, EU AI Act Article 15, NIST SP 800-207.

Input Inspection. Prompt content filtering against injection signatures and prohibited instruction patterns, PII redaction at ingress, input schema validation against permitted instruction formats, and adversarial intent classification before any prompt reaches the model. The governing principle is the same one that underpins network perimeter controls: nothing reaches the model context without passing through a governed inspection point. Preventative filtering and detective classification operate in combination, with corrective reach through agent session termination and full interaction audit capture on confirmed injection detection. Established Parallel: Network DLP and WAF ingress inspection. Reference: OWASP LLM01 and LLM02, NCSC secure design guidance.

Context and State Governance. Session memory isolation, context window boundary controls, schema validation on all retrieved and injected content, and scoped retrieval access restricted to authorised content per agent task. The preventative and detective controls in this layer address the same vulnerability class that kernel memory protection addresses in operating system architecture; a model that accumulates or leaks context across sessions is operating without equivalent safeguards. Where full memory isolation cannot be enforced, fallback retrieval filtering with mandatory source attribution logging preserves the boundary. Established Parallel: kernel memory protection and process sandboxing. Reference: MITRE ATLAS AML.T0051, NIST AI RMF MAP function.

Output Verification. Output classification against harmful content taxonomies and data sensitivity tiers, egress policy enforcement, factual consistency verification with confidence threshold enforcement, and egress data loss prevention before any response leaves the harness boundary. The control mix combines detective classification of output content with preventative enforcement at egress, holding non-compliant responses at the boundary before they reach the downstream consumer, with corrective reach through response suppression, downstream notification, and root cause logging triggering prompt configuration review or model governance escalation per severity. Established Parallel: egress content inspection proxy and exfiltration DLP. Reference: OWASP LLM05 and LLM09, EU AI Act Article 15.

Observability and Response. End-to-end agent interaction trace capture and telemetry instrumentation, adaptive inference rate limiting at API gateway level, model output drift monitoring, resource consumption anomaly detection against per-agent baselines, SIEM integration, and structured incident containment covering agent suspension, evidence preservation, and stakeholder notification executed per defined response playbook. The preventative, detective, and corrective controls at this layer can serve the full harness, and where upstream preventative controls are incomplete, observability is the compensating mechanism on which programme-wide governance depends. Established Parallel: centralised security event management, behavioural telemetry collection, and structured incident response. Reference: NIST AI RMF MANAGE function, NCSC secure operation guidance, OWASP LLM10.

The five-layer structure is intentionally organised around architectural rigour. Each layer addresses a discrete class of AI risk through engineered controls. The regulatory or standards expression of those controls is a function of which framework the organisation is accountable to. Whether the governing obligation is the EU AI Act, ISO 42001, the NIST AI Risk Management Framework, the CSA AI Controls Matrix with its 243 control objectives across 18 security domains, or a combination of these, the harness layers provide the technical infrastructure through which compliance obligations can be implemented and operationalised. Organisations that build to architectural rigour first will find that demonstrating alignment to any given standard becomes a mapping exercise.

When the Stakes Are Systemic: The Extended Harness

For the great majority of organisations, the trust harness represents the right level of governance investment. It satisfies current regulatory requirements, addresses the principal documented threat classes, and provides the operational visibility that meaningful governance requires.

A specific cohort of organisations operates under different conditions. For organisations in the top decile by market capitalisation within the FTSE 100 and equivalent major indices internationally, for the 29 Global Systemically Important Banks designated by the Financial Stability Board, for Critical National Infrastructure operators under the UK's NIS Regulations and the EU's NIS2 Directive, and for their global equivalents, an AI control breakdown in production does not remain a firm-level event. It propagates. The interconnectedness that defines their position in the economic ecosystem is precisely what makes their AI risk profile categorically different.

Financial regulation has already developed the governance vocabulary for this distinction. Basel III applies broadly across the banking sector. G-SIBs carry additional capital surcharges and resolution planning requirements specifically because the Financial Stability Board recognises that their distress is not contained to the firm. The same tiering logic, applying proportionally greater governance rigour to entities whose difficulties carry systemic reach, transfers directly to AI governance.

For this cohort, the trust harness is the starting point of a larger obligation. What is required is a comprehensive, threat-informed, lifecycle-spanning framework that governs the full terrain of AI risk across the organisation's entire AI estate, extending well beyond the five operational layers. That terrain covers six distinct impact domains: Data, Model, Infrastructure, Supply Chain, Output and Behaviour, and Human and Governance. It spans every phase from Secure Design through to Secure End of Life, a phase that most current frameworks do not address with the specificity that model retirement, weight disposal, training data lineage, and embedding cache governance actually demand. And it must be differentiated by AI system type and deployment context, because the threat surface of a fine-tuned proprietary model differs materially from that of a consumed API endpoint or an autonomous agentic pipeline.

This is the domain that Cyber Native's AI Security Compass addresses (cybernative.uk/ai-security-compass). A proprietary practitioner framework covering 50 threat vectors documented across the full AI lifecycle, structured across the six impact domains, grounded in authoritative sources, and built to answer a single practical question at every point: what must the organisation have in place before this threat vector becomes a business-impacting problem.

Closing

The pattern documented in this article carries a consistent economic dimension. Organisations that addressed the governance gap early in previous platform shifts earned advantages that proved durable. Those that did not paid for it in retrofit costs, fragmented compliance spend across jurisdictions, and in the most serious cases, operational disruption at the point when the technology had embedded itself too deeply to govern cheaply.

The trust harness is where that economic logic applies to AI agents. Organisations that build it as implementations move toward production at scale can deploy AI-enabled services and internal workflows with confidence, and engage regulatory scrutiny from a position of demonstrable control. For practitioners making the investment case to senior budget holders, the argument is the same one the pattern has always made: building governance infrastructure ahead of the scaling wave is operationally and economically preferable to building in its wake.

The window is open. The early majority wave is forming. The organisations that will claim durable advantage from this period are building now.