The previous article in this series, “Rules fail at the prompt, succeed at the boundary,” focused on the first AI-orchestrated espionage campaign and the failure of prompt-level control. This article is the prescription. The question every CEO is now getting from their board is some version of: What do we do about agent risk?

Across recent AI security guidance from standards bodies, regulators, and major providers, a simple idea keeps repeating: treat agents like powerful, semi-autonomous users, and enforce rules at the boundaries where they touch identity, tools, data, and outputs.
The following is an actionable eight-step plan one can ask teams to implement and report against:

Constrain capabilities
These steps help define identity and limit capabilities.
1. Identity and scope: Make agents real users with narrow jobs
Today, agents run under vague, over-privileged service identities. The fix is straightforward: treat each agent as a non-human principal with the same discipline applied to employees.
Every agent should run as the requesting user in the correct tenant, with permissions constrained to that user’s role and geography. Prohibit cross-tenant on-behalf-of shortcuts. Anything high-impact should require explicit human approval with a recorded rationale. That is how Google’s Secure AI Framework (SAIF) and NIST AI’s access-control guidance are meant to be applied in practice.
The CEO question: Can we show, today, a list of our agents and exactly what each is allowed to do?
2. Tooling control: Pin, approve, and bound what agents can use
The Anthropic espionage framework worked because the attackers could wire Claude into a flexible suite of tools (e.g., scanners, exploit frameworks, data parsers) through Model Context Protocol, and those tools weren’t pinned or policy-gated.
The defense is to treat toolchains like a supply chain:
- Pin versions of remote tool servers.
- Require approvals for adding new tools, scopes, or data sources.
- Forbid automatic tool-chaining unless a policy explicitly allows it.
This is exactly what OWASP flags under excessive agency and what it recommends protecting against. Under the EU AI Act, designing for such cyber-resilience and misuse resistance is part of the Article 15 obligation to ensure robustness and cybersecurity.
The CEO question: Who signs off when an agent gains a new tool or a broader scope? How does one know?
3. Permissions by design: Bind tools to tasks, not to models
A common anti-pattern is to give the model a long-lived credential and hope prompts keep it polite. SAIF and NIST argue the opposite: credentials and scopes should be bound to tools and tasks, rotated regularly, and auditable. Agents then request narrowly scoped capabilities through those tools.
In practice, that looks like: “finance-ops-agent may read, but not write, certain ledgers without CFO approval.”
The CEO question: Can we revoke a specific capability from an agent without re-architecting the whole system?
Control data and behavior
These steps gate inputs, outputs, and constrain behavior.
4. Inputs, memory, and RAG: Treat external content as hostile until proven otherwise
Most agent incidents start with sneaky data: a poisoned web page, PDF, email, or repository that smuggles adversarial instructions into the system. OWASP’s prompt-injection cheat sheet and OpenAI’s own guidance both insist on strict separation of system instructions from user content and on treating unvetted retrieval sources as untrusted.
Operationally, gate before anything enters retrieval or long-term memory: new sources are reviewed, tagged, and onboarded; persistent memory is disabled when untrusted context is present; provenance is attached to each chunk.
The CEO question: Can we enumerate every external content source our agents learn from, and who approved them?
5. Output handling and rendering: Nothing executes “just because the model said so”
In the Anthropic case, AI-generated exploit code and credential dumps flowed straight into action. Any output that can cause a side effect needs a validator between the agent and the real world. OWASP’s insecure output handling category is explicit on this point, as are browser security best practices around origin boundaries.
The CEO question: Where, in our architecture, are agent outputs assessed before they run or ship to customers?
6. Data privacy at runtime: Protect the data first, then the model
Protect the data such that there is nothing dangerous to reveal by default. NIST and SAIF both lean toward “secure-by-default” designs where sensitive values are tokenized or masked and only re-hydrated for authorized users and use cases.
In agentic systems, that means policy-controlled detokenization at the output boundary and logging every reveal. If an agent is fully compromised, the blast radius is bounded by what the policy lets it see.
This is where the AI stack intersects not just with the EU AI Act but with GDPR and sector-specific regimes. The EU AI Act expects providers and deployers to manage AI-specific risk; runtime tokenization and policy-gated reveal are strong evidence that one is actively controlling those risks in production.
The CEO question: When our agents touch regulated data, is that protection enforced by architecture or by promises?
Prove governance and resilience
For the final steps, it’s important to show controls work and keep working.
7. Continuous evaluation: Don’t ship a one-time test, ship a test harness
Anthropic’s research about sleeper agents should eliminate all fantasies about single test dreams and show how critical continuous evaluation is. This means instrumenting agents with deep observability, regularly red teaming with adversarial test suites, and backing everything with robust logging and evidence, so failures become both regression tests and enforceable policy updates.
The CEO question: Who works to break our agents every week, and how do their findings change policy?
8. Governance, inventory, and audit: Keep score in one place
AI security frameworks emphasize inventory and evidence: enterprises must know which models, prompts, tools, datasets, and vector stores they have, who owns them, and what decisions were taken about risk.
For agents, that means a living catalog and unified logs:
- Which agents exist, on which platforms
- What scopes, tools, and data each is allowed
- Every approval, detokenization, and high-impact action, with who approved it and when
The CEO question: If asked how an agent made a specific decision, could we reconstruct the chain?
And don’t forget the system-level threat model: assume the threat actor GTG-1002 is already in your enterprise. To complete enterprise preparedness, zoom out and consider the MITRE ATLAS product, which exists precisely because adversaries attack systems, not models. Anthropic provides a case study of a state-based threat actor (GTG-1002) doing exactly that with an agentic framework.
Taken together, these controls do not make agents magically safe. They do something more familiar and more reliable: they put AI, its access, and actions back inside the same security frame used for any powerful user or system.
For boards and CEOs, the question is no longer “Do we have good AI guardrails?” It’s: Can we answer the CEO questions above with evidence, not assurances?
This content was produced by Protegrity. It was not written by MIT Technology Review’s editorial staff.