Here is the uncomfortable truth nobody in the AI press wants to say plainly: capability is not the hard problem anymore. Control is.

The gap no one is talking about

Every week there is a new agent, a new model, a new computer-use demo that makes the rounds. Hermes v0.14. OpenClaw 5.18. Google Gemini desktop agents. GPT 5.6 rumours. These are real advances. But when you look at what most organisations have actually deployed, you do not see agents.

You see automation theatre: a language model that can take actions, with no visibility into what it did, no rollback if it went wrong, and no meaningful approval gate before it touched anything irreversible.

The AI News Today episodes from this week were unusually honest about this. Hermes is becoming a proper control layer. OpenClaw is tightening up reliability plumbing. The people building these tools understand that the battle is not capability. It is trustworthiness. The market has not caught up yet.

What a real control stack looks like

A production-grade agent deployment needs five things most teams skip:

  1. Audit logs. What did the agent do, when, with what context, and what came back? Without this, you have no accountability and no way to debug failures. Every tool call, every decision, every handoff should be logged.
  2. Human approval gates. Anything with side effects needs a human in the loop before it happens: sending an email, moving money, posting something publicly, editing a record. The gate does not need to be slow. It needs to exist.
  3. Scoped credentials. The agent should not have keys to everything. Give it file access boundaries, read-only permissions where reading is all it needs, and elevated access only when needed and only for specific tasks.
  4. Rollback and kill-switches. When an agent does something wrong, and it will, can you undo it? Can you stop it immediately? If the answer is no, you have not shipped an agent. You have shipped a liability.
  5. Observability dashboards. Who did what, when, which provider ran which model, what failed, what succeeded. This is not optional. It is the difference between running agents and just hoping.
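
To make the list concrete, here is a minimal sketch of how the five pieces might fit around a single tool call. It is an illustration under stated assumptions, not any vendor's API: ControlledAgent, approver, allowed_tools and the outcome labels are all hypothetical names, the audit log is an in-memory list where a real deployment would want durable storage, and the "dashboard" is just a count of outcomes.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


class KillSwitchEngaged(RuntimeError):
    """Raised when an operator has halted the agent."""


class PermissionDenied(RuntimeError):
    """Raised when a tool is outside the agent's credential scope."""


@dataclass
class ControlledAgent:
    # Hypothetical wrapper: every name here is illustrative.
    allowed_tools: set[str]                    # 3. scoped credentials
    approver: Callable[[str, dict], bool]      # 2. human approval gate
    audit_log: list[dict] = field(default_factory=list)                 # 1. audit log
    undo_stack: list[Callable[[], None]] = field(default_factory=list)  # 4. rollback
    halted: bool = False                       # 4. kill switch

    def _log(self, tool: str, args: dict, outcome: str, result: Any = None) -> None:
        # Record what was attempted, when, with what arguments, and what came back.
        self.audit_log.append(
            {"ts": time.time(), "tool": tool, "args": args,
             "outcome": outcome, "result": result}
        )

    def call(self, tool: str, args: dict, action: Callable[..., Any],
             undo: Optional[Callable[[], None]] = None,
             side_effects: bool = False) -> Any:
        if self.halted:
            self._log(tool, args, "halted")
            raise KillSwitchEngaged(f"agent halted; refused {tool}")
        if tool not in self.allowed_tools:
            self._log(tool, args, "denied")
            raise PermissionDenied(f"{tool} is outside this agent's scope")
        # Only calls flagged as having side effects wait for a human.
        if side_effects and not self.approver(tool, args):
            self._log(tool, args, "rejected")
            return None
        try:
            result = action(**args)
        except Exception as exc:
            self._log(tool, args, "error", repr(exc))
            raise
        if undo is not None:
            self.undo_stack.append(undo)
        self._log(tool, args, "ok", result)
        return result

    def kill(self) -> None:
        # Stop all further tool calls immediately.
        self.halted = True

    def rollback(self) -> None:
        # Undo completed actions in reverse order, then log that it happened.
        while self.undo_stack:
            self.undo_stack.pop()()
        self._log("*", {}, "rollback")

    def summary(self) -> Counter:
        # 5. The raw material for a dashboard: outcomes counted from the log.
        return Counter(entry["outcome"] for entry in self.audit_log)
```

In use, the caller supplies the action, an optional undo, and a flag saying whether the call has side effects. For example, agent.call("update_record", {"key": "c1", "value": "new"}, update_record, undo=restore_previous, side_effects=True) runs only if the tool is in scope, the agent is not halted, and the approver returns True (update_record and restore_previous are placeholder functions). The design choice worth noting is that refusals get logged too: a denied or rejected call is exactly the kind of event a compliance team will ask about later.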

The real cost of skipping this

The Atlan and Kore.ai research from the last 30 days confirms what every practitioner already knows: context drift and fragmented architectures are the twin killers of AI ROI. Context drift means agents lose access to their earlier instructions. Fragmentation means teams stack point solutions that do not speak to each other. Audit trails do not exist. Compliance teams cannot determine who authorised what.

The result is what BCG found: no measurable return. Not because the AI was bad. Because the operating model was missing.

The implication for your business

If you are evaluating AI agents for your business, or you have already deployed some, the first question is not "which model?" or "which tool?" It is: what happens when this agent does something unexpected?

If you cannot answer that question clearly, you are not running AI agents. You are running an experiment with real consequences and no safety net.

The teams that will win in the next 18 months are not the ones with the best models. They are the ones who have figured out how to run agents reliably: with logs, with gates, with rollback, with oversight that actually catches problems before they become incidents.

That is the work. It is not exciting in the way a new model release is exciting. It is the work that makes everything else worth doing.