★ field note 05 / essay / classification: open / unrestricted / read time · 15 minutes / filed 2026·01·28 / sn-0047-gcx · batch 004-a /

From AIOps clustering to autonomous execution: closing the last mile.

AIOps gave operations teams better signal: clustering, alert reduction, root-cause hints, learned topologies. What it never gave them was a hand on the wheel. The last mile of operations, the action, is where labour cost actually accumulates, and where the next decade of category creation will happen.

What AIOps actually delivered

It is worth being precise about what the AIOps category shipped and what it didn't. The wins are real. Alert clustering reduced noise by an order of magnitude. Anomaly detection surfaced incidents that hand-tuned thresholds missed. Topology inference built dependency graphs that no one had time to draw manually. Root-cause hints shortened the average triage by tens of minutes.

These are not small things. They saved a great deal of operator time at the front of the funnel. They produced cleaner queues, faster diagnosis, and better postmortems.

Where AIOps stops

AIOps stops, almost universally, at a recommendation. The signal lands in front of an operator: here is the cluster, here is the likely cause, here is what we recommend. The operator reads the recommendation, makes the judgement call, and clicks execute in some other system. The AIOps platform is a cleaner pair of eyes; it is not a hand.

This is not an accident of product design. AIOps was built when the operating assumption was that machines triage and humans act. The alternative (machines acting) required a stack that didn't yet exist: a model rich enough to reason against, a rehearsal mechanism, a consent framework, and a feedback loop that survived contact with reality. None of that was on the table at the time AIOps shipped.

AIOps refines the question. Autonomy answers it.

The last-mile problem

It turns out the action is where most of the cost lives. Operators spend a small share of their week on triage and a much larger share on execution: applying the patch, rotating the credential, restoring from the backup, throttling the rule, opening the firewall, closing the firewall, escalating the ticket, closing the ticket. The AIOps wave shrank the triage column. The execution column did not move.

This is the last-mile problem in IT operations. The signal is good. The decision is made. The action is still by hand. And because the estate keeps growing, the volume of actions keeps growing too, even when the per-action triage is faster.
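The arithmetic behind this can be sketched in a few lines. Every figure below is an illustrative assumption, not a measurement from any estate, and the function name `weekly_hours` is hypothetical.

```python
def weekly_hours(actions_per_week: int, triage_min: float, execution_min: float) -> float:
    """Total operator hours per week: each action costs triage plus execution."""
    return actions_per_week * (triage_min + execution_min) / 60


# Hypothetical "before": 200 actions a week, 15 min triage, 30 min execution each.
before = weekly_hours(200, triage_min=15, execution_min=30)

# Hypothetical "after": triage is three times faster, but the estate (and so
# the action count) has grown by half. Execution time per action is unchanged.
after = weekly_hours(300, triage_min=5, execution_min=30)

print(f"before: {before:.0f} h/week, after: {after:.0f} h/week")
```

Under these made-up numbers the weekly total rises from 150 to 175 hours even though triage got three times faster, because the execution term scales with the action count and nothing has reduced it.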

What changes when the agent acts

An autonomous-execution layer changes three things at once.

Per-action latency drops to seconds. The agent reads the signal, rehearses the change, executes it, and reconciles, all in the same loop. No operator has to switch tabs in the middle.

Per-action human cost drops toward zero on routine work. The human consent path is reserved for novel changes and high-blast-radius decisions. Routine work, the bulk of operations, runs without consent and is logged for audit.

Decision quality stops being a function of operator fatigue. The agent makes the same judgement at 04:00 as it does at 14:00. Decisions are uniformly grounded in the current twin state, not in what the operator remembers.
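The loop described above (rehearse, gate on consent, execute, reconcile) can be sketched as follows. This is a minimal illustration under stated assumptions, not the design of any particular product: the `Change` and `Agent` classes, the `routine` flag, and the use of plain dictionaries for the twin and the live estate are all hypothetical stand-ins.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Change:
    name: str
    routine: bool                      # assumption: set by operator-written policy
    apply: Callable[[dict], dict]      # takes a state, returns the changed state


@dataclass
class Agent:
    twin: dict                         # stand-in for the digital twin
    live: dict                         # stand-in for the real estate
    ask_human: Callable[[Change], bool]
    audit_log: list = field(default_factory=list)

    def _log(self, change: Change, outcome: str) -> str:
        self.audit_log.append((change.name, outcome))
        return outcome

    def handle(self, change: Change) -> str:
        # 1. Rehearse against a copy of the twin, never the live state.
        try:
            rehearsed = change.apply(copy.deepcopy(self.twin))
        except Exception as exc:
            return self._log(change, f"rejected in rehearsal: {exc}")
        # 2. Consent is requested only for non-routine changes.
        if not change.routine and not self.ask_human(change):
            return self._log(change, "declined by operator")
        # 3. Execute against the live state.
        self.live = change.apply(copy.deepcopy(self.live))
        # 4. Reconcile: the twin is resynced from the observed live state.
        drifted = self.live != rehearsed
        self.twin = copy.deepcopy(self.live)
        return self._log(change, "applied with drift" if drifted else "applied")


def rotate_credential(state: dict) -> dict:
    state["credential_version"] += 1
    return state


agent = Agent(
    twin={"credential_version": 1},
    live={"credential_version": 1},
    ask_human=lambda change: False,    # this example operator declines everything
)
routine_result = agent.handle(Change("rotate-credential", True, rotate_credential))
novel_result = agent.handle(
    Change("open-firewall", False, lambda s: {**s, "firewall_open": True})
)
```

In this sketch the routine credential rotation runs without consent and is logged, while the non-routine firewall change stops at the consent gate and never touches the live state. Both outcomes land in `audit_log`.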

The architectural shift

Autonomous execution is not a feature you bolt on top of AIOps. It is a different stack. It needs the digital twin (the model the agent reasons against), the rehearsal sandbox (the place changes are tested before they touch reality), the agent itself (the policy and decision engine), and an integration plane that can act through existing PSAs, RMMs, SIEMs, and identity systems rather than around them.

Importantly, none of this displaces AIOps. The signal layer remains valuable; it is one of the inputs into the twin. Some of the topologies and clusters AIOps produces feed directly into the agent's prior. The two coexist. The shift is that the operator's job moves up a layer, from acting on each alert to writing the policy that governs how the agent acts on alerts.
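To make "writing the policy" concrete, here is a minimal sketch of policy expressed as data: an ordered rule table that maps an alert type to an action and a consent requirement. The rule names, alert types, and the first-match-wins convention are all assumptions chosen for illustration, not a description of any existing product.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    alert_pattern: str     # glob matched against the alert type
    action: str            # hypothetical action name
    needs_consent: bool    # True routes the action to a human first


# Hypothetical operator-written policy; the first matching rule wins.
POLICY = [
    Rule("credential.expiring", "rotate_credential", needs_consent=False),
    Rule("patch.*", "apply_patch", needs_consent=False),
    Rule("firewall.*", "change_firewall_rule", needs_consent=True),
]

# Anything the policy does not recognise is treated as novel and escalated.
DEFAULT = Rule("*", "escalate_ticket", needs_consent=True)


def decide(alert_type: str, policy: list = POLICY) -> Rule:
    """Return the first rule whose pattern matches, else the default."""
    for rule in policy:
        if fnmatchcase(alert_type, rule.alert_pattern):
            return rule
    return DEFAULT


print(decide("patch.missing").action)              # apply_patch
print(decide("firewall.port_open").needs_consent)  # True
print(decide("disk.full").action)                  # escalate_ticket
```

The design choice worth noting is the default: an unrecognised alert falls through to the consent path instead of to silent execution, so the operator widens autonomy by adding rules, not by removing safeguards.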

Where the category goes next

The market spend in adjacent categories (IT ops software, AIOps proper, observability) is well north of $5B today. The product layer in autonomous execution is currently a thin sliver of that. We expect it to be the dominant share of net-new spend in the category by the end of the decade, for the same reason cloud spend overtook on-prem: the labour arithmetic stops working in the old shape, and a new shape becomes available.

The CIO survey question we'll be answering in three years will no longer be "what should we automate next?" It will be "how much of operations is policy, and how much of operations is still execution?" Today, the second number is most of the budget. It does not need to stay that way.

filed under · essay · category · aiops · autonomy
end of filing 05