For six weeks beginning in late February, the agent ran an unattended patch reconciliation cycle across 4,200 endpoints, twelve sites, and three converged PSAs. This is what we measured, the two incidents it surfaced, and where the loop tightened.
The estate spans a regional managed-services portfolio: 4,212 endpoints at the start of the cycle, distributed across twelve physical sites and a fully cloud-resident core. Three previously independent PSAs had been converged into a single twin model in the preceding quarter. Identity is consolidated under a single IdP with two break-glass exceptions. Backup posture is mixed across two vendors.
The opening posture was rough. The twin reported 38,041 outstanding KBs across the endpoint population, with 11.2% of endpoints in a non-compliant patch state for at least 90 days. The longest-outstanding KB had been deferred for 412 days. There were 47 endpoints in a partially-rolled-back state from a prior automation run. Drift across configuration domains averaged 4.8% per node, manageable at the per-node level, expensive in aggregate.
┌──────────────────────────────────────────────────────────────┐
│ CYCLE 01 · OPENING POSTURE [BASELINE] │
├──────────────────────────────────────────────────────────────┤
│ ENDPOINTS 4 212 │
│ SITES 12 │
│ OUTSTANDING KBS 38 041 │
│ >90D NON-COMPLIANT 11.2% │
│ PARTIAL-ROLLBACK STATE 47 ENDPOINTS │
│ AVG CONFIG DRIFT 4.8% / NODE │
│ IDENTITIES 5 884 (1 IDP · 2 BREAK-GLASS) │
└──────────────────────────────────────────────────────────────┘
The first cycle was deliberately restrained. The agent ran inventory and posture reconciliation only (no remediation) and produced a ranked queue of changes by blast radius and severity. The full reconciliation took 38 minutes wall-clock. The queue surfaced 14,210 high-confidence remediations, 8,920 medium, and the rest tagged for operator review.
From day three, the agent began executing high-confidence remediations against the rehearsed sandbox first, then against production. By end of week one: 9,012 KBs reconciled, 1,840 configuration drifts corrected, 47 partially-rolled-back endpoints brought back to a known-good state. Mean time to remediation, weighted by severity, was 4 minutes 12 seconds. Industry baseline for the same operations under operator-driven RMM is 27 minutes.
The aggregate effect of small reductions in MTTR is not linear. It is the difference between a backlog that grows and one that shrinks.
Incident 04-01. On day eleven, a planned KB rollout to a finance subnet was halted by the rehearsal layer. The rehearsal predicted a temporary auth disruption to a payments service that shared an identity dependency with the patched endpoints. The rehearsal was correct: the dependency was real, undocumented, and would have caused a window of failed transactions. The agent paused, escalated, and the change was scheduled for an off-hours window with a tighter rollback envelope. No production impact.
Incident 04-02. On day twenty-eight, a backup-vendor connector returned silently degraded data for six hours before the connector heartbeat flagged it. The twin's confidence score for backup posture dropped during that window, the agent suspended any backup-related changes, and the operator was paged on the heartbeat alarm. Total exposure: zero changes executed against stale data. Zero production impact. The connector was patched and re-baselined within the cycle.
┌──────────────────────────────────────────────────────────────┐
│ CYCLE 02 · 6-WEEK SUMMARY [CLOSED] │
├──────────────────────────────────────────────────────────────┤
│ KBS RECONCILED 36 902 (97.0% OF OPENING) │
│ DRIFTS CORRECTED 11 740 │
│ >90D NON-COMPLIANT 0.4% (WAS 11.2%) │
│ MTTR · WEIGHTED 04M 12S (BASELINE 27M) │
│ AUTONOMY SLA 99.94% │
│ INCIDENTS 2 (ZERO PRODUCTION IMPACT) │
│ OPERATOR CONSENTS 28 (MOSTLY NOVEL CHANGES) │
└──────────────────────────────────────────────────────────────┘
The audit pass at the end of week six took 11 minutes wall-clock and reproduced the change-log from the event store. Every executed change was attributable, every rehearsal was retrievable, every consent was timestamped. The two incidents both fit cleanly inside the existing incident framework: one as a correctly-paused change, one as a correctly-suspended domain. Neither required a postmortem in the traditional sense; both became feedback into the rehearsal model.
The team's headcount did not change during the cycle. Their working hours did. By end of cycle, no one was patching anything by hand, and the shape of the on-call rotation had quietly inverted: most pages were now policy questions, not operational ones.