09:03 AM
Customer impact began in AZ03.
09:13 AM
Monitoring systems observed a spike in allocation failure rates for Virtual Machines, triggering an alert and prompting teams to investigate.
09:23 AM
Initial targeted communications sent to customers via Azure Service Health. Automated messaging gradually expanded as impact increased, and additional service alerts were triggered.
09:30 AM
We began manual mitigation efforts in AZ03 by restarting critical service components to restore functionality, rerouting workloads from affected infrastructure, and initiating multiple recovery cycles for the impacted backend service.
10:02 AM
Platform services in AZ02 began to throttle requests as excessive load accumulated: customer deployments automatically redirected from AZ03 arrived on top of normal customer load in AZ02.
11:20 AM
Platform-initiated retries caused excessive load in AZ02, leading to increased failure rates.
12:00 PM
We stopped allocating new VM deployments to AZ03.
12:40 PM
The reduced load enabled AZ03 to recover. Customer success rates for operations on VMs that were already deployed into AZ03 returned to normal levels.
01:30 PM
After verifying that platform-initiated retries had sufficiently drained, we re-enabled new VM deployments to AZ03. We then stopped all new VM deployments in AZ02.
01:30 PM
We continued to apply additional mitigations to AZ02 platform services with backlogs of retry requests. These included applying more aggressive throttling, draining existing backlogs, restarting services, and applying additional load management strategies.
03:05 PM
Customer success rates for operations on VMs already deployed into AZ02 started increasing.
04:36 PM
Broad targeted messaging sent to all customers with Virtual Machines in East US 2.
04:50 PM
We started gradually re-enabling traffic in AZ02.
05:17 PM
The first update was posted on the public Azure Status page.
06:50 PM
After a period of monitoring to validate the health of services, we were confident that the control plane service was restored.
07:30 PM
Remaining downstream services reported recovery, following backlog drainage and retry stabilization. Customer impact confirmed mitigated.