Google Cloud CRITICAL
Requests failing for Cloud Armor customers in asia-southeast1
March 9, 2023 · 09:30 AM UTC – 10:24 AM UTC · Duration: 54min
Affected Services
Cloud Armor, Google Cloud Networking
Timeline
12:07 AM
Incident Report
Summary
On Thursday, 9 March 2023, Cloud Armor experienced a service outage in the asia-southeast1 region for a duration of 49 minutes. To our Cloud Armor customers whose services were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.
Root Cause
Cloud Armor allows Google Cloud (GCP) customers to configure Web Application Firewall (WAF) and other application-level filtering policies to protect their applications and services exposed to the Internet via external HTTP(S) load balancers and TCP/SSL proxies. For every incoming request, Cloud Armor evaluates the user-configured security policy before the request is proxied to the customer backend. To evaluate the policy, Cloud Armor performs complex and computationally intensive evaluations, based on customer-supplied configurations, at the edge of Google’s network. Policies can be changed by customers at any time, and changes propagate within minutes.
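As a rough illustration of the per-request evaluation described above, the following sketch checks a client address against an ordered list of rules before a request would be forwarded. It is not Cloud Armor's implementation: the `Rule` type, the `evaluate` function, the "lowest priority number wins" ordering, and the default action are all assumptions made for this example, and real policies support far richer match conditions than a source IP range.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape, used only for this sketch."""
    priority: int   # assumption: lower number is evaluated first
    src_range: str  # source IP range in CIDR notation
    action: str     # "allow" or "deny"


def evaluate(rules, client_ip, default_action="allow"):
    """Return the action of the first matching rule, in priority order.

    Falls back to default_action when no rule matches.
    """
    ip = ipaddress.ip_address(client_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if ip in ipaddress.ip_network(rule.src_range):
            return rule.action
    return default_action


# Example policy: block one documentation range, allow everything else.
policy = [
    Rule(priority=1000, src_range="203.0.113.0/24", action="deny"),
    Rule(priority=2000, src_range="0.0.0.0/0", action="allow"),
]

if __name__ == "__main__":
    for addr in ("203.0.113.7", "198.51.100.1"):
        print(addr, "->", evaluate(policy, addr))
```

Because this check runs on every request, a configuration that makes evaluation fail or become very expensive affects all traffic passing through the same evaluation path, which is why the validation of customer-supplied configurations matters.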
On Thursday, 9 March 2023 at 01:35 US/Pacific, a specific configuration was applied which, when combined with incoming traffic patterns, triggered a latent bug that resulted in a failure of Cloud Armor services in the asia-southeast1 region.
Remediation and Prevention
Google engineers were alerted to the issue by internal monitoring systems on 9 March 2023 at 01:43 US/Pacific and immediately started an investigation. At 02:16 US/Pacific, engineers identified a single customer configuration as the likely cause of the issue. At 02:20 US/Pacific, engineers removed and locked the configuration, at which point the service began to recover. The service fully recovered by 02:24 US/Pacific on 9 March 2023.
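The intervals between the timestamps above can be checked with a short calculation. This is a sketch only: the event names are labels chosen for this example, and the fixed UTC-8 offset assumes US/Pacific was on standard time on 9 March 2023 (daylight saving time began in the US on 12 March 2023).

```python
from datetime import datetime, timedelta, timezone

# Assumption: US/Pacific was on standard time (UTC-8) on 9 March 2023.
PST = timezone(timedelta(hours=-8), "PST")


def at(hhmm: str) -> datetime:
    """Build a timezone-aware datetime on 9 March 2023 from 'HH:MM'."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2023, 3, 9, hour, minute, tzinfo=PST)


# Timestamps taken from the report; the keys are illustrative labels.
events = {
    "config_applied": at("01:35"),
    "alerted": at("01:43"),
    "cause_identified": at("02:16"),
    "config_removed": at("02:20"),
    "recovered": at("02:24"),
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two named events."""
    return int((events[end] - events[start]).total_seconds() // 60)


def utc_label(name: str) -> str:
    """Format a named event as 'HH:MM UTC'."""
    return events[name].astimezone(timezone.utc).strftime("%H:%M UTC")


if __name__ == "__main__":
    print("time to alert:", minutes_between("config_applied", "alerted"), "min")
    print("alert to cause identified:", minutes_between("alerted", "cause_identified"), "min")
    print("removal to recovery:", minutes_between("config_removed", "recovered"), "min")
    print("total:", minutes_between("config_applied", "recovered"), "min")
    print("window:", utc_label("config_applied"), "to", utc_label("recovered"))
```

Under these assumptions, the impact window of 01:35 to 02:24 US/Pacific corresponds to 09:35 to 10:24 UTC, a total of 49 minutes.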
Google is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for any potential impact to your organization. We are taking immediate steps to prevent a recurrence and improve reliability in the future.
A fix for the latent bug has been deployed to production as of Friday, 17 March 2023.
More robust configuration validations and runtime checks are being put in place, including manual testing and automated unit tests.
Detailed Description of Impact
On 9 March 2023 from 01:35 to 02:24 US/Pacific, approximately 50% of Cloud Armor customer security policies would not have been applied for projects in the asia-southeast1 region.
01:23 AM
Mini Incident Report
We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support.
(All Times US/Pacific)
Incident Start: 09 March, 2023 01:35
Incident End: 09 March, 2023 02:24
Duration: 49 minutes
Affected Services and Features:
Cloud Armor
Regions/Zones: asia-southeast1
Description:
Cloud Armor experienced failed requests, and customer policies could not be applied. The preliminary root cause appears to be a configuration issue in conjunction with high incoming traffic. Google will complete a full Incident Report in the following days that will provide the full root cause.
Customer Impact:
Customers experienced failed requests and Cloud Armor customer policies were not being applied.
10:34 AM
The issue with Cloud Armor has been resolved for all affected users as of Thursday, 2023-03-09 02:30 US/Pacific.
We thank you for your patience while we worked on resolving the issue.
10:25 AM
Summary: Requests failing for Cloud Armor customers in asia-southeast1
Description: We are experiencing an issue with Cloud Armor.
Our engineering team continues to investigate the issue.
We will provide an update by Thursday, 2023-03-09 03:30 US/Pacific with current details.
We apologize to all who are affected by the disruption.
Diagnosis: Customers are experiencing failed requests and Cloud Armor customer policies are not being applied.
Workaround: None at this time.