Google Cloud CRITICAL
Google Cloud Monitoring experiencing intermittent issues with metric availability for GCP resources in US regions.
April 25, 2024 · 06:30 PM UTC – 10:00 PM UTC · Duration: 3h 30min
Affected Services
Cloud Monitoring, Operations
Timeline
09:05 PM
Mini Incident Report
We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support.
(All Times US/Pacific)
Incident Start: 25 April 2024 10:30
Incident End: 25 April 2024 14:00
Duration: 3 hours, 30 minutes
Affected Services and Features:
Cloud Bigtable - Monitoring metrics
Cloud Spanner - Monitoring metrics
Regions/Zones: US regions
Description:
Google Cloud Monitoring experienced metric data availability issues affecting Cloud Bigtable and Cloud Spanner metrics for approximately 3 hours, 30 minutes. The issue was limited to Cloud Bigtable and Cloud Spanner resources in US regions.
From preliminary analysis, the root cause of the issue is a code change in Google Cloud Monitoring that delayed the processing of metric data and its delivery to Google Cloud customers.
The issue was fully mitigated once our engineers rolled back the code change.
Customer Impact:
Customers may have experienced telemetry delays and, in some cases, metric gaps in their monitoring dashboards during the affected period. GCP and third-party services consuming these metrics through the Cloud Monitoring API may also have experienced metric gaps, including services that depend on these metrics for autoscaling.
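As a minimal sketch of how a metrics consumer might check its own data for gaps of the kind described above, the Python below flags consecutive samples that are spaced further apart than expected. The function name `find_gaps`, the one-minute sampling interval, the `slack` factor, and the sample timestamps are all hypothetical illustrations; none of them come from Cloud Monitoring or from this report.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, slack=1.5):
    """Return (earlier, later) pairs of consecutive samples whose spacing
    exceeds expected_interval * slack. Input order does not matter."""
    ordered = sorted(timestamps)
    threshold = expected_interval * slack
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > threshold
    ]


# Hypothetical one-minute samples with a hole between 10:31 and 10:40.
points = [
    datetime(2024, 4, 25, 10, 29),
    datetime(2024, 4, 25, 10, 30),
    datetime(2024, 4, 25, 10, 31),
    datetime(2024, 4, 25, 10, 40),
    datetime(2024, 4, 25, 10, 41),
]
gaps = find_gaps(points, timedelta(minutes=1))
for earlier, later in gaps:
    print(f"gap from {earlier:%H:%M} to {later:%H:%M}")
```

A check like this only shows that points are missing on the consumer side; it cannot distinguish delayed delivery from data that was never written, so it may be worth re-running after the incident window closes.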
11:50 PM
After a detailed investigation, our engineers determined that the impact was limited to Cloud Bigtable and Cloud Spanner metrics.
From our preliminary analysis, the suspected root cause of the issue is a recent rollout, which our engineers rolled back to the last known stable version.
The rollback has mitigated all known issues with Cloud Monitoring as of Thursday, 2024-04-25 12:35 US/Pacific.
We will publish an analysis of this incident once we have completed our internal investigation.
We thank you for your patience while we worked on resolving the issue.
Thank you for choosing us!
10:58 PM
Summary: Google Cloud Monitoring experiencing intermittent issues with metric availability for GCP resources in US regions.
Description: We are experiencing an issue with Cloud Monitoring beginning at Thursday, 2024-04-25 12:30 US/Pacific.
Upon further investigation, we believe the impact is limited to metrics of GCP resources in US regions.
Our engineering team continues to investigate the issue.
We will provide an update by Thursday, 2024-04-25 16:00 US/Pacific with current details.
We apologize to all who are affected by the disruption.
Diagnosis: Customers impacted by this issue may experience delayed metric data and, in some cases, may not see metric data in their monitoring dashboards.
GCP and third-party products and services that consume these metrics via the Cloud Monitoring API may experience the same issues.
This includes products that rely on metrics for autoscaling.
Alerting mechanisms that rely on this metric data may also be affected.
Workaround: None at this time.
10:19 PM
Summary: Google Cloud Monitoring experiencing intermittent issues with metric availability
Description: We are experiencing an issue with Cloud Monitoring beginning at Thursday, 2024-04-25 12:30 US/Pacific.
Our engineering team continues to investigate the issue.
We will provide an update by Thursday, 2024-04-25 15:00 US/Pacific with current details.
We apologize to all who are affected by the disruption.
Diagnosis: Multiple cloud services experiencing intermittent issues with metric availability
Workaround: None at this time.