Google Cloud MAJOR
Global: Some Cloud Monitoring metric calculations are delayed or missing for multiple Cloud Services
October 13, 2021 · 01:25 PM UTC – 06:50 PM UTC · Duration: 5h 25min
Affected Services
Operations, Google Cloud Pub/Sub, Google App Engine, Google Cloud Bigtable, Cloud Monitoring
Timeline
03:42 PM
We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case using https://cloud.google.com/support
(All Times US/Pacific)
Incident Start: 13 October 2021 05:25
Incident End: 13 October 2021 10:50
Duration: 5 hours, 25 minutes
Affected Services and Features:
Cloud Monitoring
Regions/Zones: Global
Description:
Google Cloud Monitoring experienced degraded monitoring data availability globally for 5 hours and 25 minutes. During the disruption, impacted customers would have experienced missing monitoring metric data and missed monitoring alerts. The root cause was identified as a failed configuration on a standby task: a leadership change caused this unhealthy task to take over. The issue was mitigated by restarting the leader, which allowed a healthy task to take back control and restored the service.
Customer Impact:
Missing data for some monitoring metrics
Missed monitoring alerts
07:33 PM
The issue with Cloud Monitoring has been resolved for all affected projects as of Wednesday, 2021-10-13 11:21 US/Pacific.
We thank you for your patience while we worked on resolving the issue.
07:04 PM
Summary: Global: Some Cloud Monitoring metric calculations are delayed or missing for multiple Cloud Services
Description: Mitigation work is currently underway by our engineering team, as we believe we have identified the root cause.
We will provide more information by Wednesday, 2021-10-13 13:00 US/Pacific.
Diagnosis: Affected customers may notice that services such as Google Bigtable, Cloud Pub/Sub, and App Engine Flexible are not exporting metrics to Cloud Monitoring. Additional services whose monitoring data is aggregated by location may also be impacted. Alerts configured on affected metrics may produce incorrect alert signals, depending on the alert definition.
Workaround: None at this time.
05:50 PM
Summary: Global: Some Cloud Monitoring metric calculations are delayed or missing for multiple Cloud Services
Description: We are experiencing an issue with Cloud Monitoring beginning at Wednesday, 2021-10-13 05:24 US/Pacific.
Our engineering team continues to investigate the issue.
We will provide an update by Wednesday, 2021-10-13 11:30 US/Pacific with current details.
Diagnosis: Affected customers may notice that services such as Google Bigtable, Cloud Pub/Sub, and App Engine Flexible are not exporting metrics to Cloud Monitoring.
Workaround: None at this time.
05:45 PM
Summary: Global: Some Cloud Monitoring metric calculations are delayed
Description: We are experiencing an issue with Cloud Monitoring beginning at Wednesday, 2021-10-13 00:00 US/Pacific.
Our engineering team continues to investigate the issue.
We will provide an update by Wednesday, 2021-10-13 10:19 US/Pacific with current details.
We apologize to all who are affected by the disruption.
Diagnosis: None at this time.
Workaround: None at this time.