Google Cloud MAJOR

Google Cloud infrastructure components faced issues in australia-southeast2 starting 2021-08-23 19:50 PDT. Most of th...

August 24, 2021 · 03:50 AM UTC – 05:20 AM UTC · Duration: 1h 30min

Affected Services

Google Cloud Infrastructure Components, Google Cloud Networking, Cloud NAT, Google Compute Engine, Google Kubernetes Engine, Persistent Disk, Google Cloud Storage, Cloud Filestore, Google Cloud Dataproc, Cloud Run, Google Cloud SQL, Cloud Spanner, Google Cloud Pub/Sub, Google Cloud Dataflow, Cloud Firestore, Google Cloud Bigtable, Cloud Logging, Cloud Monitoring, Google BigQuery, Identity and Access Management, Google Cloud Functions, Hybrid Connectivity, Cloud Load Balancing

Timeline

11:59 PM
We apologize for the inconvenience this service disruption may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case using https://cloud.google.com/support.

(All Times US/Pacific)
Incident Start: 23 August 2021 19:50
Incident End: 23 August 2021 21:20
Duration: 1 hour, 30 minutes

Affected Services and Features:
Cloud Networking
Cloud L7 and L4 Load balancers
Cloud Interconnect
Cloud NAT
Cloud VPN
Google Compute Engine
Google Kubernetes Engine
Cloud Persistent Disk
Cloud Storage
Cloud FileStore
Cloud Dataproc
Cloud Run
Cloud SQL
Cloud Spanner
Cloud Pub/Sub
Cloud Dataflow
Cloud Firestore
Cloud Bigtable
Cloud Logging
Cloud Monitoring
Cloud BigQuery
Cloud IAM

Regions/Zones: australia-southeast2

Description: Google Cloud Networking experienced intermittent connectivity issues with Google Cloud Services in australia-southeast2 for 1 hour and 30 minutes. The underlying Cloud Networking impact ended at 20:41; however, some Cloud Services took longer to recover, delaying the all clear until 21:20. Any service that uses Cloud Networking may have seen impact. We have included available details of service specific impact below; however, this may not be a comprehensive accounting of all downstream networking impact.

From preliminary analysis, the root cause of the issue was transient voltage at the feeder to the network equipment, causing the equipment to reboot. In order to mitigate the issue, traffic within the australia-southeast2 region was redirected temporarily.

Customer Impact:
Cloud Networking: Public IP traffic connectivity failed from 19:51 to 20:41.
Cloud L7 Load Balancers: Partial dataplane query loss and control plane operational delay for External load balancers from 19:50 to 20:12, and Internal load balancers from 19:50 to 20:18.
Cloud L4 Load Balancing: Inbound public IP traffic was dropped from 19:51 to 20:41.
Cloud Interconnect: Up to 100% packet loss between 19:50 and 20:21.
Cloud NAT: Control plane failures from 19:51 to 20:00.
Cloud VPN: HA VPN dropped up to 83% of traffic between 19:51 and 20:21, while Legacy VPN dropped ~100% of traffic between 19:51 and 20:41.
Cloud SQL: 100% error rate from 19:49 to 20:01.
Cloud Storage: 100% error rate through 21:20.
Cloud Functions: Cloud Run: 100% error rate through 21:19.
Cloud Bigtable: 100% error rate from 19:49 to 20:01 and increased latency from 20:01 to 20:13.
Cloud Logging: Logs written within the region may have failed to be ingested from 20:07 to 21:18.
Cloud Monitoring: Customers may have experienced falsely firing alerts, missed alerts, missing metrics and failed writes from 19:50 to 20:15.
Google Compute Engine: Operations to create or modify instances failed from 19:52 to 20:12. Connectivity from instances to other GCP services may have been affected until 20:26. Existing instances may have lost network connectivity. Additionally, autoscaling had delays or errors collecting input data, which may have impacted autoscaling decisions.
Google Kubernetes Engine: Control plane operations on regional clusters failed between 19:50 and 20:04. Increased latency from 20:05 to 20:41. 100% of requests to container.googleapis.com failed.
Persistent Disk: Up to 100% device unavailability between 19:51 and 20:13.
Cloud Filestore: Up to 100% error rate from 19:50 to 20:03.
Cloud IAM: ~80% error rate from 19:52 to 20:10.
Cloud Spanner: 100% error rate between 19:53 and 20:09.
Cloud Pub/Sub: Increased error rate and latency of up to 95% between 19:50 and 20:12.
Cloud Dataflow: Increased errors starting jobs and making progress on existing jobs between 19:50 and 20:12.
Cloud Dataproc: New cluster creation failed from 20:09 until 21:20.
Cloud Firestore: Control plane saw ~90% error rates from 19:50 to 20:03. Data plane saw no significant impact.
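For readers who want to compare how long each service was affected, the per-service windows in the report above can be turned into durations with a few lines of Python. This is only a rough sketch: the names `WINDOWS`, `window_minutes`, and `durations` are illustrative and not part of the report, only a handful of the listed windows are copied in, and it assumes every start and end time falls on the same day (which holds for the times quoted above).

```python
from datetime import datetime

# A few impact windows copied from the report above
# (all times US/Pacific, 23 August 2021). The dictionary name and
# labels are illustrative, not something the report defines.
WINDOWS = {
    "Incident (overall)": ("19:50", "21:20"),
    "Cloud Networking (public IP)": ("19:51", "20:41"),
    "Cloud NAT (control plane)": ("19:51", "20:00"),
    "Cloud Spanner": ("19:53", "20:09"),
    "Cloud Logging": ("20:07", "21:18"),
}


def window_minutes(start: str, end: str) -> int:
    """Return whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def durations(windows: dict) -> dict:
    """Map each service label to the length of its impact window in minutes."""
    return {name: window_minutes(s, e) for name, (s, e) in windows.items()}


if __name__ == "__main__":
    # Print the longest windows first.
    ranked = sorted(durations(WINDOWS).items(), key=lambda kv: -kv[1])
    for name, minutes in ranked:
        print(f"{name}: {minutes} min")
```

Because the windows all sit within one evening, plain `HH:MM` parsing is enough here; a window that crossed midnight or mixed time zones would need full dates and zone-aware timestamps instead.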
06:27 AM
Google Cloud infrastructure components faced issues in australia-southeast2. Most of the services are now fully restored. The impact is believed to have started at 2021-08-23 19:50 PST.

Products impacted and current status:
Cloud FileStore - Service restored. 21:07 PST
Cloud Networking - Service restored. 20:41 PST
Cloud SQL - Customers might still see errors.
Cloud VPN - Service fully restored. 20:41 PST
Cloud GKE - Service fully restored. 20:41 PST
Cloud Storage - Service fully restored. 22:27 PST
Cloud Dataproc - Service fully restored. 21:50:36 PST
Cloud Run - Service fully restored.
Cloud Spanner - Service fully restored.
Cloud Pub/Sub - Service restored. 20:38 PST
Cloud Dataflow - Service fully restored. 20:12 PST
Cloud Bigtable - Service restored. 21:36 PST
Cloud Memorystore - Service restored. 20:03 PST
Cloud Logging - Service fully restored. 21:18 PST
Cloud BigQuery - Service fully restored. 21:21 PST
Cloud Identity & Security (Cloud Access Policy) - Service fully restored.
Cloud Load Balancers - Fully restored. 21:47 PST
Cloud Persistent Disk - Service fully restored.

We apologise for the service disruption caused by this issue.
05:53 AM
We are experiencing an issue with Google Cloud infrastructure components at australia-southeast2. We have started seeing recovery with our services.

Products impacted and current status:
Cloud FileStore - Customers might see API errors.
Cloud Networking - Service restored. 20:41 PST
Cloud VPN
Cloud GKE - Service fully restored.
Cloud Storage - Service fully restored.
Cloud Dataproc
Cloud Run - Service fully restored.
Cloud Spanner - Service fully restored.
Cloud Pub/Sub
Cloud Dataflow
Cloud Bigtable - Service restored. 21:36 PST
Cloud Memorystore - Service restored. 20:03 PST
Cloud Logging
Cloud BigQuery
Cloud Identity & Security (Cloud Access Policy) - Service fully restored.
Cloud Load Balancers - Fully restored.
Persistent Disk - Service fully restored.

Our engineering team continues to investigate the issue. We will provide an update by Monday, 2021-08-23 22:15 US/Pacific with current details. We apologize to all who are affected by the disruption.

Diagnosis: Customers impacted by this issue may see connectivity issues on Google Cloud Services.
Workaround: None at this time…
04:35 AM
We are experiencing an issue with Google Cloud infrastructure components in australia-southeast2 starting 2021-08-23 19:50 PT.

Products impacted:
Google Cloud Load Balancers
Google Compute Engine - Customers might see failures when creating new VMs.
Cloud FileStore - Customers might see API errors.
Cloud VPN
Cloud GKE
Cloud PD
Cloud Storage
Cloud Dataproc
Cloud Run
Cloud Spanner
Cloud Pub/Sub
Cloud Dataflow
Cloud Bigtable

Our engineering team continues to investigate the issue. We will provide an update by Monday, 2021-08-23 21:45 US/Pacific with current details. We apologize to all who are affected by the disruption.

Diagnosis: Customers impacted by this issue may see connectivity issues on Google Cloud Services.
Workaround: None at this time