Google Cloud MAJOR
Google Cloud infrastructure components faced issues in australia-southeast2 starting 2021-08-23 19:50 PDT. Most of th...
August 24, 2021 · 03:50 AM UTC – 05:20 AM UTC · Duration: 1h 30min
Affected Services
Google Cloud Infrastructure Components, Google Cloud Networking, Cloud NAT, Google Compute Engine, Google Kubernetes Engine, Persistent Disk, Google Cloud Storage, Cloud Filestore, Google Cloud Dataproc, Cloud Run, Google Cloud SQL, Cloud Spanner, Google Cloud Pub/Sub, Google Cloud Dataflow, Cloud Firestore, Google Cloud Bigtable, Cloud Logging, Cloud Monitoring, Google BigQuery, Identity and Access Management, Google Cloud Functions, Hybrid Connectivity, Cloud Load Balancing
Timeline
11:59 PM
We apologize for the inconvenience this service disruption may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case using https://cloud.google.com/support.
(All Times US/Pacific)
Incident Start: 23 August 2021 19:50
Incident End: 23 August 2021 21:20
Duration: 1 hour, 30 minutes
Affected Services and Features:
Cloud Networking
Cloud L7 and L4 Load balancers
Cloud Interconnect
Cloud NAT
Cloud VPN
Google Compute Engine
Google Kubernetes Engine
Cloud Persistent Disk
Cloud Storage
Cloud FileStore
Cloud Dataproc
Cloud Run
Cloud SQL
Cloud Spanner
Cloud Pub/Sub
Cloud Dataflow
Cloud Firestore
Cloud Bigtable
Cloud Logging
Cloud Monitoring
Cloud BigQuery
Cloud IAM
Regions/Zones: australia-southeast2
Description:
Google Cloud Networking experienced intermittent connectivity issues with Google Cloud Services in australia-southeast2 for 1 hour and 30 minutes. The underlying Cloud Networking impact ended at 20:41; however, some Cloud Services took longer to recover, delaying the all clear until 21:20. Any service that uses Cloud Networking may have seen impact. We have included available details of service-specific impact below; however, this may not be a comprehensive accounting of all downstream networking impact. From preliminary analysis, the root cause of the issue was transient voltage at the feeder to the network equipment, which caused the equipment to reboot. To mitigate the issue, traffic within the australia-southeast2 region was temporarily redirected.
Customer Impact:
Cloud Networking: Public IP traffic connectivity failed from 19:51 to 20:41.
Cloud L7 Load Balancers: Partial dataplane query loss and control plane operational delay for External load balancers from 19:50 to 20:12, and Internal load balancers from 19:50 to 20:18.
Cloud L4 Load Balancing: Inbound public IP traffic was dropped from 19:51 to 20:41.
Cloud Interconnect: Up to 100% packet loss between 19:50 and 20:21.
Cloud NAT: Control plane failures from 19:51 to 20:00.
Cloud VPN: HA VPN dropped up to 83% of traffic between 19:51 and 20:21, while Legacy VPN dropped ~100% of traffic between 19:51 and 20:41.
Cloud SQL: 100% error rate from 19:49 to 20:01.
Cloud Storage: 100% error rate through 21:20.
Cloud Functions:
Cloud Run: 100% error rate through 21:19.
Cloud Bigtable: 100% error rate from 19:49 to 20:01 and increased latency from 20:01 to 20:13.
Cloud Logging: Logs written within the region may have failed to be ingested from 20:07 to 21:18.
Cloud Monitoring: Customers may have experienced falsely firing alerts, missed alerts, missing metrics and failed writes from 19:50 to 20:15.
Google Compute Engine: Operations to create or modify instances failed from 19:52 to 20:12. Connectivity from instances to other GCP services may have been affected until 20:26. Existing instances may have lost network connectivity. Additionally, autoscaling had delays or errors collecting input data, which may have impacted autoscaling decisions.
Google Kubernetes Engine: Control plane operations on regional clusters failed between 19:50 and 20:04. Increased latency from 20:05 to 20:41. 100% of requests to container.googleapis.com failed.
Persistent Disk: Up to 100% device unavailability between 19:51 and 20:13.
Cloud Filestore: Up to 100% error rate from 19:50 to 20:03.
Cloud IAM: ~80% error rate from 19:52 to 20:10.
Cloud Spanner: 100% error rate between 19:53 and 20:09.
Cloud Pub/Sub: Increased error rate and latency of up to 95% between 19:50 and 20:12.
Cloud Dataflow: Increased errors starting jobs and making progress on existing jobs between 19:50 and 20:12.
Cloud Dataproc: New cluster creation failed from 20:09 until 21:20.
Cloud Firestore: Control plane saw ~90% error rates from 19:50 to 20:03. Data plane saw no significant impact.
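The impact windows above can be compared by duration. The following is a minimal sketch, not part of the original report: it copies a subset of the start and end times listed above (all US/Pacific, 23 August 2021) and computes how long each window lasted. The names WINDOWS, window_minutes, and durations are hypothetical, and the helper assumes each window starts and ends on the same calendar day, which holds for every entry listed here.

```python
from datetime import datetime

# Impact windows copied from the Customer Impact list above.
# Times are US/Pacific on 23 August 2021; only entries with an
# explicit start and end time are included.
WINDOWS = {
    "Cloud Networking": ("19:51", "20:41"),
    "Cloud Interconnect": ("19:50", "20:21"),
    "Cloud NAT": ("19:51", "20:00"),
    "Cloud SQL": ("19:49", "20:01"),
    "Cloud Logging": ("20:07", "21:18"),
    "Cloud Dataproc": ("20:09", "21:20"),
    "Persistent Disk": ("19:51", "20:13"),
    "Cloud Spanner": ("19:53", "20:09"),
}


def window_minutes(start: str, end: str) -> int:
    """Length of an HH:MM to HH:MM window in minutes (same-day windows only)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def durations(windows: dict) -> dict:
    """Map each service name to the length of its impact window in minutes."""
    return {svc: window_minutes(start, end) for svc, (start, end) in windows.items()}


if __name__ == "__main__":
    # Print services from longest to shortest impact window.
    for svc, minutes in sorted(durations(WINDOWS).items(), key=lambda kv: -kv[1]):
        print(f"{svc}: {minutes} min")
```

Applied to the overall incident window (19:50 to 21:20), window_minutes returns 90, matching the stated duration of 1 hour, 30 minutes.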
06:27 AM
Google Cloud infrastructure components faced issues in australia-southeast2. Most of the services are now fully restored. The impact is believed to have started at 2021-08-23 19:50 PST.
Products impacted and current status:
Cloud FileStore - Service restored. 21:07 PST
Cloud Networking - Service restored. 20:41 PST
Cloud SQL - Customers might still see errors.
Cloud VPN - Service fully restored 20:41 PST
Cloud GKE - Service fully restored 20:41 PST
Cloud Storage - Service fully restored 22:27 PST
Cloud Dataproc - Service fully restored. 21:50:36 PST
Cloud Run - Service fully restored
Cloud Spanner - Service fully restored
Cloud Pub/Sub - Service restored. 20:38 PST
Cloud Dataflow - Services fully restored at 20:12 PST
Cloud Bigtable - Service restored. 21:36 PST
Cloud Memorystore - Service restored. 20:03 PST
Cloud Logging - Services fully restored - 21:18 PST
Cloud BigQuery - Services fully restored - 21:21 PST.
Cloud Identity & Security(Cloud Access Policy) - Service fully restored.
Cloud Load Balancers - Fully restored. 21:47 PST
Cloud Persistent Disk - Service fully restored
We apologise for the service disruption caused by this issue.
05:53 AM
We are experiencing an issue with Google Cloud infrastructure components at australia-southeast2. We have started seeing recovery with our services.
Products impacted and current status:
Cloud FileStore - Customers might see API errors.
Cloud Networking - Service restored. 20:41 PST
Cloud VPN
Cloud GKE - Service fully restored.
Cloud Storage - Service fully restored.
Cloud Dataproc
Cloud Run - Service fully restored
Cloud Spanner - Service fully restored
Cloud Pub/Sub
Cloud Dataflow
Cloud Bigtable - Service restored. 21:36 PST
Cloud Memorystore - Service restored. 20:03 PST
Cloud Logging
Cloud BigQuery
Cloud Identity & Security(Cloud Access Policy) - Service fully restored.
Cloud Load Balancers - Fully restored
Persistent Disk - Service fully restored
Our engineering team continues to investigate the issue.
We will provide an update by Monday, 2021-08-23 22:15 US/Pacific with current details.
We apologize to all who are affected by the disruption.
Diagnosis:
Customers impacted by this issue may see connectivity issues on Google Cloud Services.
Workaround:
None at this time…
04:35 AM
We are experiencing an issue with Google Cloud infrastructure components in australia-southeast2 starting 2021-08-23 19:50 PT.
Products impacted:
Google Cloud Load Balancers
Google Compute Engine - Customers might see failure for creating new VMs.
Cloud FileStore - Customers might see API errors.
Cloud VPN
Cloud GKE
Cloud PD
Cloud Storage
Cloud Dataproc
Cloud Run
Cloud Spanner
Cloud Pub/Sub
Cloud Dataflow
Cloud Bigtable
Our engineering team continues to investigate the issue.
We will provide an update by Monday, 2021-08-23 21:45 US/Pacific with current details.
We apologize to all who are affected by the disruption.
Diagnosis:
Customers impacted by this issue may see connectivity issues on Google Cloud Services.
Workaround:
None at this time