Google Cloud CRITICAL

Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experi...

November 16, 2024 · 08:52 AM UTC – 12:14 PM UTC · Duration: 3h 22min

Affected Services

Artifact Registry, Cloud Data Loss Prevention, Cloud Developer Tools, Cloud Load Balancing, Cloud Logging, Cloud Memorystore, Cloud Monitoring, Cloud Spanner, Dataplex, GKE fleet management, Google BigQuery, Google Cloud Dataflow, Google Cloud Networking, Google Cloud Pub/Sub, Google Cloud SQL, Google Kubernetes Engine, Healthcare and Life Sciences, Hybrid Connectivity, Identity and Access Management, Memorystore for Redis, Operations, Pub/Sub Lite, Virtual Private Cloud (VPC)

Timeline

01:29 AM
Incident Report
Summary
On 16 November 2024 at 00:47 US/Pacific, a combination of fiber failures and a network equipment fault led to reduced network capacity between the asia-southeast2 region and other GCP regions. The failures were corrected and minimum required capacity recovered by 02:13 US/Pacific.
To our GCP customers whose businesses were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.
Root Cause and Impact
Google’s global network is designed and built to ensure that any occasional capacity loss events are not noticeable and/or have minimal disruption to customers. We provision several diverse network paths to each region and maintain sufficient capacity buffers based on the measured reliability of capacity in each region.
Between 12 November and 16 November, two separate fiber failures occurred near the asia-southeast2 region. These failures temporarily reduced the available network capacity between the asia-southeast2 region and other GCP regions, but did not impact the availability of GCP services in the region. Google engineers were alerted to these failures as soon as they occurred and were working with urgency on remediating them, but had not yet completed full recovery.
On 16 November 2024 at 00:47 US/Pacific, a latent software defect impacted a backbone networking router in the asia-southeast2 region. This further reduced the available inter-region capacity and exhausted our reserve network capacity buffers, causing multiple Google Cloud services in the region to experience high latency and/or elevated error rates for operations requiring inter-region connectivity. During this time, customers in asia-southeast2 would have experienced issues with managing and monitoring existing resources, creating new resources, and data replication to other regions.
To mitigate the impact, Google engineers re-routed Internet traffic away from the asia-southeast2 region to be served from other GCP regions, primarily asia-southeast1, while working in parallel to recover the lost capacity. The faulty backbone networking router was recovered on 16 November 2024 at 02:13 US/Pacific. This ended the elevated network latency and error rates for most of the impacted GCP services’ operations. Recovery of the first failed fiber was completed on 18 November at 08:45 US/Pacific and the second failed fiber was restored at 09:00 US/Pacific on the same day.
Remediation and Prevention
We’re taking the following actions to reduce the likelihood of recurrence and the time to mitigate the impact of this type of incident in the future:
During the incident, our actions to reroute traffic away from the asia-southeast2 region and recover the faulty backbone networking router took longer than expected, as the loss of capacity hindered our visibility of required networking telemetry and the functionality of emergency tooling. We’re reviewing these gaps to implement the required improvements to our network observability, emergency tools and incident response playbooks.
Work with our fiber partners in the asia-southeast2 region to ensure our fiber paths between facilities in the region and to submarine cable landing stations are on the most reliable routes available, and have adequate preventative maintenance and repair processes.
08:03 PM
Mini Incident Report
We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support.
(All Times US/Pacific)
Incident Start: 16 November, 2024 00:52
Incident End: 16 November, 2024 03:36
Duration: 2 hours, 44 minutes
Affected Services and Features: Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud SQL, Cloud Spanner, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, Memorystore for Redis, Artifact Registry, Cloud Dataflow, Cloud Data Loss Prevention, Cloud Deploy, Cloud Healthcare, Dataplex, GKE fleet management (GKE Connect), Google Kubernetes Engine (GKE), Identity and Access Management (IAM)
Regions/Zones: asia-southeast2
Description: Multiple Google Cloud services in asia-southeast2 were degraded for 2 hours, 44 minutes. From preliminary analysis, the root cause of the issue was a significant loss of inter-region capacity due to several simultaneous fiber cuts, combined with the malfunction of a backbone networking router which had to be removed from service. Google will complete a full Incident Report in the following days that will provide a full root cause.
Customer Impact:
Virtual Private Cloud (VPC) - Cross-region and external connectivity issues.
Cloud Load Balancing - Increased latency for GCLB requests ingressing in asia-southeast2.
Cloud SQL - Elevated latency and error rates for the Cloud SQL Admin API.
Cloud Spanner - Customers using asia-southeast2 may have seen higher latency.
Cloud Logging - Log ingestion and log routing in asia-southeast2 experienced high latency. Affected customers may have observed issues when doing operations (e.g. creating buckets) for this region.
Cloud Firestore - Affected customers may have experienced elevated error rates and latencies for databases in asia-southeast2.
BigQuery - Affected customers may have experienced increased latency/errors for import/export jobs and cross-region copy. The impact was mitigated at 02:30 US/Pacific.
Cloud VPN - Affected customers may have observed partial packet loss in asia-southeast2.
Memorystore for Redis - A small number of customers in asia-southeast2 may have experienced elevated errors when creating instances from 00:35 to 01:10 US/Pacific.
Artifact Registry - Affected customers may have observed API timeouts or server errors.
Cloud Dataflow - Affected customers may have experienced slow Dataflow jobs or job creation failures.
Cloud Data Loss Prevention - Affected customers may have observed server unavailable errors.
Cloud Deploy - Affected customers may have encountered deployment failures.
Cloud Interconnect - Affected customers may have observed partial packet loss in asia-southeast2.
Cloud Healthcare - Affected customers may have observed API timeouts or server errors.
Dataplex - Affected customers may have observed API timeouts or server errors.
GKE fleet management (GKE Connect) - Affected customers may have observed API timeouts or server errors when sending requests to clusters registered to asia-southeast2.
Google Kubernetes Engine (GKE) - Some customers may have experienced control plane availability issues and/or GKE Cluster or Node Pool creation or deletion failures.
Identity and Access Management (IAM) - Affected customers may have observed increased latency and timeouts for IAM Control Plane operations on global resources and retry traffic for approximately 90 minutes.
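As a cross-check of the times quoted in the Mini Incident Report, the short sketch below converts the stated incident start and end from US/Pacific to UTC and recomputes the duration. It assumes US/Pacific was on standard time (UTC-8) on 16 November 2024; the names `PST` and `fmt_duration` are illustrative and not part of the report.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US/Pacific was on standard time (UTC-8) on 16 November 2024.
PST = timezone(timedelta(hours=-8), "PST")

# Incident start and end as stated in the Mini Incident Report (US/Pacific).
start = datetime(2024, 11, 16, 0, 52, tzinfo=PST)
end = datetime(2024, 11, 16, 3, 36, tzinfo=PST)


def fmt_duration(delta: timedelta) -> str:
    """Format a timedelta as hours and whole minutes."""
    minutes = int(delta.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60}min"


duration = fmt_duration(end - start)
start_utc = start.astimezone(timezone.utc)
end_utc = end.astimezone(timezone.utc)

print("Duration:", duration)
print("Start (UTC):", start_utc.strftime("%Y-%m-%d %H:%M"))
print("End (UTC):", end_utc.strftime("%Y-%m-%d %H:%M"))
```

Under that assumption, the 00:52 US/Pacific start corresponds to the 08:52 AM UTC start shown in the page header. The header's 12:14 PM UTC end appears to match the timestamp of the resolution update rather than the 03:36 US/Pacific incident end, which would explain why the header duration is longer than the one in the report.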
12:14 PM
The issue with Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Hybrid Connectivity, Cloud Logging, Cloud Healthcare, GKE fleet management, Dataplex, Artifact Registry, Google Kubernetes Engine, Cloud Data Loss Prevention, Google Cloud Dataflow, Cloud Spanner, Memorystore for Redis, Cloud Monitoring, Pub/Sub Lite, Identity and Access Management, Google BigQuery has been resolved for all affected users as of Saturday, 2024-11-16 03:50 US/Pacific. We will publish an analysis of this incident once we have completed our internal investigation. We thank you for your patience while we worked on resolving the issue.
12:09 PM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We are experiencing issues with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, Hybrid Connectivity, Cloud Healthcare, Dataplex, GKE fleet management, Artifact Registry, Google Kubernetes Engine (GKE), Identity and Access Management, Cloud Data Loss Prevention, Cloud Deploy, etc. in the asia-southeast2 region. We believe that issues with many impacted GCP products have been mitigated. Our engineering team continues to validate this. We will provide an update by Saturday, 2024-11-16 04:30 US/Pacific with current details.
Diagnosis: Customers impacted by this issue may see the following symptoms:
Virtual Private Cloud: Cross-region and external connectivity issues.
Cloud Load Balancing: Increased latency for GCLB requests ingressing in asia-southeast2.
Cloud SQL: Elevated latency and error rate for the Cloud SQL Admin API.
Cloud Spanner: Customers using asia-southeast2 would have seen higher latency.
Cloud Logging: Log ingestion and log routing in asia-southeast2 are seeing high latency. Users might also observe issues when performing operations (e.g. creating a bucket) for this region.
Cloud Dataflow: Dataflow jobs are slow or fail to create.
Cloud Data Loss Prevention: Users may observe server unavailable errors.
Cloud Firestore: Users experienced elevated error rates and latencies for databases in asia-southeast2.
Cloud VPN: Impacted users may observe partial packet loss in the impacted region.
Memorystore for Redis: A small number of customers in the impacted region (asia-southeast2) may have experienced elevated errors when creating instances from 00:35 to 01:10. The issue has been mitigated as of 2:45 US/Pacific on 16 Nov, 2024.
Artifact Registry: Artifact Registry users may observe API timeouts or server errors.
BigQuery: Users would have experienced increased latency/errors for import/export jobs and cross-region copy. The impact was mitigated at 02:30 US/Pacific on 16 November 2024.
Workaround: None at this time.
12:02 PM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We are experiencing issues with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, Hybrid Connectivity, Cloud Healthcare, Dataplex, GKE fleet management, Artifact Registry, Google Kubernetes Engine (GKE), Identity and Access Management, Cloud Data Loss Prevention, Cloud Deploy, etc. in the asia-southeast2 region. We believe that issues with multiple GCP products have been mitigated. Our engineering team is currently validating this. We will provide an update by Saturday, 2024-11-16 04:30 US/Pacific with current details.
Diagnosis: Customers impacted by this issue may see the following symptoms:
Virtual Private Cloud: Cross-region and external connectivity issues.
Cloud Load Balancing: Increased latency for GCLB requests ingressing in asia-southeast2.
Cloud SQL: Elevated latency and error rate for the Cloud SQL Admin API.
Cloud Spanner: Customers using asia-southeast2 would have seen higher latency.
Cloud Logging: Log ingestion and log routing in asia-southeast2 are seeing high latency. Users might also observe issues when performing operations (e.g. creating a bucket) for this region.
Cloud Dataflow: Dataflow jobs are slow or fail to create.
Cloud Data Loss Prevention: Users may observe server unavailable errors.
Cloud Firestore: Users experienced elevated error rates and latencies for databases in asia-southeast2.
Cloud VPN: Impacted users may observe partial packet loss in the impacted region.
Memorystore for Redis: A small number of customers in the impacted region (asia-southeast2) may have experienced elevated errors when creating instances from 00:35 to 01:10. The issue has been mitigated as of 2:45 US/Pacific on 16 Nov, 2024.
Artifact Registry: Artifact Registry users may observe API timeouts or server errors.
BigQuery: Users would have experienced increased latency/errors for import/export jobs and cross-region copy. The impact was mitigated at 02:30 US/Pacific on 16 November 2024.
Workaround: None at this time.
11:40 AM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We were experiencing an issue with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, Hybrid Connectivity, Cloud Healthcare, Dataplex, GKE fleet management, Artifact Registry, Google Kubernetes Engine (GKE), Identity and Access Management, Cloud Data Loss Prevention, etc. in the asia-southeast2 region. Note that the issues with Virtual Private Cloud (VPC), Cloud Dataflow, Cloud Spanner, Cloud Load Balancing, Cloud Logging, Cloud VPN, and Memorystore for Redis have been mitigated. Our engineering team continues to investigate the issues with the remaining impacted products in an effort to resolve them for all customers. We will provide an update by Saturday, 2024-11-16 04:30 US/Pacific with current details.
Diagnosis: Customers impacted by this issue may see the following symptoms:
Virtual Private Cloud: Cross-region and external connectivity issues.
Cloud Load Balancing: Increased latency for GCLB requests ingressing in asia-southeast2.
Cloud SQL: Elevated latency and error rate for the Cloud SQL Admin API.
Cloud Spanner: Customers using asia-southeast2 would have seen higher latency.
Cloud Logging: Log ingestion and log routing in asia-southeast2 are seeing high latency. Users might also observe issues when performing operations (e.g. creating a bucket) for this region.
Cloud Dataflow: Dataflow jobs are slow or fail to create.
Cloud Data Loss Prevention: Users may observe server unavailable errors.
Cloud Firestore: Users are experiencing elevated error rates and latencies for databases in asia-southeast2.
Cloud VPN: Impacted users may observe partial packet loss in the impacted region.
Memorystore for Redis: A small number of customers in the impacted region (asia-southeast2) may have experienced elevated errors when creating instances from 00:35 to 01:10. The issue has been mitigated as of 2:45 US/Pacific on 16 Nov, 2024.
Artifact Registry: Artifact Registry users may observe API timeouts or server errors.
Workaround: None at this time.
11:12 AM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We are experiencing an issue with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, Hybrid Connectivity, Cloud Healthcare, Dataplex, GKE fleet management, Artifact Registry, Google Kubernetes Engine (GKE), Identity and Access Management, Cloud Data Loss Prevention, etc. in the asia-southeast2 region. Our engineering team continues to investigate the issue in an effort to resolve it for all customers. We will provide an update by Saturday, 2024-11-16 03:45 US/Pacific with current details.
Diagnosis: Customers impacted by this issue may see the following symptoms:
Virtual Private Cloud: Cross-region and external connectivity issues.
Cloud Load Balancing: Increased latency for GCLB requests ingressing in asia-southeast2.
Cloud SQL: Elevated latency and error rate for the Cloud SQL Admin API.
Cloud Spanner: Customers using asia-southeast2 would have seen higher latency.
Cloud Logging: Log ingestion and log routing in asia-southeast2 are seeing high latency. Users might also observe issues when performing operations (e.g. creating a bucket) for this region.
Cloud Dataflow: Dataflow jobs are slow or fail to create.
Cloud Data Loss Prevention: Users may observe server unavailable errors.
Cloud Firestore: Users are experiencing elevated error rates for database get, delete, and creation operations in asia-southeast2.
Workaround: None at this time.
11:07 AM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We are experiencing an issue with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, Hybrid Connectivity, Cloud Healthcare, Dataplex, GKE fleet management, Artifact Registry, Google Kubernetes Engine (GKE), Identity and Access Management, Cloud Data Loss Prevention, etc. in the asia-southeast2 region. Our engineering team continues to investigate the issue. We will provide an update by Saturday, 2024-11-16 03:45 US/Pacific with current details.
Diagnosis: Customers impacted by this issue may see the following symptoms:
Virtual Private Cloud: Cross-region and external connectivity issues.
Cloud Load Balancing: Increased latency for GCLB requests ingressing in asia-southeast2.
Cloud SQL: Elevated latency and error rate for the Cloud SQL Admin API.
Cloud Spanner: Customers using asia-southeast2 would have seen higher latency.
Cloud Logging: Log ingestion and log routing in asia-southeast2 are seeing high latency. Users might also observe issues when performing operations (e.g. creating a bucket) for this region.
Cloud Dataflow: Dataflow jobs are slow or fail to create.
Cloud Data Loss Prevention: Users may observe server unavailable errors.
Cloud Firestore: Users are experiencing elevated error rates for database get, delete, and creation operations in asia-southeast2.
Workaround: None at this time.
11:01 AM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We are experiencing an issue with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, Hybrid Connectivity, Cloud Healthcare, Dataplex, GKE fleet management, Artifact Registry, Google Kubernetes Engine (GKE), Identity and Access Management, Cloud Data Loss Prevention, etc. in the asia-southeast2 region. Our engineering team continues to investigate the issue. We will provide an update by Saturday, 2024-11-16 03:30 US/Pacific with current details.
Diagnosis: Customers impacted by this issue may see the following symptoms:
Virtual Private Cloud: Cross-region and external connectivity issues.
Cloud Load Balancing: Increased latency for GCLB requests ingressing in asia-southeast2.
Cloud SQL: Elevated latency and error rate for the Cloud SQL Admin API.
Cloud Spanner: Customers using asia-southeast2 would have seen higher latency.
Cloud Logging: Log ingestion and log routing in asia-southeast2 are seeing high latency. Users might also observe issues when performing operations (e.g. creating a bucket) for this region.
Cloud Dataflow: Dataflow jobs are slow or fail to create.
Cloud Data Loss Prevention: Users may observe server unavailable errors.
Cloud Firestore: Users are experiencing elevated error rates for database get, delete, and creation operations in asia-southeast2.
Workaround: None at this time.
10:51 AM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We are experiencing an issue with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, Hybrid Connectivity, Cloud Healthcare, Dataplex, GKE fleet management, Artifact Registry, Google Kubernetes Engine (GKE), Identity and Access Management, Cloud Data Loss Prevention, etc. in the asia-southeast2 region. Our engineering team continues to investigate the issue. We will provide an update by Saturday, 2024-11-16 03:30 US/Pacific with current details.
Diagnosis: Customers impacted by this issue may see the following symptoms:
Virtual Private Cloud: Cross-region and external connectivity issues.
Cloud Load Balancing: Increased latency for GCLB requests ingressing in asia-southeast2.
Cloud SQL: Elevated latency and error rate for the Cloud SQL Admin API.
Cloud Spanner: Customers using asia-southeast2 would have seen higher latency.
Cloud Logging: Log ingestion and log routing in asia-southeast2 are seeing high latency. Users might also observe issues when performing operations (e.g. creating a bucket) for this region.
Cloud Dataflow: Dataflow jobs are slow or fail to create.
Cloud Data Loss Prevention: Users may observe server unavailable errors.
Workaround: None at this time.
10:43 AM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We are experiencing an issue with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, Hybrid Connectivity, Cloud Healthcare, Dataplex, GKE fleet management, Artifact Registry, Google Kubernetes Engine (GKE), Identity and Access Management, Cloud Data Loss Prevention, etc. in the asia-southeast2 region. Our engineering team continues to investigate the issue. We will provide an update by Saturday, 2024-11-16 03:15 US/Pacific with current details.
Diagnosis: Customers impacted by this issue may see the following symptoms:
Virtual Private Cloud: Cross-region and external connectivity issues in asia-southeast2.
Cloud Load Balancing: Increased latency for GCLB requests ingressing in asia-southeast2.
Cloud SQL: Elevated latency and error rate for the Cloud SQL Admin API.
Cloud Spanner: Customers using asia-southeast2 would have seen higher latency.
Cloud Logging: Log ingestion and log routing in asia-southeast2 are seeing high latency. Users might also observe issues when performing operations (e.g. creating a bucket) for this region.
Cloud Dataflow: Dataflow jobs are slow or fail to create.
Workaround: None at this time.
10:39 AM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We are experiencing an issue with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, Hybrid Connectivity, Cloud Healthcare, Dataplex, GKE fleet management, Artifact Registry, Google Kubernetes Engine (GKE), Identity and Access Management, Cloud Data Loss Prevention, etc. in the asia-southeast2 region. Our engineering team continues to investigate the issue. We will provide an update by Saturday, 2024-11-16 03:15 US/Pacific with current details.
Diagnosis:
Cloud Load Balancing: Increased latency for GCLB requests ingressing in asia-southeast2.
Cloud SQL: Elevated latency and error rate for the Cloud SQL Admin API.
Cloud Spanner: Customers using asia-southeast2 would have seen higher latency.
Cloud Logging: Log ingestion and log routing in asia-southeast2 are seeing high latency. Users might also observe issues when performing operations (e.g. creating a bucket) for this region.
Workaround: None at this time.
10:28 AM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We are experiencing an issue with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud Logging, Cloud Firestore, BigQuery, Cloud VPN, etc. in the asia-southeast2 region. Our engineering team continues to investigate the issue. We will provide an update by Saturday, 2024-11-16 03:00 US/Pacific with current details.
Diagnosis: Increased latency for GCLB requests ingressing in asia-southeast2.
Workaround: None at this time.
10:13 AM
Summary: Multiple GCP Products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing are experiencing issues in the asia-southeast2 region
Description: We are experiencing an issue with multiple GCP products including Google Cloud Networking, Virtual Private Cloud (VPC), Cloud Load Balancing, etc. in the asia-southeast2 region. Our engineering team continues to investigate the issue. We will provide an update by Saturday, 2024-11-16 03:00 US/Pacific with current details.
Diagnosis: None at this time.
Workaround: None at this time.