Google Cloud MAJOR

Multiple GCP products affected by Increased error rates for Google Cloud APIs in us-central1

August 3, 2023 · 10:50 PM UTC – 01:03 AM UTC · Duration: 2h 13min

Affected Services

Google Compute Engine, Dataproc Metastore, Google Cloud Networking, Google Cloud Console, Cloud Data Fusion, Google Cloud Dataproc

Timeline

08:18 PM
Mini Incident Report

We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support.

(All Times US/Pacific)

Incident Start: 03 August 2023 14:50
Incident End: 03 August 2023 17:03
Duration: 2 hours, 13 minutes

Affected Services and Features: Cloud Data Fusion, Dataproc Metastore, Google Cloud Console, Google Cloud Networking, Google Compute Engine, Google Cloud Dataproc

Regions/Zones: us-central1, us-south1, us-east1, us-west3

Description: Cloud Data Fusion, Dataproc Metastore, Google Cloud Console, Google Cloud Networking, and Google Compute Engine experienced elevated error rates for Google Cloud APIs in us-central1 for a duration of 2 hours and 13 minutes. From preliminary analysis, the root cause was an increase in traffic that overloaded jobs with limited capacity. Google engineers mitigated the issue by increasing the capacity and redirecting the traffic.

Customer Impact:
- Customers may have received errors for various API requests made in the us-central1 region (regional requests only) and also elevated errors for AggregatedList API calls.
- Cloud Load Balancer customers may have received 502 errors when creating/updating load balancers in us-central1.
- Customers may not have been able to create/delete instances of Cloud Data Fusion in us-central1.
- GCE users would have experienced elevated error rates in instance creation and in deletion of regional instance groups in us-central1.
- Dataproc Metastore instance creations may have failed in multiple US regions.
- Elevated error rates for GKE cluster operations.
- Customers may have received errors or may have been unable to load pages in Cloud Console.
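As a quick sanity check on the figures in the Mini Incident Report, the stated duration can be recomputed from the reported start and end times. This is only an illustrative sketch: the variable names are ours, and it assumes both timestamps are on the same day in the same time zone (US/Pacific, as the report states), so naive datetimes are sufficient.

```python
from datetime import datetime

# Timestamps copied from the report; both are US/Pacific on the same day,
# so naive datetimes are enough to compute the elapsed time.
incident_start = datetime(2023, 8, 3, 14, 50)
incident_end = datetime(2023, 8, 3, 17, 3)

duration = incident_end - incident_start
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60

print(f"Duration: {hours} hours, {minutes} minutes")
```

The result agrees with the reported duration of 2 hours, 13 minutes.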
01:28 AM
The issue with Cloud Data Fusion, Dataproc Metastore, Google Cloud Console, Google Cloud Networking, Google Compute Engine has been resolved for all affected projects as of Thursday, 2023-08-03 17:27 US/Pacific. We thank you for your patience while we worked on resolving the issue.
01:18 AM
Summary: Multiple GCP products affected by Increased error rates for Google Cloud APIs in us-central1
Description: At this time error rates appear to have returned to normal levels. Engineers are continuing to monitor to confirm full recovery. We do not have an ETA for full mitigation at this point. We will provide more information by Thursday, 2023-08-03 18:00 US/Pacific.
Diagnosis:
- Cloud Load Balancer customers may receive 502 errors when creating/updating load balancers in us-central1.
- Increased error rates for Cloud Data Fusion instance creation in us-central1.
- Dataproc Metastore creations may be failing in us-central1, us-south1, and us-east1.
- Increased error rates for GKE cluster operations.
- Customers may receive errors or may be unable to load pages in Cloud Console.
Workaround: None at this time.
01:07 AM
Summary: Multiple GCP products affected by Increased error rates for Google Cloud APIs in us-central1
Description: Mitigation work is currently underway by our engineering team and error rates appear to be decreasing. We do not have an ETA for full mitigation at this point. We will provide more information by Thursday, 2023-08-03 18:00 US/Pacific.
Diagnosis:
- Cloud Load Balancer customers may receive 502 errors when creating/updating load balancers in us-central1.
- Increased error rates for GCE instance creation in us-central1.
- Dataproc Metastore creations may be failing in us-central1, us-south1, and us-east1.
- Increased error rates for GKE cluster operations.
- Customers may receive errors or may be unable to load pages in Cloud Console.
Workaround: None at this time.
01:06 AM
Summary: Multiple GCP products affected by Increased error rates for Google Cloud APIs in us-central1
Description: Mitigation work is currently underway by our engineering team and error rates appear to be decreasing. We do not have an ETA for full mitigation at this point. We will provide more information by Friday, 2023-08-04 02:41 US/Pacific.
Diagnosis:
- Cloud Load Balancer customers may receive 502 errors when creating/updating load balancers in us-central1.
- Increased error rates for GCE instance creation in us-central1.
- Dataproc Metastore creations may be failing in us-central1, us-south1, and us-east1.
- Increased error rates for GKE cluster operations.
- Customers may receive errors or may be unable to load pages in Cloud Console.
Workaround: None at this time.
01:01 AM
Summary: Multiple GCP products affected by Increased error rates for Google Cloud APIs in us-central1
Description: We are experiencing an issue with Cloud Data Fusion, Dataproc Metastore, Google Cloud Networking, Google Compute Engine. Our engineering team continues to investigate the issue. We will provide an update by Thursday, 2023-08-03 18:00 US/Pacific with current details. We apologize to all who are affected by the disruption.
Diagnosis:
- Cloud Load Balancer customers may receive 502 errors when creating/updating load balancers in us-central1.
- Increased error rates for GCE instance creation in us-central1.
- Dataproc Metastore creations may be failing in us-central1, us-south1, and us-east1.
- Increased error rates for GKE cluster operations.
- Customers may receive errors or may be unable to load pages in Cloud Console.
Workaround: None at this time.
12:55 AM
Summary: Multiple GCP products affected by Increased error rates for Google Cloud APIs in us-central1
Description: We are experiencing an issue with Cloud Data Fusion, Dataproc Metastore, Google Cloud Networking, Google Compute Engine. Our engineering team continues to investigate the issue. We will provide an update by Thursday, 2023-08-03 18:00 US/Pacific with current details. We apologize to all who are affected by the disruption.
Diagnosis:
- Cloud Load Balancer customers may receive 502 errors when creating/updating load balancers in us-central1.
- Increased error rates for GCE instance creation in us-central1.
- Dataproc Metastore creations may be failing in us-central1, us-south1, and us-east1.
- Increased error rates for GKE cluster operations.
Workaround: None at this time.
12:47 AM
Summary: Increased error rates for Google Cloud APIs in us-central1
Description: We are experiencing an issue with Cloud Data Fusion, Dataproc Metastore, Google Cloud Networking, Google Compute Engine. Our engineering team continues to investigate the issue. We will provide an update by Thursday, 2023-08-03 20:41 US/Pacific with current details. We apologize to all who are affected by the disruption.
Diagnosis: Customers may experience increased rates of internal errors when calling Google Cloud APIs in us-central1, including when attempting to make changes to load balancers in the affected region.
Workaround: None at this time.
12:16 AM
Summary: Increased error rates for Google Cloud APIs in us-central1
Description: We are experiencing an issue with Google Cloud Networking, Google Compute Engine. Our engineering team continues to investigate the issue. We will provide an update by Thursday, 2023-08-03 17:15 US/Pacific with current details. We apologize to all who are affected by the disruption.
Diagnosis: Customers may experience increased rates of internal errors when calling Google Cloud APIs in us-central1, including when attempting to make changes to load balancers in the affected region.
Workaround: None at this time.
11:52 PM
Summary: Increased error rates for GCE APIs
Description: We are experiencing an issue with Google Compute Engine API. Our engineering team continues to investigate the issue. We will provide an update by Thursday, 2023-08-03 17:00 US/Pacific with current details. We apologize to all who are affected by the disruption.
Diagnosis: Customers may experience increased rates of internal errors when calling global Compute Engine APIs.
Workaround: None at this time.