Google Cloud MAJOR

us multiregion: Elevated errors on Cloud KMS requests.

March 31, 2022 · 08:15 PM UTC – 11:13 PM UTC · Duration: 2h 58min

Affected Services

Cloud Key Management Service

Timeline

07:49 PM
We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case using https://cloud.google.com/support or help article https://support.google.com/a/answer/1047213.
(All Times US/Pacific)
Incident Start: 2022-03-31 12:15
Incident End: 2022-03-31 15:13
Duration: 2 hours, 58 minutes
Affected Services and Features: Google Cloud Key Management Service (KMS), Google Cloud Storage (GCS)
Regions/Zones: Multiregional US
Description: A Cloud Key Management Service (KMS) job experienced multiple errors due to task crashes in one metro of the US multiregion for 2 hours, 58 minutes. From the preliminary analysis, the root cause was identified as a map-reduce-style batch job with a very large, fast ramp-up of ReadObjects requests to Google Cloud Storage (GCS), which overloaded the KMS jobs (a dependency of GCS).
Customer Impact: The affected customers observed errors in Google Cloud Storage for one project. Multiple tasks failed with Memory-Exceed Error. There was a small, non-zero number of errors for some projects.
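For readers running similar batch workloads, the fast ramp-up described above can be contrasted with a gradual one. The following is a minimal client-side sketch only, not the mitigation Google applied: the function names (`allowed_rate`, `send_schedule`) and all rate values are hypothetical and chosen for illustration. It computes a linear ramp from a low starting request rate to a target rate and derives the send offsets a batch job could follow.

```python
from typing import List


def allowed_rate(elapsed_s: float,
                 start_qps: float = 10.0,
                 target_qps: float = 1000.0,
                 ramp_s: float = 600.0) -> float:
    """Requests per second permitted after elapsed_s seconds.

    Linear ramp from start_qps to target_qps over ramp_s seconds.
    All default values are illustrative assumptions, not service limits.
    """
    if elapsed_s <= 0:
        return start_qps
    if elapsed_s >= ramp_s:
        return target_qps
    return start_qps + (target_qps - start_qps) * (elapsed_s / ramp_s)


def send_schedule(n_requests: int,
                  start_qps: float = 10.0,
                  target_qps: float = 1000.0,
                  ramp_s: float = 600.0) -> List[float]:
    """Offsets (seconds from job start) at which to issue each request.

    Each gap is the inverse of the rate allowed at the previous send time,
    so gaps shrink as the ramp progresses.
    """
    offsets = []
    t = 0.0
    for _ in range(n_requests):
        offsets.append(t)
        t += 1.0 / allowed_rate(t, start_qps, target_qps, ramp_s)
    return offsets


if __name__ == "__main__":
    schedule = send_schedule(5)
    print(schedule)
```

A job would sleep until each offset before issuing the corresponding read. Whether a linear ramp, an exponential ramp, or server-driven backoff is the better fit depends on the workload and the service's published quotas.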
11:14 PM
The issue with Cloud Key Management Service has been resolved for all affected users as of Thursday, 2022-03-31 15:13 US/Pacific. We thank you for your patience while we worked on resolving the issue.
10:58 PM
Summary: us multiregion: Elevated errors on Cloud KMS requests.
Description: Mitigation work is currently underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Thursday, 2022-03-31 16:30 US/Pacific.
Diagnosis: Affected customers are seeing elevated errors on Cloud KMS requests in the us multiregion.
Workaround: None at this time.
10:17 PM
Summary: us multiregion: Elevated errors on Cloud KMS requests.
Description: We are experiencing an issue with Cloud Key Management Service beginning at Thursday, 2022-03-31 12:15 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Thursday, 2022-03-31 15:30 US/Pacific with current details. We apologize to all who are affected by the disruption.
Diagnosis: Affected customers are seeing elevated errors on Cloud KMS requests in the us multiregion.
Workaround: None at this time.