Google Cloud CRITICAL

Global: Vertex AI Online Prediction Is Experiencing Increased Error Rates

June 2, 2022 · 05:10 PM UTC – 09:30 PM UTC · Duration: 4h 20min

Affected Services

Vertex AI Online Prediction
Cloud Machine Learning

Timeline

09:06 PM
Mini Incident Report
We apologize for the inconvenience this service disruption may have caused. We would like to provide some information about this incident below. Please note that this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case at https://cloud.google.com/support or through the help article at https://support.google.com/a/answer/1047213.
(All Times US/Pacific)
Incident Start: 02 June 2022 10:10 US/Pacific
Incident End: 02 June 2022 14:30 US/Pacific
Duration: 4 hours, 20 minutes
Affected Services and Features: Vertex AI Online Prediction
Regions/Zones: Global
Description: Vertex AI Online Prediction experienced increased error rates, ranging from 30% to 100% per region depending on usage patterns, for a duration of 4 hours, 20 minutes. From preliminary analysis, the root cause of the issue was that Vertex Prediction Endpoints were globally marked as deleted due to a faulty resource cleanup process. The service fully recovered when the Vertex Prediction Endpoints were restored.
Customer Impact: Affected customers may have experienced:
- All pre-existing Vertex models undeployed on Vertex AI Endpoints
- Empty responses when listing the deployed models
- Runtime exceptions and general errors on Predict and Explain requests
- Quota failures when trying to re-deploy models
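For readers who track incidents in UTC, the reported window can be converted with a few lines of Python. This is a minimal sketch, not part of the original report. It assumes that US/Pacific was on daylight time (PDT, UTC-7) on 2 June 2022, and the variable and function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US/Pacific observes daylight time (PDT, UTC-7) in early June,
# so a fixed offset is used here instead of a tz database lookup.
PDT = timezone(timedelta(hours=-7), name="PDT")


def to_utc(local: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return local.astimezone(timezone.utc)


# Incident window as stated in the report (US/Pacific).
incident_start = datetime(2022, 6, 2, 10, 10, tzinfo=PDT)
incident_end = datetime(2022, 6, 2, 14, 30, tzinfo=PDT)

start_utc = to_utc(incident_start)
end_utc = to_utc(incident_end)
duration = incident_end - incident_start

print(start_utc.isoformat())  # 2022-06-02T17:10:00+00:00
print(end_utc.isoformat())    # 2022-06-02T21:30:00+00:00
print(duration)               # 4:20:00
```

The computed duration matches the 4 hours, 20 minutes stated in the report.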
11:24 PM
The issue with Vertex AI Online Prediction has been resolved for all affected users as of Thursday, 2022-06-02 15:21 US/Pacific. We will publish an analysis of this incident once we have completed our internal investigation. We thank you for your patience while we worked on resolving the issue.
10:51 PM
Summary: Global: Vertex AI Online Prediction Is Experiencing Increased Error Rates
Description: We believe the issue with Vertex AI Online Prediction is partially resolved. We do not have an ETA for full resolution at this point. We will provide an update by Thursday, 2022-06-02 16:10 US/Pacific with current details.
Diagnosis: For affected customers, listing the deployed models in Endpoints returns an empty list, and Predict and Explain requests fail.
Workaround: None at this time.
10:50 PM
Summary: Global: Vertex AI Online Prediction Is Experiencing Increased Error Rates
Description: Mitigation work is currently underway by our engineering team. The mitigation is expected to complete by Thursday, 2022-06-02 15:07 US/Pacific. We will provide more information by Thursday, 2022-06-02 15:07 US/Pacific.
Diagnosis: For affected customers, listing the deployed models in Endpoints returns an empty list, and Predict and Explain requests fail.
Workaround: None at this time.
10:06 PM
Summary: Global: Vertex AI Online Prediction Is Experiencing Increased Error Rates
Description: Mitigation work is currently underway by our engineering team. The mitigation is expected to complete by Thursday, 2022-06-02 15:07 US/Pacific. We will provide more information by Thursday, 2022-06-02 15:07 US/Pacific.
Diagnosis: Customers will experience increased error rates.
Workaround: None at this time.
09:58 PM
Summary: Global: Vertex AI Online Prediction Is Experiencing Increased Error Rates
Description: We are experiencing an issue with Vertex AI Online Prediction beginning at Thursday, 2022-06-02 10:20 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Thursday, 2022-06-02 14:30 US/Pacific with current details. We apologize to all who are affected by the disruption.
Diagnosis: Customers will experience errors.
Workaround: None at this time.