Google Cloud MINOR

Vertex AI training jobs are experiencing issues where jobs may take longer than usual

July 23, 2023 · 02:30 AM UTC – 10:18 AM UTC · Duration: 7h 48min

Affected Services

Vertex AI Training, Cloud Machine Learning

Timeline

10:18 AM
The issue with Vertex AI Training has been resolved for all affected users as of Sunday, 2023-07-23 02:09 US/Pacific. We thank you for your patience while we worked on resolving the issue.
04:02 AM
Summary: Vertex AI training jobs are experiencing issues where jobs may take longer than usual.
Description: Mitigation work is currently underway by our engineering team. The mitigation is expected to complete by Sunday, 2023-07-23 02:00 US/Pacific. We will provide more information by Sunday, 2023-07-23 02:30 US/Pacific.
Diagnosis: Vertex AI training jobs are experiencing issues where jobs may take longer than usual. Affected customers may see an increase in "Container <container_name> was reset due to preemption" errors.
Workaround: None at this time.
03:14 AM
Summary: Vertex AI training jobs are experiencing issues where jobs may take longer than usual.
Description: We are experiencing an issue with Vertex AI Training. Our engineering team has investigated the issue, identified a mitigation, and is working to start the mitigation process. We will provide an update by Saturday, 2023-07-22 20:30 US/Pacific with current details. We apologize to all who are affected by the disruption.
Diagnosis: Vertex AI training jobs are experiencing issues where jobs may take longer than usual. Affected customers may see an increase in "Container <container_name> was reset due to preemption" errors.
Workaround: None at this time.