Google Cloud MINOR
Vertex AI training jobs are experiencing issues where jobs may take longer than usual
July 23, 2023 · 02:30 AM UTC – 10:18 AM UTC · Duration: 7h 48min
Affected Services
Vertex AI Training, Cloud Machine Learning
Timeline
10:18 AM
The issue with Vertex AI Training has been resolved for all affected users as of Sunday, 2023-07-23 02:09 US/Pacific.
We thank you for your patience while we worked on resolving the issue.
04:02 AM
Summary: Vertex AI training jobs are experiencing issues where jobs may take longer than usual
Description: Mitigation work is currently underway by our engineering team.
The mitigation is expected to complete by Sunday, 2023-07-23 02:00 US/Pacific.
We will provide more information by Sunday, 2023-07-23 02:30 US/Pacific.
Diagnosis: Vertex AI training jobs are experiencing issues where jobs may take longer than usual. Affected customers may see an increase in "Container <container_name> was reset due to preemption" errors.
Workaround: None at this time.
03:14 AM
Summary: Vertex AI training jobs are experiencing issues where jobs may take longer than usual
Description: We are experiencing an issue with Vertex AI Training.
Our engineering team has investigated the issue, identified a mitigation, and is working to start the mitigation process.
We will provide an update by Saturday, 2023-07-22 20:30 US/Pacific with current details.
We apologize to all who are affected by the disruption.
Diagnosis: Vertex AI training jobs are experiencing issues where jobs may take longer than usual. Affected customers may see an increase in "Container <container_name> was reset due to preemption" errors.
Workaround: None at this time.
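
For customers who want to check whether their own jobs hit the error quoted in the Diagnosis entries, the following is a minimal sketch in Python. It assumes the job logs have already been exported as plain-text lines; the function name count_preemption_resets and the sample container names are hypothetical and are not part of any Google Cloud API. Only the error text itself comes from this report.

```python
import re

# Pattern built from the error text quoted in the Diagnosis entries.
# The container name is assumed to be a single whitespace-free token.
PREEMPTION_RE = re.compile(r"Container (?P<name>\S+) was reset due to preemption")


def count_preemption_resets(log_lines):
    """Return a dict mapping container name to its number of preemption resets."""
    counts = {}
    for line in log_lines:
        match = PREEMPTION_RE.search(line)
        if match:
            name = match.group("name")
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "Container trainer-0 was reset due to preemption",
        "Epoch 3 finished",
        "Container trainer-0 was reset due to preemption",
    ]
    print(count_preemption_resets(sample))
```

A count that rises during the incident window and drops after the resolution time would be consistent with the behavior described above, though it does not by itself confirm the cause.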