Google Cloud CRITICAL

Issues opening cases

November 15, 2024 · 05:57 AM UTC – 12:09 PM UTC · Duration: 6h 12min

Affected Services

Google Cloud Support

Timeline

12:13 PM
Incident Report

Summary

On 14 November 2024 at 21:57 US/Pacific, our support ticketing system, which handles Google Cloud, Billing, and Workspace customer support, experienced an unexpected issue during a vendor-planned maintenance event, causing the system to become unavailable. Throughout the incident duration of 6 hours and 12 minutes, customers were unable to update existing chat, portal, or email cases. Customers who attempted to create a support case were presented with our backup contact method and were able to receive support through this method, which remained available throughout the outage.

Root Cause

The outage was triggered by a vendor-initiated change that impacted the performance of our support ticket persistence layer. This change inadvertently made the query subsystem of our support case management tool unavailable. After the configuration change was applied, the subsystem became unresponsive, preventing the processing of any read or write commands. As a result, customers and Google Support were unable to access or update support ticket data, leading to service disruption.

Remediation and Prevention

Our monitoring systems detected elevated error rates and, at 22:09 US/Pacific, alerted our engineering team, who immediately started an investigation with the vendor. The vendor's incident team concluded that the query subsystem state would not be resolved by a configuration rollback. The vendor's engineering team prepared a new update, validated it in a test environment, and applied the update to production, returning the system to service on 15 November 2024 at 04:09 US/Pacific.

We are taking immediate steps with the vendor to prevent a recurrence and improve reliability in the future:

- A production change freeze for the vendor's query subsystem is in place until rollout safeguards are sufficient to prevent further impact.
- We are working with the vendor to improve their change management process to ensure safer rollouts that avoid unexpected issues, while also providing earlier detection of rollout changes.
- We will perform a review of rollback safety for configuration changes with the vendor to ensure rollback is always possible, reducing recovery time.

Detailed Description of Impact

Starting on 14 November 2024 at 21:57 US/Pacific, customers observed increased latency and required multiple attempts when opening support cases. Customers were able to use the backup case creation process to receive support.

Customers were able to send and receive updates to existing support cases by email, but were not able to update cases using the support portal. Support agents were able to send update emails for cases, create proactive bugs, and fill out Contact Us Forms (CUFs) on behalf of their customers. However, responding via the support portal was unavailable.

Customers with active chat support cases were unable to continue their conversation. Error messages received by customers included options for continuing support via the Contact Us Forms (CUFs). All contractual obligations for support requests submitted through the Contact Us Forms (CUFs) were fulfilled.
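The report states times in US/Pacific, while the page header gives the incident window in UTC. As a minimal sketch for cross-checking the two, the following Python snippet converts the reported start and end times to UTC and computes the duration. It assumes that "US/Pacific" corresponds to the IANA zone America/Los_Angeles and that time zone data is available on the system; the variable and function names are illustrative only.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs system tz data or the tzdata package

# Assumption: "US/Pacific" in the report maps to the IANA zone America/Los_Angeles.
PACIFIC = ZoneInfo("America/Los_Angeles")


def to_utc(local: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return local.astimezone(timezone.utc)


# Start and end times as stated in the incident report (US/Pacific).
start = datetime(2024, 11, 14, 21, 57, tzinfo=PACIFIC)
end = datetime(2024, 11, 15, 4, 9, tzinfo=PACIFIC)

start_utc = to_utc(start)
end_utc = to_utc(end)
duration = end_utc - start_utc

print(start_utc.isoformat())  # 2024-11-15T05:57:00+00:00
print(end_utc.isoformat())    # 2024-11-15T12:09:00+00:00
print(duration)               # 6:12:00
```

The result agrees with the header's window of 05:57 AM UTC to 12:09 PM UTC and the stated duration of 6 hours and 12 minutes.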
10:31 PM
Mini Incident Report

We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support.

(All Times US/Pacific)

Incident Start: 14 November, 2024 21:59
Incident End: 15 November, 2024 04:09
Duration: 6 hours, 10 mins
Affected Services and Features: Google Cloud Support and Google Workspace Support
Regions/Zones: Global

Description: Google's ticketing system, supporting Google Cloud, Billing, and Workspace, experienced an unplanned maintenance event, causing the system to become unavailable. Throughout the 6 hours and 10 minutes, customers were unable to update existing chat, portal, or email cases. Customers who attempted to create a support case were presented with our backup contact method and were able to receive support through the backup method.

Preliminary analysis finds that a planned change to Google's support ticket persistence layer caused the primary system for creating and updating support sessions to return errors. Customers attempting to create a support case were directed to retry, then were provided access to our backup support channel, which worked throughout the outage. No other Google Cloud services were impacted. The engineering on-call team rolled back the change to the persistence layer, restoring service.

We sincerely apologize to our Google Cloud customers for this recent service disruption.

Customer Impact: Customers saw increased latency and required multiple attempts when opening support cases. Customers were able to use the backup case creation method to receive support. Customers were able to send updates to existing support cases by email, but may not have been able to update cases using the support portal. Support agents were unable to respond to already-existing cases. Customers with active chat support cases were unable to continue their conversation.
01:07 PM
The issue with Google Cloud Support has been resolved for all affected users as of Friday, 2024-11-15 04:09 US/Pacific. We thank you for your patience while we worked on resolving the issue.
12:46 PM
Summary: Issues opening cases
Description: Mitigation work is still underway by our engineering team. Current data indicates that the mitigation strategy is working, and the team continues to see positive results. We will provide more information by Friday, 2024-11-15 05:30 US/Pacific.
Diagnosis: Customers will have issues opening cases.
Workaround: As a backup mechanism, customers can create cases using the Contact Us Form (CUF), which is automatically generated when following the normal case creation process.
10:45 AM
Summary: Issues opening cases
Description: We are experiencing an issue with Google Cloud Support. Our engineering team continues to investigate the issue. We will provide an update by Friday, 2024-11-15 05:00 US/Pacific with current details.
Diagnosis: Customers will have issues opening cases.
Workaround: As a backup mechanism, customers can create cases using the Contact Us Form (CUF), which is automatically generated when following the normal case creation process.
08:19 AM
Summary: Issues opening cases
Description: We are experiencing an issue with Google Cloud Support. Our engineering team continues to investigate the issue. We will provide an update by Friday, 2024-11-15 03:00 US/Pacific with current details.
Diagnosis: Customers will have issues opening cases.
Workaround: As a backup mechanism, customers can create cases using the Contact Us Form (CUF), which is automatically generated when following the normal case creation process.
07:15 AM
Summary: Issues opening cases
Description: We are experiencing an issue with Google Cloud Support. Our engineering team continues to investigate the issue. We will provide an update by Friday, 2024-11-15 01:00 US/Pacific with current details.
Diagnosis: Customers will have issues opening cases.
Workaround: As a backup mechanism, customers can create cases using the Contact Us Form (CUF), which is automatically generated when following the normal case creation process.
06:45 AM
Summary: Issues opening cases
Description: We are experiencing an issue with Google Cloud Support. Our engineering team continues to investigate the issue. We will provide an update by Friday, 2024-11-15 01:00 US/Pacific with current details.
Diagnosis: Customers will have issues opening cases.
Workaround: Customers can use the CUF to create cases as a backup mechanism.