Google Cloud CRITICAL
Apigee Integrated Developer Portal is experiencing issues while accessing it from Apigee Edge and Apigee X
September 11, 2023 · 12:00 PM UTC – 08:25 PM UTC · Duration: 8h 25min
Affected Services
Apigee
Timeline
09:38 PM
Incident Report
Summary
On Monday, 11 September 2023, Apigee integrated developer portal customers experienced elevated 5xx responses, issues authenticating to API, and issues updating portals when accessing it via Apigee Edge or Apigee X for a duration of 8 hours and 25 minutes.
To our Apigee customers whose business analytics were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.
Root Cause
The root cause of the issue was an incorrect configuration change made to one of the Apigee load balancers for developer portals. This configuration change caused all the requests that were routed through the affected Apigee load balancer to fail with a default 404 error code.
Remediation and Prevention
Google engineers were alerted by a support case on Monday, 11 September at 07:00 US/Pacific and immediately started an investigation. They were able to identify a misconfiguration in one of the load balancers, which was caused by a change aimed at improving the load balancer’s performance. Upon identifying the misconfiguration, Google Engineers quickly regenerated the configurations of the affected Apigee load balancer and ensured that both load balancers were properly configured.
We apologize for the length and severity of this incident. We are taking the following steps to prevent a recurrence and improve reliability in the future:
Identify the cause of the inconsistent configurations for the developer portal load balancers
Improve the change management process for Apigee load balancer changes to prevent similar issues
Improve monitoring and visibility to ensure the incorrect configurations are identified proactively
Detailed Description of Impact
On 11 September from 04:00 to 12:25 US/Pacific, users experienced elevated error rates with errors like "Failed to load users", "Failed to load zone" or "Service Unavailable”, when interacting with the developer portal. Customers also experienced issues authenticating to API portals or updating their portals.
11:31 PM
Mini Incident Report
We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support .
(All Times US/Pacific)
Incident Start: 11 September 2023 04:00
Incident End: 11 September 2023 12:25
Duration: 8 hours, 25 minutes
Affected Services and Features:
Apigee Edge and Apigee X
Regions/Zones: Global
Description:
Apigee integrated developer portal customers experienced elevated 5xx responses, issues authenticating to API, and updating portals while accessing it from Apigee Edge or Apigee X globally for a duration of 8 hours and 25 minutes due to misconfiguration of internal Apigee load balancers. Google will complete a full IR in the following days that will provide a detailed root cause.
Customer Impact:
Customers experienced elevated 5xx responses.
Customers experience issues authenticating to API portals.
Customers experienced errors like "Failed to load users", "Failed to load zone" or "Service Unavailable” when interacting with the developer portal.
Customers may have also experienced issues updating their portals.
08:45 PM
The issue with Apigee Integrated Developer Portal has been resolved for all affected users as of Monday, 2023-09-11 12:25 US/Pacific.
We thank you for your patience while we worked on resolving the issue.
08:03 PM
Summary: Apigee Integrated Developer Portal is experiencing issues while accessing it from Apigee Edge and Apigee X
Description: Our engineering team continues to investigate the root cause of the issue.
We will provide an update by Monday, 2023-09-11 13:08 US/Pacific with current details.
Diagnosis:
Users may experience errors like "Failed to load users", "Failed to load zone" or "Service Unavailable when interacting with the developer portal.
Customers may also experience issues updating their portals.
Workaround: None at this time.
07:11 PM
Summary: Global: Apigee/ ApigeeX API Portal Elevated 5XX Errors
Description: Our engineering team has determined that further investigation is required to mitigate the issue.
We will provide an update by Monday, 2023-09-11 12:08 US/Pacific with current details.
Diagnosis:
Customers are experiencing elevated 5xx responses and issues authenticating to the API Portal.
Customers may also experience issues updating their portals.
Workaround: None at this time.
06:30 PM
Summary: Global: Apigee/ ApigeeX API Portal Elevated 5XX Errors
Description: Mitigation work is currently underway by our engineering team.
We do not have an ETA for mitigation at this point.
We will provide more information by Monday, 2023-09-11 11:30 US/Pacific.
Diagnosis: Customers are experiencing elevated 5xx responses in API Portal.
Workaround: None at this time.
06:07 PM
Summary: Global: Apigee/ ApigeeX API Portal Elevated 5XX Errors
Description: We are experiencing an issue with Apigee/ ApigeeX API Portal beginning at Monday, 2023-09-11 05:00 US/Pacific.
Our engineering team continues to investigate the issue.
We will provide an update by Monday, 2023-09-11 10:38 US/Pacific with current details.
We apologize to all who are affected by the disruption.
Diagnosis: Customers are experiencing elevated 5xx responses in API Portal.
Workaround: None at this time.