Date of Incident: Tuesday, April 22, 2025
What Happened?
During our routine software release on Tuesday, an update to the invoice payments list slowed the queries behind it. Because the invoice payments list also functions as a report, it was being accessed by a high volume of users at the time, and the combination of slower queries and heavy traffic led to server overload and timeouts.
The root cause was a surge in long-running queries. Individually, these queries would likely have been manageable under normal load, but the added latency from the update led many users to retry. The retries put many resource-intensive queries into execution at the same time, creating a significant backlog that slowed responses further and prompted still more retries: a self-reinforcing loop of retries and slowdowns.
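The retry loop described above can be illustrated with a small toy model. This is a minimal sketch, not a reconstruction of the incident: the function name `simulate_backlog` and every number in it (arrival rate, server capacity, query cost, retry fraction) are hypothetical values chosen only to show how retries can turn a slow-growing backlog into a rapidly growing one.

```python
def simulate_backlog(query_cost, retry_fraction, arrivals=10, capacity=100, ticks=20):
    """Return the number of queries still pending at the end of each tick.

    Hypothetical model: every tick, `arrivals` new queries come in and the
    server can finish `capacity // query_cost` of them. Of the queries left
    waiting, a fraction `retry_fraction` are resubmitted by users, and each
    resubmission joins the queue as one more pending query.
    """
    completions_per_tick = capacity // query_cost
    pending = 0
    history = []
    for _ in range(ticks):
        pending += arrivals                              # new requests
        pending -= min(pending, completions_per_tick)    # work the server finishes
        pending += int(pending * retry_fraction)         # retries add duplicate work
        history.append(pending)
    return history


if __name__ == "__main__":
    # Assumed scenarios: cheap queries, slow queries, slow queries plus retries.
    print("fast queries:        ", simulate_backlog(query_cost=5, retry_fraction=0.5))
    print("slow, no retries:    ", simulate_backlog(query_cost=12, retry_fraction=0.0))
    print("slow, with retries:  ", simulate_backlog(query_cost=12, retry_fraction=0.5))
```

In this model, fast queries never build a backlog, slow queries without retries build one gradually, and slow queries with retries build one that grows faster every tick, which matches the pattern of compounding slowdowns described above.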
What We Did:
To address the immediate impact and restore service stability, we took the following actions:
What We Are Doing to Prevent Recurrence:
To prevent similar incidents in the future, we are taking the following steps: