UK - Server Outage (eu-west-1)

Incident Report for Actionstep

Postmortem

Post-Incident Report – Europe Service Outage (9 June 2025)

Status: Resolved
Impact: Full Service Outage – Europe Region (eu-west-1)

Summary

On 9 June 2025, Actionstep’s Europe-hosted environment experienced a full outage. Customers hosted in the EU region were unable to access the platform from 09:16 to 12:46 BST — a total of 3 hours and 30 minutes.
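For readers comparing the outage window with the NZST timestamps on the status updates further down, the arithmetic can be checked with a short Python sketch. The fixed UTC offsets used here (BST = UTC+1, NZST = UTC+12) are assumptions that hold for June 2025, not values stated in this report.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for June 2025: BST is UTC+1, NZST is UTC+12.
BST = timezone(timedelta(hours=1), "BST")
NZST = timezone(timedelta(hours=12), "NZST")

# Outage window as stated in the summary above.
outage_start = datetime(2025, 6, 9, 9, 16, tzinfo=BST)
outage_end = datetime(2025, 6, 9, 12, 46, tzinfo=BST)

duration = outage_end - outage_start
print(duration)  # 3:30:00

# The same instants expressed in the time zone used by the status posts.
print(outage_start.astimezone(NZST).strftime("%H:%M %Z"))  # 20:16 NZST
print(outage_end.astimezone(NZST).strftime("%H:%M %Z"))    # 23:46 NZST
```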

This report outlines the confirmed cause, the impact, the actions taken during the incident, and the steps being implemented to improve platform resilience.

Incident Overview

What Happened

As the UK business day began, the Europe-hosted environment experienced a rapid surge in demand. The platform’s automated scaling did not react quickly enough, so every application server in the region became overwhelmed and failed.

This was a failure within Actionstep’s application infrastructure. It was not related to AWS infrastructure, not a cybersecurity event, and not a data breach. Customer data remained secure and intact throughout.

Root Cause

  • The auto-scaling process was not fast enough to meet the surge in demand.
  • Server startup times were too slow, so new capacity did not come online before the existing server fleet became overloaded.
  • Once servers failed, the system could not recover without manual intervention.
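
The interaction between the first two points can be illustrated with a toy model. The Python sketch below is a simplified simulation, not Actionstep's scaling logic: the function name, the demand figures, the per-server capacity, and the startup delays are all hypothetical. It does not model the third point, where failed servers need manual recovery.

```python
import math

def overloaded_ticks(demand, per_server, initial, startup_delay):
    """Return the ticks at which demand exceeds ready capacity.

    Models a purely reactive scaler: it orders only the servers that are
    missing right now, and each order takes `startup_delay` ticks to arrive.
    """
    ready = initial
    pending = []      # (tick at which servers become ready, count)
    overloaded = []
    for tick, load in enumerate(demand):
        # Servers whose startup has finished join the fleet.
        ready += sum(n for at, n in pending if at <= tick)
        pending = [(at, n) for at, n in pending if at > tick]

        if load > ready * per_server:
            overloaded.append(tick)

        # Order whatever is still missing after counting servers in flight.
        desired = math.ceil(load / per_server)
        in_flight = sum(n for _, n in pending)
        missing = desired - ready - in_flight
        if missing > 0:
            pending.append((tick + startup_delay, missing))
    return overloaded

# Hypothetical morning surge, in requests per tick.
morning_surge = [100, 100, 400, 800, 800, 800, 800]

print(overloaded_ticks(morning_surge, per_server=100, initial=2, startup_delay=3))  # [2, 3, 4, 5]
print(overloaded_ticks(morning_surge, per_server=100, initial=2, startup_delay=1))  # [2, 3]
print(overloaded_ticks(morning_surge, per_server=100, initial=8, startup_delay=3))  # []
```

In this model, a shorter startup delay narrows the overloaded window, and a higher starting fleet avoids it entirely for the same surge.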

Impact

  • Service: Complete outage of the EU-hosted platform for 3 hours and 30 minutes.
  • Users: All customers in the EU region were unable to access the platform.
  • Data: No data loss, corruption, or compromise occurred.

Immediate Actions Taken

  • Increased the minimum number of application servers in the EU region.
  • Introduced manual scaling procedures as an interim safeguard.
  • Updated monitoring thresholds and alerts to ensure faster detection of scaling issues.

Next Steps and Improvements

  • Business Day Warm-Up: Automatically scaling servers ahead of the UK business day to handle expected demand spikes.
  • Startup Time Optimisation: Improving server startup speeds to prevent scale lag.
  • Infrastructure Review: Evaluating options for enhanced resilience.
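
The Business Day Warm-Up item amounts to raising the fleet minimum on a schedule rather than waiting for load to arrive. A minimal Python sketch of that idea follows. The server counts, the warm-up window, and the function name are hypothetical and are not Actionstep's actual settings. The function expects a datetime already expressed in UK local time.

```python
from datetime import datetime, time

# Hypothetical values for illustration only.
BASELINE_MIN = 4           # minimum servers outside the warm-up window
WARM_MIN = 12              # minimum servers during the UK business day
WARM_START = time(7, 30)   # start ahead of the morning surge
WARM_END = time(18, 0)

def minimum_servers(uk_local: datetime) -> int:
    """Return the fleet minimum for a given UK local date and time."""
    is_weekday = uk_local.weekday() < 5  # Monday is 0, Sunday is 6
    in_window = WARM_START <= uk_local.time() < WARM_END
    return WARM_MIN if is_weekday and in_window else BASELINE_MIN

# Monday 9 June 2025 at 09:16, the start of the outage window in this report.
print(minimum_servers(datetime(2025, 6, 9, 9, 16)))  # 12
# Sunday 8 June 2025 at 09:16 falls outside the business week.
print(minimum_servers(datetime(2025, 6, 8, 9, 16)))  # 4
```

Starting the window before the surge matters because, as the root cause notes, capacity ordered during the surge arrives too late.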

Summary of Key Points

  • Cause: Scaling failure within Actionstep’s EU-hosted environment
  • Data Safety: No data was lost, corrupted, or compromised
  • Responsibility: Actionstep infrastructure failure – fully owned by Actionstep
  • Immediate Fixes: Increased server capacity; improved monitoring; manual safeguards
  • Next Focus: Resilience improvements, scaling reliability, infrastructure review

Our Commitment

We fully acknowledge the severity of this incident and the disruption it caused. Our teams are prioritising the necessary changes to address the technical weaknesses identified and strengthen the reliability of the platform going forward.

Posted Jun 27, 2025 - 04:22 NZST

Resolved

We are writing to provide further details regarding the performance issues that impacted the UK Server (eu-west-1) platform on 9 June 2025, and to confirm that the issue has now been fully resolved.

The disruption lasted approximately three and a half hours, during which users were unable to access the system.

We want to reassure you that no data was lost during this incident; the issue was limited solely to platform performance.

Our engineering team treated this as their highest priority and worked throughout the day to identify and resolve the problem. A critical fix was deployed at 12:32 PM, which successfully restored the platform to normal performance levels.

We understand the disruption this may have caused and sincerely apologise for the inconvenience. At Actionstep, we are committed to continually improving the platform and the experience of all our users.

A thorough review of the incident is underway to identify the root cause and implement measures to prevent recurrence. We will also be compiling a detailed report of our findings, which we will share once completed.

Thank you for your patience and continued support.

Regards,
Actionstep Support Team
Posted Jun 09, 2025 - 23:32 NZST

Update

Our team continues to work on the issue affecting the UK Server (eu-west-1), and we’re pleased to share that our engineers believe they have identified the underlying problem.

We are currently in the process of ramping up our infrastructure to handle the anticipated volume of incoming requests. This is a critical step to ensure stability and performance once traffic resumes.

As soon as this process is complete, we will begin allowing traffic through and will provide a further update here.

We understand how disruptive this has been and sincerely thank you for your continued patience. In the meantime, if you require assistance, please reach out by submitting a ticket at https://support.actionstep.com.

Regards,
Actionstep Support Team
Posted Jun 09, 2025 - 23:30 NZST

Update

Our team continues to investigate the issue affecting the UK Server (eu-west-1). We are actively working through all possible causes and remain fully committed to restoring full service as soon as possible.

We understand the ongoing impact this may have and truly appreciate your patience as we work toward a resolution. Further updates will be shared as new information becomes available.

If you need assistance in the meantime, please don’t hesitate to contact us via a support ticket at https://support.actionstep.com.

Regards,
Actionstep Support Team
Posted Jun 09, 2025 - 22:55 NZST

Update

Our team is still actively working to resolve the issue affecting the UK Server (eu-west-1). While we don’t have a resolution just yet, please know that this remains our highest priority, and we are dedicating all necessary resources to bring services back online as quickly as possible.

We understand how disruptive this is and sincerely thank you for your continued patience. We will provide another update as soon as we have further information to share.

As always, if you require support in the meantime, please reach out by submitting a ticket at https://support.actionstep.com.

Regards,
Actionstep Support Team
Posted Jun 09, 2025 - 22:25 NZST

Identified

We wanted to provide an update regarding the ongoing issue affecting the UK Server (eu-west-1). Our engineering team is continuing to investigate the root cause of the outage and is actively working on a resolution with the highest priority.

We understand the impact this may be having on your operations and sincerely appreciate your patience. Please be assured that restoring full service remains our top priority, and we will continue to keep you informed of any significant developments or progress.

In the meantime, if you need immediate assistance or have questions, please don’t hesitate to submit a ticket at https://support.actionstep.com, and our support team will be happy to help.
Posted Jun 09, 2025 - 21:57 NZST

Investigating

Our team has identified an issue with the UK Server (eu-west-1) that is resulting in an outage.

Our team is investigating this as a priority and will work to restore the system to normal operations urgently.

We will communicate with you regularly regarding the progress of this work to ensure you are back online as soon as possible.

In the meantime, if you require assistance, please reach out to our support team by submitting a ticket at https://support.actionstep.com.


Regards,
Actionstep Support Team
Posted Jun 09, 2025 - 21:04 NZST

Identified

We are writing to inform you that our team is aware of an outage affecting the UK Server (eu-west-1) system that began on 09/06/2025 at 9:10am.

Our team is investigating this as the top priority and will work to restore the system to full functionality urgently.

We will communicate with you regularly regarding the progress of this work to ensure you are back online as soon as possible.

In the meantime, if you require assistance, our support team is available by way of ticket submission at https://support.actionstep.com.

Regards,
Actionstep Support Team
Posted Jun 09, 2025 - 20:20 NZST
This incident affected: United Kingdom, Middle East, Africa, and Europe.