Access Issues
Incident Report for Aventri
Postmortem
Report Date: 2017/09/27

On September 25, 2017, our Support desk received a report of general system slowness and inaccessibility caused by degraded performance. The following is the incident report for the etouches issue "Registration attendee performance issues for Max Event Totals," which lasted approximately two hours on September 25, 2017. We understand this service issue has impacted our valued customers, and we apologize to everyone who was affected.

Issue Summary

From 11:57 AM to 1:50 PM ET, customers experienced degraded performance on the etouches platform, along with occasional outages caused by that degradation. The severity depended on server load, which in turn depended on how heavily customers were using an application feature that monitors and enforces configurable maximum event registration limits. As a result, the slowdowns and outages appeared random to our end users.

Affected Customers

All customers in our North America region may have noticed the decrease in system performance and/or inaccessibility. Events configured to use the Maximum Event Registration functionality were particularly susceptible to the performance impact and are likely to have experienced a resulting outage.

Timeline (all times Eastern Time)
  • Sept 25 @ 11:57 AM: Reduced performance reported to etouches Support
  • Sept 25 @ 12:03 PM: Development picks up ticket and begins diagnosing issue
  • Sept 25 @ 1:00 PM: Performance improves due to reduced usage of offending code
  • Sept 25 @ 1:50 PM: Fix implemented, tested and deployed to Production environment

Root Cause

On September 23rd, a Hot Fix for an unrelated issue was deployed to Production, and it affected the performance of the Max Event Registrations logic. The fix caused all Registration Attendee information to be interrogated whenever the Max Event Registrations functionality ran. This increased server load and lengthened database queues, which in turn degraded performance.
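To illustrate the kind of cost difference described above, the following is a minimal sketch and not the etouches code. It assumes a hypothetical `attendees` table and compares two ways of checking an event's registration limit: fetching every attendee row, versus asking the database for a count. All table, column, and function names are assumptions made for the example.

```python
import sqlite3


def build_demo_db(attendee_count):
    """Create an in-memory database with a hypothetical attendees table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attendees ("
        "id INTEGER PRIMARY KEY, event_id INTEGER, name TEXT)"
    )
    conn.execute("CREATE INDEX idx_attendees_event ON attendees (event_id)")
    conn.executemany(
        "INSERT INTO attendees (event_id, name) VALUES (?, ?)",
        [(1, f"attendee-{i}") for i in range(attendee_count)],
    )
    return conn


def is_full_by_loading_rows(conn, event_id, max_registrations):
    """Costly pattern: transfers every attendee row just to count them."""
    rows = conn.execute(
        "SELECT * FROM attendees WHERE event_id = ?", (event_id,)
    ).fetchall()
    return len(rows) >= max_registrations


def is_full_by_count(conn, event_id, max_registrations):
    """Cheaper pattern: the database returns a single aggregate value."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM attendees WHERE event_id = ?", (event_id,)
    ).fetchone()
    return count >= max_registrations
```

Both functions give the same answer, but the first one's work grows with the number of attendees on every check, which is the general shape of load a limit check can create when it touches all attendee data.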

Resolution and recovery

Once the issue was escalated to the Development team and its cause identified, the code that caused the Registration Attendee information to be interrogated was removed. The change preserved both the functionality addressed by the offending Hot Fix and the Max Event Registrations feature. Both pieces of functionality were tested and verified before the fix was deployed to the Production platform.

Corrective and Preventative Measures

In the last two days, we’ve conducted an internal review and analysis of the outage. The following are actions we are taking to address the underlying causes of the issue and to help prevent recurrence and improve response times:

  • Our current policy is to immediately deploy all hot fixes to all servers simultaneously due to the urgent nature of these fixes. We are currently evaluating a staggered release approach for high risk/impact hot fixes, similar to our normal production deployments, in order to minimize exposure while monitoring the system post-deploy for unintended consequences.
  • We are also evaluating several methods of improving our approach to assessing the performance impacts of defect resolution tickets.
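
As a rough illustration of the staggered release approach mentioned in the first measure, the sketch below deploys to servers in batches and stops as soon as any deployed server fails a health check. It is a simplified example under stated assumptions, not a description of our deployment tooling: the `deploy` and `is_healthy` callables, the server names, and the batch sizes are all hypothetical.

```python
def staggered_deploy(servers, batch_sizes, deploy, is_healthy):
    """Deploy in batches, halting at the first failed health check.

    servers:     ordered list of server identifiers (hypothetical)
    batch_sizes: positive sizes of the initial batches, e.g. [1, 2]
    deploy:      callable that applies the hot fix to one server
    is_healthy:  callable that returns True if a server looks healthy

    Returns (deployed_servers, completed).
    """
    remaining = list(servers)
    deployed = []
    # A final catch-all batch takes whatever the listed sizes did not cover.
    sizes = list(batch_sizes) + [len(remaining)]
    for size in sizes:
        batch, remaining = remaining[:size], remaining[size:]
        if not batch:
            break
        for server in batch:
            deploy(server)
            deployed.append(server)
        # Re-check every server deployed so far before widening exposure.
        if not all(is_healthy(server) for server in deployed):
            return deployed, False
    return deployed, True
```

With six servers and batch sizes of 1 and 2, a problem detected after the second batch would leave three servers untouched, which is the reduced exposure the staggered approach aims for. A real rollout would also wait and monitor between batches; that delay is omitted here.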
Posted about 2 years ago. Sep 28, 2017 - 01:22 EDT

Resolved
This incident has been resolved.
Posted about 2 years ago. Sep 25, 2017 - 13:51 EDT
Monitoring
We have restored service and are continuing to monitor the situation.
Posted about 2 years ago. Sep 25, 2017 - 12:36 EDT
Investigating
We are receiving reports from some clients in the US instance that they are having trouble logging in. We are currently investigating. Please refer to this incident for updates.
Posted about 2 years ago. Sep 25, 2017 - 12:21 EDT
This incident affected: Web and Public API.