SuprSend - A few workflows & broadcasts are failing - Incident details

A few workflows & broadcasts are failing

Resolved
Major outage
Started 1 day ago. Lasted about 4 hours.

Affected

Notification Engine

  • Degraded performance: 7:03 AM to 9:04 AM
  • Operational: 9:04 AM to 9:46 AM
  • Degraded performance: 9:46 AM to 10:15 AM
  • Major outage: 10:15 AM to 10:37 AM
  • Operational: 10:37 AM to 11:15 AM

Updates
  • Resolved
    This incident has been resolved.
  • Monitoring
    We implemented a fix and are currently monitoring the result.
  • Update
    We are currently investigating this incident.
  • Investigating
    We are currently investigating this incident.
  • Resolved
    This incident has been resolved. Some data was lost for older, long-running workflows, but the system is now stable for new workflows. We are in touch with AWS to understand how this happened and whether the lost data can be recovered from Serverless ElastiCache backups.

  • Monitoring
    We implemented a fix and are currently monitoring the result.

  • Identified
    The issue stems from a problem with our downstream AWS ElastiCache cache. We are continuing to work on a fix and have also involved the AWS support team.

  • Investigating
    We are currently investigating this incident.