Service interruptions on the API and web dashboard

Incident Report for Cosmic

Postmortem

In regard to the downtime that occurred on September 14, 2022 at 21:29 UTC.

I apologize for any inconvenience this may have caused. We take uptime extremely seriously and know that events like this are unacceptable.

What happened
A burst of traffic to our API caused our database cluster to max out available memory. This caused an auto-scale event to occur, which then caused the database to become unresponsive, requiring a restart.

After the database came back online, and further investigation, we implemented more specific indexing on the data collections to optimize the inefficient queries.

What we will do to prevent this in the future
We have now implemented optimized indexes on all database queries which were showing inefficiencies.

We will continue to monitor any new query operations that require optimizations.

Posted Sep 16, 2022 - 17:42 UTC

Resolved

This incident has been resolved.

Posted Sep 14, 2022 - 22:45 UTC

Monitoring

Service has been restored and we will continue to monitor the issue.

Posted Sep 14, 2022 - 21:56 UTC

Investigating

We are currently investigating this issue.

Posted Sep 14, 2022 - 21:29 UTC

This incident affected: API and Web Dashboard.