MDW and NYC Ingest Down
Incident Report for Livepeer Studio
Postmortem

What Happened

On June 6, 2021, from 10:31 UTC to 10:43 UTC, MDW and NYC ingest were down, and users were unable to log in to the Livepeer.com dashboard. During this time, users whose nearest datacenter was NYC or MDW, or who had chosen to stream into NYC or MDW, would have needed to manually reroute their streams to an alternative datacenter.

What We Are Doing About This

We’ve recently experienced similar outages related to this issue and are currently investigating. No root cause has been identified yet, and we are working quickly to fix the issue. We will update this postmortem once we have identified the root cause, and we will be transparent about the additional steps we’ll take to ensure this service disruption does not happen again.

UPDATE: A root cause related to a Postgres replication system setting has been identified, and the setting has been changed. Additionally, we’ve identified some systemic changes to prevent this issue from happening in the future, including improved monitoring and alerting so that we’re able to identify these issues faster.

Posted Jun 08, 2021 - 17:12 UTC

Resolved
We received and confirmed reports that ingest and playback in NYC and MDW were down, along with issues with the livepeer.com dashboard login page. The issue was fixed shortly afterward by restarting the stuck API servers in the MDW datacenter.

This incident lasted from ~10:30 UTC to ~10:45 UTC.
Posted Jun 06, 2021 - 10:30 UTC