On 6/6 from 10:31 UTC to 10:43 UTC, ingest in the MDW and NYC datacenters was down, and users were unable to log in to the Livepeer.com dashboard. During this window, users whose nearest datacenter was NYC or MDW, or who had chosen to stream into NYC or MDW, would have needed to manually reroute their streams to an alternative datacenter.
We’ve recently experienced similar outages related to this issue and are currently investigating. We have not yet identified a root cause, and we are working quickly to fix the issue. We will update this postmortem once we have identified the root cause, and we will share the additional steps we’ll take to ensure this service disruption does not happen again.
UPDATE: We have identified the root cause, which was related to a Postgres replication system setting, and we have changed that setting. We’ve also identified several systemic changes to prevent this issue from recurring, including improved monitoring and alerting so we can detect these issues faster.