Investigating Increased Error Rate

Resolved

This incident has been resolved.

Wed, Apr 3, 2019, 09:08 PM

Affected components

No components marked as affected

Updates

Resolved

This incident has been resolved.

Wed, Apr 3, 2019, 09:08 PM

Monitoring

Our Postgres master died and appeared to have lost its disk upon restarting. The caching layers in front of the database all continued to operate, allowing regular service to continue. Logins, however, require validation against the database, so new authentication tokens could not be issued for dashboard logins. A spike of 500s was observed just after the new database came online, which we will investigate. No further API issues have been observed in the last 10 minutes, but we are continuing to monitor.

Wed, Apr 3, 2019, 03:25 PM (5 hours earlier)
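
The behaviour described in the Monitoring update (cached reads kept working while logins failed) can be illustrated with a minimal sketch. This is not the actual implementation: FakeDatabase, ReadThroughCache, login, and DatabaseUnavailable are hypothetical names invented for the example, and the real caching layers may work quite differently.

```python
import secrets


class DatabaseUnavailable(Exception):
    """Raised by the hypothetical database client when the primary is down."""


class FakeDatabase:
    """Stand-in for the Postgres primary (hypothetical, in-memory)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.available = True

    def get(self, key):
        if not self.available:
            raise DatabaseUnavailable(key)
        return self.rows[key]


class ReadThroughCache:
    """Serves cached entries first and only falls through to the database on a miss."""

    def __init__(self, db):
        self.db = db
        self.entries = {}

    def read(self, key):
        if key in self.entries:
            # Cache hit: the database is never contacted, so an outage is invisible here.
            return self.entries[key]
        value = self.db.get(key)  # Cache miss: this raises if the database is down.
        self.entries[key] = value
        return value


def login(db, user):
    """Issue a new token; always validated against the database, never the cache."""
    db.get("user:" + user)  # Raises DatabaseUnavailable during an outage.
    return secrets.token_hex(16)
```

In this sketch, a key that was read before the outage is still served afterwards, while login raises DatabaseUnavailable until the database is back. A cache miss during the outage also fails, which is one possible reason a small number of API requests could be affected.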

Investigating

We are currently investigating errors connecting to Postgres, which are affecting dashboard login sessions and a small number of API requests.

Wed, Apr 3, 2019, 02:43 PM (42 minutes earlier)

Investigating

We are currently investigating this issue.

Wed, Apr 3, 2019, 02:40 PM