Elevated API Error Rate
Incident Report for Ravelin
Resolved
There have been no further bursts of errors this evening, but we will continue to monitor for this behaviour during and after testing of our patch.
Posted Oct 06, 2020 - 23:57 BST
Monitoring
There have been no further bursts of errors since those reported at 14:45-14:55 BST, but we are continuing to monitor.
Posted Oct 06, 2020 - 19:09 BST
Update
We haven't seen any further bursts of errors since 14:45-14:55 BST, but we are continuing to monitor our error rates and alerts.

In the meantime, we have prepared an upgrade to our gRPC connection pooling that discontinues use of the deprecated gRPC load balancer. We will be testing and rolling it out in the hope of addressing these errors.
Posted Oct 06, 2020 - 16:23 BST
Investigating
We have observed several small bursts of HTTP 500 errors on api.ravelin.com and pci.ravelin.com today.

Each burst affects roughly 2% of requests system-wide and is the result of reads and writes to Google Bigtable timing out. The impact of the timeouts is inconsistent between hosts, and we are watching for the issue recurring. This incident is a continuation of this morning's incident, which we continue to investigate: https://status.ravelin.com/incidents/xx5sm41vgr1y
Posted Oct 06, 2020 - 15:57 BST
This incident affected: API and pci.ravelin.com.