On May 22nd, 2020, starting at around 10:05 EST, Fonolo began to experience network routing issues in our primary Toronto data center, which left some internet routes inaccessible.
Once we had isolated the affected routes, we contacted our carrier partners and learned that the issue was already known on their side and that a specialist had been dispatched. By around 10:30 EST, our carrier had determined that the cause was a catastrophic line card failure in their core routing infrastructure, which was black-holing traffic.
The line card did not fail gracefully, so manual intervention was required to disable the failed hardware and route around the failure. In a normal, graceful failure, traffic would have re-routed automatically.
By around 10:45 EST, our carrier had removed the failed hardware from service, routes were restored, and all Fonolo services were functioning normally. We continued to monitor the service for several hours and marked the network issue as resolved by 13:49 EST.