BGP convergence — the time it takes for the global routing table to stabilize after a topology change — is one of the most critical metrics in network engineering. In the default configuration, BGP can take anywhere from 30 seconds to several minutes to converge. For a DDoS mitigation platform, that's unacceptable.
At Wirescope, our backbone converges in under 800 milliseconds. Here's how.
Why convergence speed matters
When a PoP goes offline or a backbone link fails, traffic needs to be rerouted to an alternate path. During convergence, packets destined for the affected PoP are either dropped or delivered to a node that can't process them.
For our customers, this means potential service disruption. TCP sessions will typically time out after 1-3 seconds of packet loss, depending on the application. If we can converge faster than that, the failure becomes invisible.
The traditional approach
Standard BGP relies on hold timers and keepalive messages to detect peer failures. The default hold timer is 90 seconds, meaning it can take up to 90 seconds to detect that a peer is down. Even with aggressive tuning (hold timer of 3 seconds), detection alone takes seconds.
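To make the detection delay concrete, here is a minimal Python sketch of the hold-timer arithmetic. It assumes an otherwise idle session in which only keepalives restart the hold timer, and a keepalive interval of one third of the hold time (a common convention, not something stated above). The function name `detection_window` is made up for illustration.

```python
def detection_window(hold_time_s, keepalive_s):
    """Return (best, worst) seconds between a peer dying and the hold timer expiring.

    The hold timer restarts each time a message arrives from the peer, so the
    delay depends on how long before the failure the last keepalive was received.
    """
    if keepalive_s <= 0 or keepalive_s >= hold_time_s:
        raise ValueError("keepalive must be positive and shorter than the hold time")
    best = hold_time_s - keepalive_s  # peer died just before its next keepalive was due
    worst = hold_time_s               # peer died right after sending a keepalive
    return best, worst


print(detection_window(90, 30))  # (60, 90)
print(detection_window(3, 1))    # (2, 3)
```

Even in the best case, a 3-second hold timer leaves a 2-second blind spot, which is why timer tuning alone cannot get detection below a second.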
After detection, the BGP decision process must recalculate the best path for all affected prefixes, update the RIB and FIB, and propagate the new routes to neighboring peers. Each of these steps adds latency.
Our approach
We use a combination of techniques to achieve sub-800ms convergence:
BFD (Bidirectional Forwarding Detection): We run BFD on all backbone links with a transmit interval of 50ms and a detect multiplier of 3. A link is declared down after three consecutive missed packets, so we detect failures in about 150ms, 20x faster than a BGP hold timer tuned down to 3 seconds.
Prefix-independent convergence (PIC): Our routers pre-compute backup paths for all prefixes. When BFD signals a failure, the FIB is updated in a single atomic operation, without waiting for BGP reconvergence.
BGP Add-Path: We propagate multiple paths for each prefix through iBGP, so backup paths are already installed in the FIB before they're needed.
Custom convergence orchestrator: We built a control plane daemon that coordinates convergence across the backbone. When a failure is detected, it simultaneously triggers FIB updates on all affected nodes, rather than waiting for BGP to propagate hop-by-hop.
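The idea behind prefix-independent convergence can be sketched in a few lines of Python. This is a toy model, not our router code: the `PathList` and `Fib` classes, the `pop-a` and `pop-b` next-hop labels, and the exact-match dictionary (standing in for a real longest-prefix-match table) are all assumptions made for illustration. The point it shows is that many prefixes share one forwarding object, so a failure costs one write regardless of how many prefixes are affected.

```python
from dataclasses import dataclass, field


@dataclass
class PathList:
    """Shared forwarding object: a primary and a pre-computed backup next hop."""
    primary: str
    backup: str
    primary_up: bool = True

    def active(self) -> str:
        return self.primary if self.primary_up else self.backup


@dataclass
class Fib:
    routes: dict = field(default_factory=dict)      # prefix -> shared PathList
    path_lists: dict = field(default_factory=dict)  # (primary, backup) -> PathList

    def install(self, prefix: str, primary: str, backup: str) -> None:
        # Prefixes with the same primary/backup pair reuse one PathList object.
        key = (primary, backup)
        self.routes[prefix] = self.path_lists.setdefault(key, PathList(primary, backup))

    def on_next_hop_down(self, next_hop: str) -> int:
        """Switch every path list using next_hop as primary; return writes performed."""
        writes = 0
        for path_list in self.path_lists.values():
            if path_list.primary == next_hop and path_list.primary_up:
                path_list.primary_up = False
                writes += 1
        return writes

    def lookup(self, prefix: str) -> str:
        return self.routes[prefix].active()


fib = Fib()
for i in range(1000):
    fib.install(f"10.{i // 256}.{i % 256}.0/24", primary="pop-a", backup="pop-b")

writes = fib.on_next_hop_down("pop-a")
print(writes, fib.lookup("10.0.0.0/24"))  # 1 pop-b
```

In a flat table where each prefix stores its own next hop, the same failure would need 1000 writes, so repair time would grow with table size.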
Results
In our testing (and in production), we consistently achieve end-to-end convergence in under 800ms:
- Link failure detection: ~150ms (BFD)
- FIB update: ~50ms (PIC with pre-computed backup)
- Traffic shift: ~200ms (propagation across backbone)
- Total: ~400-800ms depending on topology
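As a quick sanity check, the budget above can be added up in Python. The figures are the ones quoted in this post; the variable names are made up for illustration, and the 3-second comparison point is the aggressively tuned hold timer mentioned earlier.

```python
BFD_INTERVAL_MS = 50
BFD_MULTIPLIER = 3
AGGRESSIVE_HOLD_TIMER_MS = 3000
TARGET_MS = 800

# BFD declares the link down after BFD_MULTIPLIER consecutive missed packets.
bfd_detection_ms = BFD_INTERVAL_MS * BFD_MULTIPLIER

budget_ms = {
    "detection": bfd_detection_ms,
    "fib_update": 50,
    "traffic_shift": 200,
}
total_ms = sum(budget_ms.values())
headroom_ms = TARGET_MS - total_ms
speedup = AGGRESSIVE_HOLD_TIMER_MS / bfd_detection_ms

print(total_ms, headroom_ms, speedup)  # 400 400 20.0
```

The three stages sum to about 400ms, which leaves roughly another 400ms of headroom for longer paths before the 800ms ceiling is reached.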
This is fast enough that TCP sessions don't time out, TLS sessions don't need to be re-established, and most applications experience the failure as a brief increase in latency rather than an outage.
The bottom line
Sub-second convergence isn't just a nice-to-have. For a DDoS mitigation platform, it's the difference between "automatic failover" being a marketing claim and being a technical reality.