Intoduce tokio/async-based BFD implementation by jgallagher · Pull Request #795 · oxidecomputer/maghemite

jgallagher · 2026-06-17T22:02:37Z

This PR adds a tokio/async-based implementation of BFD. It's heavily based on the existing implementation both in API shape and names, but has some intentional architectural differences (primarily to support pretty extensive unit testing). The non-test code is handwritten, but I leaned pretty heavily on Claude to help with writing tests (although I went over all of them by hand - it has a tendency to write flaky-in-a-not-obvious-way tests when sockets are involved, so hopefully I caught all of those) and used it to compare this implementation against the existing one. The primary motivator for moving to async is fixing #787, which it does: running the reproducer against this implementation results in the DELETEs returning more or less immediately (under 1ms).

The biggest architectural changes are:

The state machine is written in a pure "sans I/O" style, and must be driven by an external source that provides both input/output and a clock source. This allows straightforward unit testing without having to deal with timeouts, threads, or sockets.
Each BFD session starts 3 tokio tasks (one to drive the state machine, one to send outgoing packets, and one to sync with the RIB), plus a tokio task for the local listening address that's shared with all BFD sessions using that same listening address. This is pretty close to swapping out threads for tasks; the sync implementation started 3 OS threads per session (two to drive the state machine, one to send outgoing packets) plus a local listening thread shared between sessions. It didn't need a separate thread for RIB syncing, but we do that here to bridge between the sync and async worlds.
Communication between the UDP send/recv tasks and the state machine driver use bounded channels and drop packets if the channels are full. The old implementation used unbounded channels. I don't expect this to matter in practice, but it's a spot where something going off the rails (e.g., a send_to() getting stuck or something) used to mean we'd have unbounded memory growth now means we have a lot of dropped packets instead.

Some less-architectural-but-still-intentional changes:

I made the detection threshold a NonZeroU8, both because I was concerned about multiplying by 0 and because the RFC says such packets MUST be rejected. We now do so, and also return a 400 if we're asked to configure a session with a threshold of 0. This might be the wrong call and I'm happy to discuss - there was a "panic on startup if we can't re-add all peers" path, so I made that path bump any persisted 0s to 1s, but maybe I should do the same in the API handler instead of returning a 400, in case there are Nexus configs persisted that have a threshold of 0?
Some more careful math in the event of unreasonable large numbers (e.g., replacing an as u32 that wrapped around on large values with .saturating_*() methods).
We now increment SessionCounters::control_packets_received on incoming packets - I think this was just an oversight before?
If we fail to bind an egress socket, the old code would retry every 5 seconds; this code retries every time it has a packet it wants to send. I'm not sure if this is better or worse but it was simpler to implement - happy to discuss.

Some things that I left unchanged but I'm not sure are correct:

Calculations of the "next send" and "recv deadline" timers in the state machine - I left TODO-correctness comments on these with my notes but didn't want to change that behavior as part of a big rework.
Outgoing packets don't populate the local detection_multiplier value (also tagged with a TODO-correctness in the state machine).
We close the egress socket and rebind on any error from UdpSocket::send_to(). Could that leave us in a constant loop of closing/opening sockets if sends are failing for some reason not related to the local socket?
When removing the peer, there's no final sync with the RIB (nor is there in the sync implementation) - I think this means if we remove a peer in the Up state, we'll leave behind a nexthop_shudown of false for that address? I don't know whether this is okay or gets cleaned up by something else.
Should required_min_rx also be a NonZero type? If that gets set to 0 we'll also have a "multiply by 0" in the calculation of the recv deadline.

This PR is only the async implementation with no modifications to the main mgd binary except the change of detection threshold to NonZeroU8 mentioned above. I'll open a second PR that swaps the implementations over.

jgallagher added 23 commits June 17, 2026 14:03

initial sans-io BFD state machine

9897971

initial tests for sans-io bfd sm

6dbe6c1

edge case tests

df2c582

enforce nonzero detection multiplier

af31b44

expand correctness comments

7bbec21

move to new crate

96c342d

doc comments

0082e21

add SingleHopEgressSrcPort

3d1e76b

add initial Dispatcher impl

9e9b88a

change panic to debug_assert

5ab980f

tests for Dispatcher

2352faf

async daemon / session impls

6469ece

rename crate to bfd-async

1b9eacd

restore packet and transition logging

96dc6a9

be more conservative about overflow

b5ab3f2

port over socket TTL tests

516b093

increment unexpected message / message error counters

e3723c2

add end to end test

ba1bd90

nicer egress src port selection

f0692ab

minor TODO comment

873c38b

use egress src port iter for multihop too

083395a

don't panic on startup if we persisted a threshold of 0

fe91a32

cargo fmt

12df405

jgallagher requested review from rcgoodfellow and taspelund June 17, 2026 22:02

jgallagher mentioned this pull request Jun 17, 2026

mgd: swap sync bfd impl out for bfd-async #796

Open

nicolaskagami self-requested a review June 18, 2026 19:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Intoduce tokio/async-based BFD implementation#795

Intoduce tokio/async-based BFD implementation#795
jgallagher wants to merge 23 commits into
mainfrom
john/bfd-async

jgallagher commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jgallagher commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant