Transport & Auth
How MemKV moves bytes — RDMA DC, RC fallback, the TCP wire format, HMAC-SHA256 authentication, and the context-block offload flow.
MemKV speaks the same authenticated wire format over two transports. The Linux full server always exposes both; clients pick per-request.
RDMA (DC + RC)
Data operations use DC (Dynamically Connected) transport on Mellanox mlx5 NICs with DC support. DC removes
the per-peer QP setup cost by addressing any DCT (DC Target) endpoint from a shared pool of DCI (DC Initiator) QPs:
the number of QPs scales O(N) with the cluster instead of O(N²) in classic RC.
On hardware without DC support, the system falls back to RC (Reliable Connection) QPs transparently. The DCI
pool is fixed-size by default (see rdma.num_dcis in Configuration); under heavy fan-in
the same DCI can be reacquired round-robin by different workers.
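To make the scaling claim and the round-robin reuse concrete, here is a minimal sketch in Python. It is a model, not MemKV code: `rc_qp_count`, `dc_qp_count`, and `DciPool` are hypothetical names, the full-mesh RC layout is an assumption, and the pool size of 8 is illustrative rather than the actual default of rdma.num_dcis.

```python
import threading


def rc_qp_count(n_nodes: int) -> int:
    """Full-mesh RC (assumed layout): every node holds one QP per peer."""
    return n_nodes * (n_nodes - 1)


def dc_qp_count(n_nodes: int, dcis_per_node: int, dcts_per_node: int = 1) -> int:
    """DC: each node holds a fixed set of DCIs and DCTs, independent of peer count."""
    return n_nodes * (dcis_per_node + dcts_per_node)


class DciPool:
    """Fixed-size pool that hands out DCI slots round-robin (hypothetical model)."""

    def __init__(self, num_dcis: int) -> None:
        if num_dcis <= 0:
            raise ValueError("num_dcis must be positive")
        self._num = num_dcis
        self._next = 0
        self._lock = threading.Lock()

    def acquire(self) -> int:
        # Under heavy fan-in the counter wraps, so different workers
        # end up sharing the same DCI slot.
        with self._lock:
            idx = self._next
            self._next = (self._next + 1) % self._num
            return idx


if __name__ == "__main__":
    print(rc_qp_count(64), dc_qp_count(64, dcis_per_node=8))
    pool = DciPool(4)
    print([pool.acquire() for _ in range(6)])
```

With 64 nodes the RC mesh needs 4032 QPs in this model, while the DC layout with 8 DCIs and 1 DCT per node needs 576. The last line shows slots 0 and 1 being handed out a second time once the four-slot pool wraps.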
Wire carriers
The server listens on TCP at the configured network.address
(default port 9900). The RDMA bootstrap (Connect=0x06) rides that
TCP connection because the RC QP does not exist yet at that point.
Once Connect transitions the RC QP to RTS, the steady-state
control messages (Allocate, Lookup, Commit, Delete,
Read, Write, BatchRead, BatchWrite, Exists) ride RC SEND/RECV on
the per-connection QP. Bulk payloads ride RDMA WRITE / RDMA READ
on the DCI/DCT pool (or RC fallback when DC isn't available).
Hosts without an RDMA NIC keep the TCP connection for everything:
no QP is ever bootstrapped, and the inline-bulk codes (TcpPut=0x20,
TcpGet=0x21, TcpDelete=0x22) plus Exists carry block payloads
and queries inline in the signed length-prefixed frame.
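The carrier rules above can be summarised as a small lookup. This is a sketch under assumptions: `carrier_for` and the `Carrier` enum are hypothetical names, and only the four opcode values stated in the text are included (the numeric codes of the other control messages are not given here, so they are referenced by name only). Combinations the text does not describe raise an error instead of guessing.

```python
from enum import Enum


class Carrier(Enum):
    TCP = "tcp"
    RC_SEND_RECV = "rc-send-recv"


# Opcode values quoted in the text; nothing else is assumed.
OPCODES = {"Connect": 0x06, "TcpPut": 0x20, "TcpGet": 0x21, "TcpDelete": 0x22}

CONTROL = frozenset({
    "Allocate", "Lookup", "Commit", "Delete",
    "Read", "Write", "BatchRead", "BatchWrite", "Exists",
})
INLINE_BULK = frozenset({"TcpPut", "TcpGet", "TcpDelete"})


def carrier_for(message: str, qp_ready: bool) -> Carrier:
    """Return the carrier a message rides, given whether the RC QP reached RTS."""
    if message == "Connect" or message in INLINE_BULK:
        # Bootstrap and inline-bulk frames always travel on the TCP connection.
        return Carrier.TCP
    if message in CONTROL:
        if qp_ready:
            return Carrier.RC_SEND_RECV
        if message == "Exists":
            # The only control message the text places on TCP-only hosts.
            return Carrier.TCP
        raise ValueError(f"{message} without an RC QP is not described in the text")
    raise ValueError(f"unknown message: {message}")


if __name__ == "__main__":
    print(carrier_for("Connect", qp_ready=False))
    print(carrier_for("Lookup", qp_ready=True))
    print(carrier_for("Exists", qp_ready=False))
```

Bulk payloads are deliberately left out of the lookup: on RDMA hosts they move by RDMA WRITE / RDMA READ rather than as messages, and on TCP-only hosts they are embedded in the inline-bulk frames.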
The client picks transport via MEMKV_TRANSPORT (or transport: in
MEMKV_CONFIG):
- auto: try RDMA first; on the first RDMA failure (boot-time or in-flight), latch into TCP-only mode for the life of the engine. Logged at WARN. This is the Linux default.
- rdma: strict. The client errors at startup if rdma_devices is empty or the build is non-Linux; in-flight RDMA failures propagate. Use in production where RDMA must be the actual data path.
- tcp: skip RDMA entirely. The configured server address is the TCP target directly. This is the macOS default.
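The behaviour of the three modes can be sketched as a small state machine. This is an illustration, not the client's implementation: `TransportSelector`, `mode_from_env`, and the logger name are hypothetical, and treating every non-Linux platform like macOS (default tcp) is an assumption. Only the MEMKV_TRANSPORT variable and the mode names come from the text.

```python
import logging
import sys

log = logging.getLogger("memkv.sketch")  # hypothetical logger name
VALID_MODES = ("auto", "rdma", "tcp")


def mode_from_env(env: dict, platform: str = sys.platform) -> str:
    """Read MEMKV_TRANSPORT, falling back to the platform default."""
    # Assumption: any non-Linux platform defaults to tcp, as macOS does.
    default = "auto" if platform.startswith("linux") else "tcp"
    mode = env.get("MEMKV_TRANSPORT", default)
    if mode not in VALID_MODES:
        raise ValueError(f"unknown transport mode: {mode!r}")
    return mode


class TransportSelector:
    def __init__(self, mode: str, rdma_devices: list, is_linux: bool) -> None:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown transport mode: {mode!r}")
        rdma_possible = is_linux and bool(rdma_devices)
        if mode == "rdma" and not rdma_possible:
            # Strict mode refuses to start without a usable RDMA path.
            raise RuntimeError("transport=rdma but no RDMA device is available")
        self.mode = mode
        # auto with no usable device counts as a boot-time failure: latch at once.
        self.tcp_only = mode == "tcp" or (mode == "auto" and not rdma_possible)

    def current(self) -> str:
        return "tcp" if self.tcp_only else "rdma"

    def on_rdma_failure(self, err: Exception) -> None:
        if self.mode == "auto":
            log.warning("RDMA failed (%s); latching into TCP-only mode", err)
            self.tcp_only = True  # never cleared for the life of the selector
            return
        raise err  # strict mode propagates in-flight failures


if __name__ == "__main__":
    sel = TransportSelector("auto", ["mlx5_0"], is_linux=True)
    print(sel.current())
    sel.on_rdma_failure(RuntimeError("QP error"))
    print(sel.current())
```

The latch is one-way by design in this sketch: once auto has fallen back, nothing flips `tcp_only` back, which mirrors "for the life of the engine".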
Transport choice is a client-side decision; the server has no knob to disable either listener.
Topology vs storage mode
Two orthogonal choices:
- Topology: distributed (multi-server, NVMe per server; the shape behind the 96.7 GiB/s numbers in Benchmarks) or co-located (one MemKV server on the inference host, common on GPU boxes with NVMe next to the NIC).
- Storage mode: direct is the performance path on raw block devices, in either topology. file is mmap-backed regular files for developers, kind / CI, Macs, and hosts without raw drives to dedicate; bandwidth is bounded by the host filesystem.
macOS is always file-mode + TCP because RDMA / JBOF / hugepages
aren't compiled in there. The same shape is reachable on Linux via
storage.mode: file + transport: tcp, but a Linux host with raw
drives should run jbof for real performance.
Authentication
Every signed wire message carries a 40-byte trailer: an 8-byte
timestamp followed by a 32-byte HMAC-SHA256 over
(header || ts_ns). The shared key is configured at both
ends (network.auth_key server-side, auth_key: client-side, or
MEMKV_AUTH_KEY env on either). The allowed clock drift is ±60 s;
messages whose timestamp falls outside that window, or whose MAC
does not verify, are silently dropped. There is no unauthenticated
mode: the engine refuses to construct without a key.
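The trailer scheme can be sketched with Python's standard hmac module. Several details here are assumptions, not taken from the wire spec: the timestamp is packed as a little-endian unsigned 64-bit nanosecond value, everything preceding the trailer is treated as the signed "header", and `sign` / `verify` are hypothetical helper names.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

TRAILER_LEN = 8 + 32            # 8-byte timestamp + 32-byte HMAC-SHA256
DRIFT_NS = 60 * 1_000_000_000   # +/- 60 s drift window


def sign(header: bytes, key: bytes, ts_ns: Optional[int] = None) -> bytes:
    """Append the 40-byte trailer: ts_ns || HMAC-SHA256(key, header || ts_ns)."""
    if not key:
        raise ValueError("an auth key is required; there is no unauthenticated mode")
    if ts_ns is None:
        ts_ns = time.time_ns()
    ts = struct.pack("<Q", ts_ns)  # byte order is an assumption
    mac = hmac.new(key, header + ts, hashlib.sha256).digest()
    return header + ts + mac


def verify(message: bytes, key: bytes, now_ns: Optional[int] = None) -> Optional[bytes]:
    """Return the header if the trailer checks out, else None (silent drop)."""
    if len(message) < TRAILER_LEN:
        return None
    if now_ns is None:
        now_ns = time.time_ns()
    header = message[:-TRAILER_LEN]
    ts = message[-TRAILER_LEN:-32]
    mac = message[-32:]
    (ts_ns,) = struct.unpack("<Q", ts)
    if abs(now_ns - ts_ns) > DRIFT_NS:
        return None  # outside the drift window
    expected = hmac.new(key, header + ts, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return None  # bad MAC
    return header


if __name__ == "__main__":
    key = b"shared-secret"
    msg = sign(b"\x06hello", key)
    print(len(msg) - len(b"\x06hello"))   # trailer length in bytes
    print(verify(msg, key) == b"\x06hello")
    print(verify(msg, b"wrong-key"))
```

`hmac.compare_digest` is used for the comparison so that verification time does not depend on how many leading MAC bytes match.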
Context Block Offload Flow
- Dynamo/KVBM identifies context blocks for offload.
- NIXL calls the MemKV plugin with client buffer address and rkey.
- Plugin sends write request to server via RDMA messaging (RC QP).
- Server acquires a DCI from the pool and does RDMA READ from client memory (DC transport).
- Server persists to NVMe via io_uring with O_DIRECT.
- On a lookup hit, server reads from NVMe and RDMA WRITEs back to client via DCI.