Dynamo + MemKV

Run NVIDIA Dynamo with MemKV as the remote KV-cache tier behind KVBM. Drop the NIXL plugin in, set a handful of env vars, and let KVBM offload evicted blocks into a MemKV cluster over RDMA.

NVIDIA Dynamo runs vLLM behind its own DynamoConnector. The KV-block manager (KVBM) inside Dynamo offloads evicted blocks through NIXL to a configurable remote backend. MemKV ships a NIXL plugin (libplugin_MEMKV.so) that registers as that backend. Once the plugin is mounted in the Dynamo container and the KVBM selectors are set (DYN_KVBM_NIXL_BACKEND=MEMKV and DYN_KVBM_REMOTE_STORAGE_TYPE=memkv, see Step 4), KVBM transparently routes offload traffic to a MemKV cluster over RDMA.

This is the production path behind MemKV's headline benchmark numbers.

What you need

A running MemKV cluster (one or more nodes).
A MemKV license file (minio.license).
The MemKV auth key (32-byte HMAC, hex-encoded).
A Dynamo runtime image with KVBM: the nvcr.io/nvidia/ai-dynamo/vllm-runtime image, using tags built from the miniohq/memkv-rename lineage. See INTEGRATIONS.md in the repo for the fork lineage and patch series.
libplugin_MEMKV.so for your platform.
One or more RDMA NICs visible on the GPU host — MEMKV_RDMA_DEVICES=mlx5_0,mlx5_1 binds them.
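
Before going further, it can help to confirm that the NICs you plan to list in MEMKV_RDMA_DEVICES are actually visible on the GPU host. The sketch below reads the kernel's /sys/class/infiniband directory; the function names are illustrative and not part of MemKV, and the optional directory argument exists only so the check can be pointed at a test directory.

```shell
# List RDMA device names exposed by the kernel under sysfs.
list_rdma_devices() {
    dir="${1:-/sys/class/infiniband}"
    [ -d "$dir" ] || return 1
    ls "$dir"
}

# Succeed only if every device in a comma-separated list
# (the MEMKV_RDMA_DEVICES format) exists under sysfs.
# The body runs in a subshell so the IFS change stays local.
check_rdma_devices() (
    wanted="$1"
    dir="${2:-/sys/class/infiniband}"
    IFS=','
    for dev in $wanted; do
        if [ ! -e "$dir/$dev" ]; then
            echo "missing RDMA device: $dev" >&2
            exit 1
        fi
    done
)

# Example (run on the GPU host):
#   check_rdma_devices "mlx5_0,mlx5_1" && echo "RDMA devices present"
```

If the check fails, compare the output of list_rdma_devices with the names you intended to bind.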

Dynamo upstream has no published plugin/extension point for adding a remote backend yet, so MemKV currently ships against a forked Dynamo build that adds the MEMKV backend. The fork is captured in this repo's INTEGRATIONS.md; when Dynamo exposes a stable extension point, this page collapses into "drop the plugin in and set the env var."

Step 1: bring up MemKV

Use the standard MemKV deployment flow. Once the cluster is up, each node listens by default on TCP :9900 for the wire protocol and on HTTP :9901 for admin (the admin port defaults to data_port + 1).
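
A quick reachability check from the GPU host can confirm that both ports are open before Dynamo is involved. This is a minimal sketch that only tests whether a TCP connection succeeds; it assumes the default data_port + 1 admin port, uses bash's /dev/tcp pseudo-device and the coreutils timeout command, and the function names are illustrative.

```shell
# Default admin port is the data port plus one (9900 -> 9901).
memkv_admin_port() {
    echo "$(( $1 + 1 ))"
}

# Return 0 if a TCP connection to host ($1) and port ($2) opens within 2 seconds.
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Check the data and admin ports for every entry of a
# comma-separated host:port list (the MEMKV_SERVERS format).
check_memkv_nodes() (
    IFS=','
    for node in $1; do
        host="${node%:*}"
        data_port="${node##*:}"
        admin_port="$(memkv_admin_port "$data_port")"
        port_open "$host" "$data_port" ||
            { echo "$host:$data_port: data port not reachable" >&2; exit 1; }
        port_open "$host" "$admin_port" ||
            { echo "$host:$admin_port: admin port not reachable" >&2; exit 1; }
    done
)

# Example:
#   check_memkv_nodes "host-a:9900,host-b:9900" && echo "cluster reachable"
```

A successful connection only shows that something is listening; it does not verify authentication or the license.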

Step 2: install the NIXL plugin

Download the prebuilt NIXL plugin to the GPU host:

sudo mkdir -p /opt/memkv-plugins
sudo curl -fL \
  https://dl.min.io/aistor/memkv/release/linux-amd64/libplugin_MEMKV.so \
  -o /opt/memkv-plugins/libplugin_MEMKV.so

You will bind-mount this file into the Dynamo container at the path NIXL scans (/opt/nvidia/nvda_nixl/lib/plugins/libplugin_MEMKV.so) in Step 4.
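
A truncated or failed download can leave something other than a shared object at that path, and the problem would otherwise only surface when NIXL tries to load it. The sketch below checks for the 4-byte ELF magic number at the start of the file; the function name is illustrative, and passing this check does not prove the plugin matches your platform.

```shell
# Return 0 if the file starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
is_elf() {
    [ -r "$1" ] || return 1
    # Dump the first four bytes as hex and strip od's spacing.
    magic="$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')"
    [ "$magic" = "7f454c46" ]
}

# Example:
#   is_elf /opt/memkv-plugins/libplugin_MEMKV.so || echo "not a shared object" >&2
```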

Step 3: configure the MemKV connection

The plugin reads the standard MemKV config chain — MEMKV_* env vars are the simplest path:

export MEMKV_SERVERS="host-a:9900,host-b:9900"
export MEMKV_AUTH_KEY="<64-hex-char auth key>"
export MEMKV_TRANSPORT=auto              # auto | rdma | tcp
export MEMKV_RDMA_DEVICES="mlx5_0,mlx5_1"
export MEMKV_LICENSE=/path/to/minio.license
export MEMKV_STAGING_SIZE_MB=256
export MEMKV_STAGING_SLOT_MB=16
export MEMKV_NUM_CONNECTIONS=4

For the RDMA fast path inside Docker, the container needs --device=/dev/infiniband plus --cap-add=IPC_LOCK --ulimit memlock=-1.
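
The MEMKV_* values from this step can be sanity-checked before they reach the container. The sketch below encodes only what this page states: the auth key is 32 bytes in hex (64 characters), the transport is one of auto, rdma, or tcp, a server list is set, and the license file is readable. The function name is illustrative, and the plugin may apply further validation that is not covered here.

```shell
validate_memkv_env() {
    # 32-byte key, hex-encoded: exactly 64 hex characters.
    case "$MEMKV_AUTH_KEY" in
        ""|*[!0-9a-fA-F]*)
            echo "MEMKV_AUTH_KEY must contain only hex characters" >&2
            return 1 ;;
    esac
    if [ "${#MEMKV_AUTH_KEY}" -ne 64 ]; then
        echo "MEMKV_AUTH_KEY must be 64 hex characters" >&2
        return 1
    fi

    # Transport must be one of the three documented values.
    case "$MEMKV_TRANSPORT" in
        auto|rdma|tcp) ;;
        *) echo "MEMKV_TRANSPORT must be auto, rdma, or tcp" >&2
           return 1 ;;
    esac

    if [ -z "$MEMKV_SERVERS" ]; then
        echo "MEMKV_SERVERS is empty" >&2
        return 1
    fi

    # The license path is checked on the host, where Step 3 exports it.
    if [ ! -r "$MEMKV_LICENSE" ]; then
        echo "MEMKV_LICENSE is not a readable file" >&2
        return 1
    fi
}

# Example:
#   validate_memkv_env && echo "MemKV environment looks sane"
```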

Step 4: launch Dynamo with KVBM pointed at MemKV

The Dynamo container needs three things on top of an ordinary vLLM launch:

The NIXL plugin bind-mounted into NIXL's plugin directory.
DYN_KVBM_NIXL_BACKEND=MEMKV and DYN_KVBM_REMOTE_STORAGE_TYPE=memkv so KVBM selects the MEMKV path.
libucx0 installed in the image. The prebuilt NIXL stack that the plugin links against needs the UCX runtime, which the upstream vllm-openai image does not ship.

docker run -d --name dynamo-memkv-vllm \
    --runtime=nvidia --net=host --shm-size=64g --ipc=host \
    --device=/dev/infiniband --cap-add=IPC_LOCK --ulimit memlock=-1 \
    -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
    -e DYN_KVBM_NIXL_BACKEND=MEMKV \
    -e DYN_KVBM_REMOTE_STORAGE_TYPE=memkv \
    -e DYN_KVBM_CPU_CACHE_GB=60 \
    -e MEMKV_SERVERS="host-a:9900,host-b:9900" \
    -e MEMKV_AUTH_KEY="$AUTH_KEY" \
    -e MEMKV_TRANSPORT=auto \
    -e MEMKV_RDMA_DEVICES="mlx5_0,mlx5_1" \
    -e MEMKV_NUM_CONNECTIONS=4 \
    -e MEMKV_STAGING_SIZE_MB=256 \
    -e MEMKV_STAGING_SLOT_MB=16 \
    -e MEMKV_LICENSE=/minio.license \
    -v /path/to/models:/inference-models:ro \
    -v /path/to/minio.license:/minio.license:ro \
    -v /opt/memkv-plugins/libplugin_MEMKV.so:/opt/nvidia/nvda_nixl/lib/plugins/libplugin_MEMKV.so:ro \
    nvcr.io/nvidia/ai-dynamo/vllm-runtime:<tag> bash -lc '
      apt-get update && apt-get install -y --no-install-recommends libucx0 &&
      dynamo serve /inference-models/<model-dir> \
        --host 0.0.0.0 --port 8810 \
        --tensor-parallel-size <N> \
        --enable-prefix-caching \
        --max-model-len <max_seq_len> \
        --gpu-memory-utilization 0.9
    '

The reference launcher used to produce MemKV's published Dynamo numbers is scripts/launch-dynamo-memkv.sh in this repo — copy it as a starting point.
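
The docker run command repeats the MEMKV_* values exported in Step 3. One way to avoid the duplication is docker's name-only form of -e, which copies a variable's value from the calling environment. The helper below is a sketch, not part of the reference launcher; its name is illustrative. It skips MEMKV_LICENSE on purpose, because the path inside the container (/minio.license) differs from the host path.

```shell
# Print one "-e NAME" argument per exported MEMKV_* variable,
# except MEMKV_LICENSE, whose in-container path differs from the host path.
# "docker run -e NAME" (no "=value") takes NAME's value from the caller's environment.
memkv_env_flags() {
    env |
        sed -n '/^MEMKV_LICENSE=/d; s/^\(MEMKV_[A-Za-z0-9_]*\)=.*/-e \1/p' |
        sort |
        tr '\n' ' '
}

# Example (left unquoted so the shell splits the flags into separate arguments):
#   docker run ... $(memkv_env_flags) -e MEMKV_LICENSE=/minio.license ...
```

A side effect of the name-only form is that the auth key no longer appears in the docker command line itself.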

Overriding the KVBM build

When tracking a custom Dynamo fork (the common case today), point the container at the rebuilt KVBM artifacts via two env vars consumed by the launcher:

KVBM_CORE_SO=/host/path/to/lib_core.so — bind-mounts the rebuilt KVBM cdylib over the image's default.
KVBM_PY_SRC=/host/path/to/kvbm-python-overlay — bind-mounts a Python overlay so the in-image wrapper stays in sync with the rebuilt cdylib.

Both are optional; default Dynamo images use the bundled KVBM.

What this integration buys

Capacity beyond per-replica HBM. KVBM evicts blocks from GPU HBM into MemKV instead of dropping them; the aggregate KV cache becomes HBM + CPU pool + MemKV cluster.
Cross-replica sharing. Multiple Dynamo workers pointed at the same MemKV cluster share one block pool, keyed by NIXL descriptor fields.
RDMA wire speed. With MEMKV_TRANSPORT=auto and the right NICs, block transfers ride DC RDMA at HCA line rate. The benchmark page shows 97.4 GB/s on 2 servers, ~97% of the 100 GB/s that 2× 400GbE provides.
Durability. On-drive shards survive Dynamo restarts.
HMAC auth. Every op is HMAC-authenticated with the cluster-wide shared key.

Operational notes

/v1/cache/status lives in Dynamo, not MemKV. The Dynamo patch series exposes a get_pool_status management surface; MemKV's /v1/status admin endpoint is unrelated. ttft-sweep drains and readiness checks talk to Dynamo's status endpoint, not MemKV's.
Per-rank, per-worker connections. With TP=N you will see N sets of QPs / TCP sessions per MemKV server.
License is mandatory. The MemKV plugin verifies the license during NIXL load; without one, Dynamo fails to bring the backend up.
NIXL plugin is OBJ_SEG-shaped. KVBM hands the plugin an RDMA descriptor list; the plugin posts RDMA WRITE/READ ops with dev_id as the routing key into the MemKV cluster.
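
For capacity planning, the per-rank note above can be turned into a rough session estimate. The sketch below assumes that each "set" holds MEMKV_NUM_CONNECTIONS connections; this page does not state that, so treat the result as an upper-level estimate to compare against what you observe on a MemKV server. The function name is illustrative.

```shell
# Estimated sessions per MemKV server:
#   Dynamo workers * tensor-parallel ranks per worker * connections per rank.
# Assumption: one "set" equals MEMKV_NUM_CONNECTIONS connections.
expected_sessions_per_server() {
    workers="$1"
    tp="$2"
    conns="$3"
    echo "$(( workers * tp * conns ))"
}

# Example: 2 workers, TP=8, MEMKV_NUM_CONNECTIONS=4
#   expected_sessions_per_server 2 8 4
```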

Roadmap

Upstream extension point. When Dynamo exposes a stable plugin ABI, the captured patch series in INTEGRATIONS.md goes away and the integration becomes plugin-only.
Pre-registered descriptor pools to avoid per-call MR registration on the KVBM side.

References

INTEGRATIONS.md — fork lineage, patch series, and developer-side build steps.
NIXL — NVIDIA Inference Xfer Library
Dynamo — NVIDIA Inference Framework
