Quick Start
Install MemKV, initialize NVMe drives, and start the server.
Deployment shapes
MemKV ships in two server shapes, both with the same wire protocol, auth, and license model:
- Distributed full server (Linux): multi-node, RDMA + NVMe + JBOF.
The performance path: zero-copy RDMA over DC (Dynamically Connected transport), NVMe via
io_uring + O_DIRECT, and hugepage-backed bounce buffers. This is the shape benchmarked at 96.7 GiB/s.
- Single-node co-located server: one MemKV server on the same host
as the inference client (SGLang, vLLM, Dynamo, a Python runtime, …).
File-mode storage and TCP transport only; the engine talks to MemKV
over 127.0.0.1 with no RDMA in the path. Targets: a Linux GPU dev box, an edge AI node, or an Apple Silicon Mac. On Linux, set storage.mode: file; on macOS this shape is the default because RDMA, JBOF, and hugepages aren't available there.
Requirements — distributed full server
- Linux kernel 6.8+
- RDMA NIC — Mellanox mlx5 with DC support for the fast path; any RoCE NIC for RC fallback
- NVMe drives, exclusively owned by MemKV and re-initialized by
memkv setup
- Hugepages reserved for bounce buffers: at least
config.memory.maxSize worth of 2 MiB pages (sudo sysctl -w vm.nr_hugepages=N). memkv start refuses to launch if free hugepage memory is below the configured pool.
- A MemKV license (Free or Enterprise); see the "Provision a license" step below.
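The N in vm.nr_hugepages=N follows from the pool size: with 2 MiB pages, each GiB of config.memory.maxSize needs 512 pages. A minimal sketch of that arithmetic, plus a read-back of free hugepage memory, assuming the default hugepage size is 2 MiB (the x86-64 default). hugepages_for_gib and free_hugepage_kib are hypothetical helpers, not MemKV commands.

```shell
# Hypothetical helpers (not part of MemKV) for sizing the hugepage pool.
# Assumes the default hugepage size is 2 MiB.

# 2 MiB pages needed for a pool of $1 GiB (1 GiB = 512 pages).
hugepages_for_gib() {
  echo $(( $1 * 1024 / 2 ))
}

# Free hugepage memory in KiB = HugePages_Free x Hugepagesize,
# read from a meminfo file (default /proc/meminfo, so Linux only).
free_hugepage_kib() {
  awk '/^HugePages_Free:/ { free = $2 }
       /^Hugepagesize:/   { size = $2 }
       END { printf "%.0f\n", free * size }' "${1:-/proc/meminfo}"
}
```

For a 64 GiB pool, sudo sysctl -w vm.nr_hugepages=$(hugepages_for_gib 64) requests 32768 pages; afterwards free_hugepage_kib should report at least 67108864 KiB. Check the result, because the kernel may allocate fewer pages than requested when memory is fragmented. A sysctl -w setting does not survive a reboot; set vm.nr_hugepages in a file under /etc/sysctl.d/ to make it persistent.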
Requirements — single-node co-located server
- Linux 6.8+ or macOS 13+ (Apple Silicon or Intel)
- Regular disk for file-mode storage; no RDMA, no JBOF, no hugepages required
- TCP transport only — bandwidth bounded by host filesystem and NIC
For performance, use raw block devices via mode: direct — the path behind
the published 96.7 GiB/s numbers, in both distributed and co-located shapes.
File mode is for a different audience: developers, home users, and anyone on a
host without raw drives to dedicate.
Bare-metal install: a single binary contains the server, the admin client, and the bundled docs site.
Download the binary
curl -LO https://dl.min.io/memkv/release/linux-amd64/memkv
chmod +x memkv
sudo mv memkv /usr/local/bin/
memkv --version
linux-arm64 artefacts are also published; replace linux-amd64 with linux-arm64 in the URL.
Install the NIXL plugin
Download the prebuilt plugin and drop it into the NIXL plugin directory:
sudo mkdir -p /opt/nvidia/nvda_nixl/lib/plugins
sudo curl -L \
https://dl.minio.io/aistor/memkv/release/linux-amd64/libplugin_MEMKV.so \
-o /opt/nvidia/nvda_nixl/lib/plugins/libplugin_MEMKV.so
Override the install location with NIXL_PLUGIN_DIR=/custom/path.
Supported NIXL memory types: DRAM_SEG, VRAM_SEG (via
nvidia-peermem).
If you installed MemKV via the .deb or .rpm package instead of
the raw binary, libplugin_MEMKV.so is already at
/usr/lib/memkv/libplugin_MEMKV.so — symlink or copy it into the
NIXL plugin directory.
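For the package-install case, the symlink step can be sketched as below. It assumes NIXL looks in NIXL_PLUGIN_DIR when that variable is set and in the default directory used above otherwise; nixl_plugin_dir and link_memkv_plugin are hypothetical helper names, not part of MemKV or NIXL.

```shell
# Hypothetical helpers (not part of MemKV or NIXL).
# Plugin directory: NIXL_PLUGIN_DIR when set, else the default used above.
nixl_plugin_dir() {
  printf '%s\n' "${NIXL_PLUGIN_DIR:-/opt/nvidia/nvda_nixl/lib/plugins}"
}

# Symlink the packaged plugin (default: the .deb/.rpm location) into it.
link_memkv_plugin() {
  local src="${1:-/usr/lib/memkv/libplugin_MEMKV.so}"
  local dir
  dir="$(nixl_plugin_dir)"
  if [ ! -f "$src" ]; then
    echo "plugin not found: $src" >&2
    return 1
  fi
  mkdir -p "$dir" && ln -sf "$src" "$dir/libplugin_MEMKV.so"
}
```

A symlink keeps the plugin in step with package upgrades; copy the file instead if the NIXL plugin directory lives on a different filesystem image or container layer. Writing to the default /opt/nvidia path needs root.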
Provision a license
Request a license (Free or Enterprise) at
min.io/pricing and save the JWT under
/etc/memkv/:
sudo mkdir -p /etc/memkv
sudo install -m 0600 /path/to/minio.license /etc/memkv/minio.license
The server picks it up automatically. Override the path with
MEMKV_LICENSE=/some/other/path if needed.
The Free tier on the distributed shape is limited to a single
server in the servers list. Multi-server scale-out requires Enterprise.
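A truncated or mis-pasted license file is easy to catch before starting anything. The sketch below is a hypothetical helper (looks_like_jwt is not part of MemKV): it only confirms that the file holds the three dot-separated base64url segments of a compact JWT. It does not verify the signature or the license terms; the server does that.

```shell
# Hypothetical structural check for a license file (not MemKV tooling).
# Passes when the file, minus whitespace, is header.payload.signature
# with each segment made of base64url characters.
looks_like_jwt() {  # $1 = license file
  local token
  token="$(tr -d '[:space:]' < "$1")" || return 1
  printf '%s' "$token" | grep -Eq '^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$'
}
```

For example, sudo cat /etc/memkv/minio.license > /tmp/lic.check && looks_like_jwt /tmp/lic.check reads the root-only file once and checks the copy.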
Configure the client
The plugin reads memkv-client settings from a YAML file pointed to
by MEMKV_CONFIG. Create a minimal one:
sudo mkdir -p /etc/memkv
sudo tee /etc/memkv/client.yaml > /dev/null <<EOF
servers:
- server-0.memkv.example.com:9900
- server-1.memkv.example.com:9900
rdma_devices: [mlx5_0, mlx5_1]
transport: auto # auto picks RDMA when rdma_devices is set, TCP otherwise
auth_key: REPLACE_WITH_SERVERS_AUTH_KEY
license: /etc/memkv/minio.license
EOF
export MEMKV_CONFIG=/etc/memkv/client.yaml
The auth_key must match the key that memkv setup generates on the
servers (next step): read it back from /etc/memkv/config.yaml (the
network.auth_key field) and paste it here, or set the matching
MEMKV_AUTH_KEY env var on both sides. Any setting in the YAML can
be overridden at runtime via the corresponding MEMKV_* env var
(MEMKV_SERVERS, MEMKV_RDMA_DEVICES, MEMKV_LICENSE,
MEMKV_AUTH_KEY, …). Full schema and precedence:
Configuration → Client.
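Copying the key by hand is error-prone. A minimal sketch of the copy step, assuming the server config keeps auth_key on its own line under a top-level network: block, as in the configs on this page. server_auth_key and set_client_auth_key are hypothetical helpers, and awk here is a line matcher, not a YAML parser.

```shell
# Hypothetical helpers (not MemKV tooling).
# Print network.auth_key from a server config; expects the layout on this page.
server_auth_key() {  # $1 = server config path
  awk '
    /^[^[:space:]#]/ { in_net = ($1 == "network:") }  # entering a top-level key
    in_net && $1 == "auth_key:" {
      val = $2
      gsub(/"/, "", val)                               # drop optional quotes
      print val
      exit
    }
  ' "$1"
}

# Replace the REPLACE_WITH_SERVERS_AUTH_KEY placeholder in a client config.
# Assumes a hex key (no |, & or \ characters, which sed would interpret).
# sed -i.bak works with both GNU and BSD sed and leaves a .bak copy behind.
set_client_auth_key() {  # $1 = server config, $2 = client config
  local key
  key="$(server_auth_key "$1")"
  if [ -z "$key" ]; then
    echo "no network.auth_key found in $1" >&2
    return 1
  fi
  sed -i.bak "s|REPLACE_WITH_SERVERS_AUTH_KEY|$key|" "$2"
}
```

Run it as root once memkv setup has written /etc/memkv/config.yaml. If the client lives on a different host, run server_auth_key on a server and carry the value over instead.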
Initialize drives
memkv setup writes the generated YAML to stdout — redirect it to
your config path:
sudo memkv setup \
--drives /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 \
--rdma mlx5_0 \
| sudo tee /etc/memkv/config.yaml
See the CLI reference for the full flag list.
memkv setup reformats the listed drives. All existing data is
destroyed. Use --force to overwrite a previously-initialized
drive.
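Because memkv setup destroys whatever is on the listed drives, a cheap guard before running it is to confirm none of them is mounted. A sketch, with assert_drives_unmounted and MOUNTS_FILE as hypothetical names: it reads /proc/mounts and matches NVMe partition names such as /dev/nvme0n1p1. It does not detect other kinds of use such as swap, LVM, or md members.

```shell
# Hypothetical guard (not a MemKV feature): fail if any listed drive, or one
# of its NVMe partitions, appears in the mounts table.
assert_drives_unmounted() {  # args = device paths
  local mounts="${MOUNTS_FILE:-/proc/mounts}" dev busy=0
  for dev in "$@"; do
    # Partitions of /dev/nvme0n1 are named /dev/nvme0n1p1, /dev/nvme0n1p2, ...
    if awk -v d="$dev" '$1 == d || index($1, d "p") == 1 { found = 1 }
                        END { exit !found }' "$mounts"; then
      echo "refusing: $dev (or a partition of it) is mounted" >&2
      busy=1
    fi
  done
  return "$busy"
}
```

Run assert_drives_unmounted /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 and proceed to memkv setup only if it succeeds.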
Start the server
sudo memkv start --config /etc/memkv/config.yaml
Verify the server is healthy:
curl http://localhost:9901/v1/health
# {"status":"healthy","rdma_active":true,"drives_online":3,"drives_offline":0,"drives_total":3}
Co-located install: one MemKV server on the inference host, with file-mode storage and TCP only (no RDMA, JBOF, or hugepages). The same binary runs on Linux and macOS; macOS builds have no RDMA or NVMe stack compiled in, so this shape is the only option there.
Generate an auth key and write the server config
sudo mkdir -p /etc/memkv /var/lib/memkv
openssl rand -hex 32 | sudo tee /etc/memkv/auth_key > /dev/null
sudo chmod 600 /etc/memkv/auth_key
AUTH=$(sudo cat /etc/memkv/auth_key)
sudo tee /etc/memkv/config.yaml > /dev/null <<EOF
network:
  address: 127.0.0.1:9900  # data plane; the admin API auto-binds at port + 1 (9901)
  auth_key: "$AUTH"
memory:
  block_size: 2 MiB
  max_size: 1 GiB
storage:
  mode: file
  block_size: 4 MiB
  drives:
    - media: /var/lib/memkv/drive0.dat
      max_size: 16 GiB
EOF
The shared secret is persisted to /etc/memkv/auth_key so the client-configuration
step below can read it back, even from a fresh shell.
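openssl is used here only as a source of randomness. On a host without the openssl CLI, the sketch below produces a key of the same shape (32 random bytes as 64 lowercase hex characters) with POSIX od and /dev/urandom, both present on Linux and macOS. gen_auth_key is a hypothetical helper name.

```shell
# Hypothetical helper: same output shape as `openssl rand -hex 32`.
# od prints the bytes as space-separated hex; tr strips spaces and newlines.
gen_auth_key() {
  od -A n -t x1 -N 32 /dev/urandom | tr -d ' \n'
}
```

Use it in place of the openssl line: gen_auth_key | sudo tee /etc/memkv/auth_key > /dev/null.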
The Free tier is allowed on this shape, limited to a single entry in
storage.drives and at most 32 TiB of total capacity. Multi-drive scale-out
requires Enterprise. Get a Free-tier license, or talk to MinIO
about Enterprise, at min.io/pricing.
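To see whether a config stays inside those Free-tier limits before installing a license, a sketch along these lines counts drive entries and sums their sizes. check_free_tier is hypothetical and assumes the layout of the config on this page (one media: key and one max_size: <number> <unit> per drive under the top-level storage block); it is not a YAML parser, and the server remains the authority on what the license allows.

```shell
# Hypothetical pre-flight check (not MemKV tooling) for the Free-tier limits:
# one entry in storage.drives and at most 32 TiB in total.
check_free_tier() {  # $1 = server config path
  awk '
    function bytes(n, unit) {
      if (unit == "KiB") return n * 1024
      if (unit == "MiB") return n * 1024 * 1024
      if (unit == "GiB") return n * 1024 * 1024 * 1024
      if (unit == "TiB") return n * 1024 * 1024 * 1024 * 1024
      return n  # bare number: treat as bytes
    }
    /^[^[:space:]#]/ { in_storage = ($1 == "storage:") }  # top-level section
    in_storage {
      for (i = 1; i <= NF; i++) {
        if ($i == "media:") drives++
        if ($i == "max_size:") total += bytes($(i + 1), $(i + 2))
      }
    }
    END {
      limit = 32 * 1024 * 1024 * 1024 * 1024
      bad = 0
      printf "drives=%d total_bytes=%.0f\n", drives, total
      if (drives > 1) { print "Free tier: only one entry allowed in storage.drives"; bad = 1 }
      if (total > limit) { print "Free tier: total capacity exceeds 32 TiB"; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

memory.max_size is ignored because only lines under the top-level storage: block are counted. For the config above, check_free_tier /etc/memkv/config.yaml should report one drive of 16 GiB.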
Start the server
Save the license JWT to /etc/memkv/minio.license (the server
auto-discovers it there) and start:
sudo install -m 0600 /path/to/minio.license /etc/memkv/minio.license
sudo memkv start --config /etc/memkv/config.yaml
The backing file is fully preallocated on first start
(fallocate on Linux, fcntl(F_PREALLOCATE) on macOS). The first run
is slower, but the file cannot run out of space mid-write later the way a sparse file can.
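Since preallocation claims the whole max_size up front, it helps to confirm the target filesystem has room before the first start. A sketch using POSIX df -Pk, whose fourth column is available space in 1024-byte blocks; has_room_for is a hypothetical helper.

```shell
# Hypothetical pre-flight check (not MemKV tooling): does the filesystem
# holding directory $1 have at least $2 GiB available?
has_room_for() {  # $1 = directory that will hold the file, $2 = size in GiB
  local avail_kib need_kib
  avail_kib="$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
  need_kib=$(( $2 * 1024 * 1024 ))
  [ -n "$avail_kib" ] && [ "$avail_kib" -ge "$need_kib" ]
}
```

For the config above, has_room_for /var/lib/memkv 16 || echo "not enough space for drive0.dat". Run it before the first start; once the file exists, its space already counts as used.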
Verify health (on Linux the response includes drive/RDMA fields; macOS returns the short form):
curl http://127.0.0.1:9901/v1/health
# macOS: {"status":"healthy"}
# Linux: {"status":"healthy","rdma_active":false,"drives_online":1,"drives_offline":0,"drives_total":1}
Point the inference client at it
Both the NIXL plugin and the LD_PRELOAD shim read settings from
MEMKV_CONFIG. Force transport: tcp so the client doesn't try
RDMA at all (macOS defaults to tcp automatically; Linux defaults
to auto and would attempt RDMA first):
AUTH=$(sudo cat /etc/memkv/auth_key)
cat > "$HOME/memkv-client.yaml" <<EOF
servers:
- 127.0.0.1:9900
transport: tcp
auth_key: "$AUTH"
license: /etc/memkv/minio.license
EOF
export MEMKV_CONFIG="$HOME/memkv-client.yaml"
Now any process that loads the NIXL plugin or the LD_PRELOAD shim routes context-cache traffic to the local MemKV server over TCP.
Kubernetes install: Helm chart at deploy/helm/memkv; image at quay.io/minio/memkv.
The chart deploys a StatefulSet — one Pod per server with hard
pod-anti-affinity — mirroring MemKV's shared-nothing architecture.
Cluster prerequisites: Kubernetes 1.27+, RDMA-capable NICs on each storage
node, hugepages reserved per node, and a license. The chart validates the
corresponding values at install time via values.schema.json.
Pre-create raw block PVs for the NVMe drives
MemKV writes its JBOF superblock and extents directly to raw block
devices. The chart's default drives.mode: blockDevice claims one
PVC per drive per replica, bound to operator-supplied local PVs.
The bound PV's nodeAffinity pins the Pod to the right host
automatically.
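Each local PV is the same manifest with a different node and device, so the PVs can also be generated rather than hand-edited. The sketch below uses the standard Kubernetes local-volume fields (volumeMode: Block, local.path, nodeAffinity on kubernetes.io/hostname). gen_local_pvs, the memkv-<node>-<device> naming, and the capacity argument are assumptions: the capacity must be at least what the chart's PVCs request, and storageClassName must match drives.blockDevice.storageClass.

```shell
# Hypothetical generator (not part of the chart): one Block-mode local PV per
# device, pinned to node $1. Output is multi-document YAML for kubectl apply.
gen_local_pvs() {  # $1 = node name, $2 = capacity (e.g. 3840Gi), rest = devices
  local node="$1" capacity="$2" dev
  shift 2
  for dev in "$@"; do
    cat <<EOF
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: memkv-${node}-${dev##*/}
spec:
  capacity:
    storage: ${capacity}
  volumeMode: Block
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: memkv-local
  local:
    path: ${dev}
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["${node}"]
EOF
  done
}
```

With replicaCount=2 and drives.blockDevice.count=12, the chart claims 12 PVCs per replica, so 24 PVs are needed, 12 per node. In bash, for example: { gen_local_pvs storage-01 3840Gi /dev/nvme{0..11}n1; gen_local_pvs storage-02 3840Gi /dev/nvme{0..11}n1; } | kubectl apply -f - (node names and size are placeholders). A memkv-local StorageClass must also exist; local volumes typically use kubernetes.io/no-provisioner with volumeBindingMode: WaitForFirstConsumer.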
Edit
deploy/helm/memkv/examples/local-pv-example.yaml
to match your nodes and drives, then apply:
kubectl apply -f deploy/helm/memkv/examples/local-pv-example.yaml
Create the license Secret
Request a license (Free or Enterprise) at
min.io/pricing, save the JWT to
minio.license, then load it into the cluster:
kubectl create namespace memkv
kubectl -n memkv create secret generic memkv-license \
--from-file=license=/path/to/minio.license
Install the chart
Block-device mode (recommended): each drive is a raw block PVC bound to a
local PV, and the scheduler pins each Pod to the host that
owns its drives.
helm install memkv deploy/helm/memkv \
--namespace memkv \
--set replicaCount=2 \
--set drives.blockDevice.count=12 \
--set drives.blockDevice.storageClass=memkv-local \
--set license.existingSecret=memkv-license \
--set config.memory.maxSize="64 GiB" \
--set hugepages.amount=64Gi
hostDev mode (convenience): bind-mounts the host /dev into the Pod
and points MemKV at /dev/nvmeXn1 paths directly. This matches the
docker-compose deployment.
The chart cannot pin Pods to a specific node in this mode.
Always supply nodeSelector or affinity so MemKV does not
land on a node that lacks the configured drives.
helm install memkv deploy/helm/memkv \
--namespace memkv \
--set drives.mode=hostDev \
--set 'drives.hostDev.devices={/dev/nvme0n1,/dev/nvme1n1}' \
--set 'nodeSelector.memkv\.minio\.io/storage=true' \
--set license.existingSecret=memkv-license
File-mode profile: mmap-backed regular files inside a PVC. No
losetup, no NVMe, and no privileged init container, so it runs on any
cluster, including kind on a Mac/arm64 box and managed
Kubernetes without raw-block PV support. Bandwidth is bounded
by the host filesystem; intended for dev / CI and small
single-node co-located deployments, not production.
helm install memkv deploy/helm/memkv \
--namespace memkv --create-namespace \
-f deploy/helm/memkv/ci/ci-values.yaml
The full SoftRoCE-driven /v1/health recipe (load rdma_rxe,
bind-mount /dev/infiniband into the kind node) is in
deploy/helm/README.md.
Verify the deployment
kubectl -n memkv get pods -o wide
kubectl -n memkv get pvc -l app.kubernetes.io/instance=memkv
Each Pod should land on a different node, and every PVC should be
Bound. Check the admin endpoint (port-forward stays in the foreground, so run curl from a second terminal):
kubectl -n memkv port-forward svc/memkv-admin 9901:9901
curl http://127.0.0.1:9901/v1/health
# {"status":"healthy","rdma_active":true,"drives_online":12,"drives_offline":0,"drives_total":12}
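Both checks above can be scripted. A minimal sketch with two hypothetical helpers: pods_on_distinct_nodes reads one node name per line and fails on repeats, and health_ok accepts a /v1/health body whose status is healthy and, when the field is present, whose drives_offline is 0 (so the short macOS form also passes). health_ok uses plain grep against the compact single-line JSON shown on this page; jq would be more robust where installed.

```shell
# Hypothetical helpers for the verification steps (not MemKV tooling).

# stdin: one node name per line. Fails if any node hosts more than one Pod.
pods_on_distinct_nodes() {
  local dups
  dups="$(sort | uniq -d)"
  if [ -n "$dups" ]; then
    echo "pods share node(s): $dups" >&2
    return 1
  fi
}

# $1: a /v1/health response body.
health_ok() {
  printf '%s' "$1" | grep -q '"status":"healthy"' || return 1
  if printf '%s' "$1" | grep -q '"drives_offline":'; then
    printf '%s' "$1" | grep -q '"drives_offline":0[,}]'
  fi
}
```

Assuming the Pods carry the same app.kubernetes.io/instance=memkv label as the PVCs:
kubectl -n memkv get pods -l app.kubernetes.io/instance=memkv -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | pods_on_distinct_nodes
health_ok "$(curl -s http://127.0.0.1:9901/v1/health)" && echo healthy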