Monitoring
MinIO KMS provides API endpoints for monitoring deployment health, status, and performance, as well as Prometheus-compatible metrics for integration with monitoring systems.
Health endpoints
| Endpoint | Description | Use case |
|---|---|---|
| /version | Returns the version of the MinIO KMS node | Version verification |
| /v1/health/live | Returns 200 OK for liveness checks | Kubernetes liveness probe, load balancer health |
| /v1/health/ready | Returns 200 OK for operational readiness checks | Kubernetes readiness probe |
| /v1/health/metrics | Returns Prometheus-compatible metrics | Prometheus scraping |
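As a minimal sketch of how a script might consume the liveness and readiness endpoints, the following Python maps their HTTP status to a pass or fail result. The `http_status` and `check_node` helpers are hypothetical names introduced for illustration, not part of MinIO KMS. The sketch assumes that anything other than 200 OK, including an unreachable node, counts as a failed check.

```python
import urllib.error
import urllib.request

# Paths taken from the health endpoints table.
LIVE_PATH = "/v1/health/live"
READY_PATH = "/v1/health/ready"


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, TLS error, or timeout


def check_node(base_url, fetch_status=http_status):
    """Probe one node; a check passes only on 200 OK (assumption)."""
    base = base_url.rstrip("/")
    return {
        "live": fetch_status(base + LIVE_PATH) == 200,
        "ready": fetch_status(base + READY_PATH) == 200,
    }
```

Calling `check_node("https://MINKMS_HOST_1:7373")` would return a dictionary such as `{"live": True, "ready": False}`. A deployment that uses a private CA or client certificates would need to pass a suitably configured fetch function in place of the default.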
Prometheus metrics
The /v1/health/metrics endpoint exposes metrics in OpenMetrics v1.0 format, compatible with Prometheus text format.
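To illustrate what a consumer of this text format has to handle, here is a deliberately simplified Python sketch that reads sample lines into a dictionary. It is not a complete OpenMetrics parser: it ignores timestamps and does not handle exemplars. The `parse_samples` helper is a hypothetical name, and the `EXAMPLE` payload, including its `_total` suffix and `path` label, is made up for illustration and does not represent actual MinIO KMS output.

```python
import re

# One sample line: metric name, optional {labels}, then a numeric value.
# Anything after the value (such as a timestamp) is ignored.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_samples(text):
    """Map 'name{labels}' to a float for every sample line in the payload."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP, # TYPE, # EOF lines
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simple pattern cannot read
        name, labels, value = match.groups()
        samples[name + (labels or "")] = float(value)
    return samples


# Made-up payload for illustration only; real output will differ.
EXAMPLE = """\
# TYPE runtime_goroutines gauge
runtime_goroutines 42
# TYPE http_request_timeout counter
http_request_timeout_total{path="/example"} 3
# EOF
"""
```

For anything beyond quick inspection, a maintained Prometheus client library is likely a better choice than a hand-written pattern like this one.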
Scrape configuration
Add the following job to your Prometheus configuration:
scrape_configs:
  - job_name: minkms
    scheme: https
    # Prometheus scrapes /metrics unless metrics_path is set.
    metrics_path: /v1/health/metrics
    tls_config:
      ca_file: /path/to/ca.crt
      cert_file: /path/to/client.crt
      key_file: /path/to/client.key
    static_configs:
      - targets:
          - MINKMS_HOST_1:7373
          - MINKMS_HOST_2:7373
          - MINKMS_HOST_3:7373
Replace the targets with your MinIO KMS node addresses.
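To check the endpoint by hand before pointing Prometheus at it, a short script can fetch it with the same certificate files. The following Python is a sketch under that assumption: the helper names `metrics_urls`, `make_tls_context`, and `fetch_metrics` are hypothetical, and the file paths and host names are the same placeholders used in the scrape configuration.

```python
import ssl
import urllib.request

# Metrics path from the health endpoints table.
METRICS_PATH = "/v1/health/metrics"


def metrics_urls(targets, scheme="https"):
    """Build one metrics URL per host:port target."""
    return [f"{scheme}://{target}{METRICS_PATH}" for target in targets]


def make_tls_context(ca_file, cert_file, key_file):
    """Trust ca_file and present a client certificate, mirroring tls_config."""
    context = ssl.create_default_context(cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def fetch_metrics(url, context, timeout=10.0):
    """Return the raw metrics payload from one node as text."""
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A typical use would build the context once with `make_tls_context("/path/to/ca.crt", "/path/to/client.crt", "/path/to/client.key")` and then call `fetch_metrics` for each URL returned by `metrics_urls(["MINKMS_HOST_1:7373"])`.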
Available metrics
Client API:
| Metric | Type | Description |
|---|---|---|
| http | counter | Total number of client API requests |
| http_request | counter | Request bytes sent and received |
| http_request_inflight | gauge | Client API requests currently being processed |
| http_request_duration | histogram | Time to process client API requests |
| http_request_size | histogram | Request body sizes |
| http_response_size | histogram | Response body sizes |
| http_request_canceled | counter | Client API requests canceled by the client |
| http_request_timeout | counter | Client API requests that timed out |
Internode RPC:
| Metric | Type | Description |
|---|---|---|
| rpc_http | counter | Total number of internode API requests |
| rpc_http_request | counter | Internode request bytes sent and received |
| rpc_http_request_duration | histogram | Time to process internode API requests |
| rpc_http_request_size | histogram | Internode request body sizes |
| rpc_http_response_size | histogram | Internode response body sizes |
| rpc_http_request_canceled | counter | Internode API requests canceled |
| rpc_http_request_timeout | counter | Internode API requests that timed out |
Network connections:
| Metric | Type | Description |
|---|---|---|
| net_conn | counter | Total established network connections |
| net_conn_bytes | counter | Bytes sent and received over the network |
| net_conn_duration | histogram | Time network connections remain open |
| net_conn_open | gauge | Currently open network connections |
Consensus:
| Metric | Type | Description |
|---|---|---|
| consens_heartbeats | counter | Heartbeats this node performed |
| consens_elections | counter | Elections this node started |
| consens_leader_stepdown | counter | Times this node stepped down from leader role |
Runtime:
| Metric | Type | Description |
|---|---|---|
| runtime | info | Version and build information |
| runtime_cpu_time | gauge | Total CPU time in seconds |
| runtime_gc_time | gauge | Total GC time in seconds |
| runtime_gc_cycles | gauge | Total GC cycles |
| runtime_mem | gauge | Total memory in bytes |
| runtime_mem_heap | gauge | Heap memory in bytes |
| runtime_goroutines | gauge | Number of active goroutines |
| runtime_threads | gauge | Number of OS threads |
Key metrics to monitor
| What to watch | Metric | Alert condition |
|---|---|---|
| API errors | http_request_timeout or http_request_canceled increasing | Clients unable to reach KMS |
| Internode health | rpc_http_request_timeout increasing | Cluster communication issues |
| Leader stability | consens_elections increasing rapidly | Frequent leader elections indicate instability |
| Resource pressure | runtime_goroutines growing unbounded | Possible goroutine leak |
| Memory | runtime_mem_heap approaching system limits | Memory pressure |
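The "increasing" conditions in this table amount to comparing counter values between two scrapes. The following Python is a rough sketch of that comparison, not a replacement for Prometheus alerting rules. The `counter_increases` helper is a hypothetical name, the inputs are plain dictionaries of metric name to value, and the sketch assumes the exposed sample names match the metric names listed in this section, which may not hold exactly in practice.

```python
# Counters from the table above whose growth suggests a problem.
WATCHED_COUNTERS = (
    "http_request_timeout",
    "http_request_canceled",
    "rpc_http_request_timeout",
    "consens_elections",
)


def counter_increases(previous, current, watched=WATCHED_COUNTERS):
    """Return {metric: increase} for watched counters that grew between scrapes."""
    increases = {}
    for name in watched:
        if name not in previous or name not in current:
            continue  # metric missing from one scrape; nothing to compare
        delta = current[name] - previous[name]
        if delta < 0:
            # The counter went backwards, typically after a process restart:
            # treat the new value as the increase since the reset.
            delta = current[name]
        if delta > 0:
            increases[name] = delta
    return increases
```

An empty result means none of the watched counters moved. What counts as "increasing rapidly" for `consens_elections` depends on the scrape interval and the deployment, so any threshold applied to the returned values is left to the operator.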