Monitoring

MinIO KMS provides API endpoints for monitoring deployment health, status, and performance, as well as Prometheus-compatible metrics for integration with monitoring systems.

Health endpoints

Endpoint           | Description                                     | Use case
/version           | Returns the version of the MinIO KMS node       | Version verification
/v1/health/live    | Returns 200 OK for liveness checks              | Kubernetes liveness probe, load balancer health
/v1/health/ready   | Returns 200 OK for operational readiness checks | Kubernetes readiness probe
/v1/health/metrics | Returns Prometheus-compatible metrics           | Prometheus scraping
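
A liveness or readiness check is an HTTP GET that counts as healthy only when the response is 200 OK. The following is a minimal sketch of such a check using the Python standard library. It assumes the endpoints are served over TLS and that you have a CA file and, if your deployment requires one, a client certificate and key. The helper names (tls_context, probe) and the host in the usage comment are illustrative and not part of MinIO KMS.

```python
import ssl
import urllib.request

LIVE_PATH = "/v1/health/live"    # path from the endpoint table
READY_PATH = "/v1/health/ready"  # path from the endpoint table


def tls_context(ca_file, cert_file=None, key_file=None):
    """Build a TLS context that trusts ca_file and optionally presents a client certificate."""
    ctx = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def probe(base_url, path, context=None, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if GET base_url + path answers 200 OK."""
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout, context=context) as resp:
            return resp.status == 200
    except OSError:
        # urllib raises HTTPError for 4xx/5xx responses and URLError for
        # connection failures; both are OSError subclasses, as are timeouts.
        return False


# Usage (hypothetical host and file paths):
# ctx = tls_context("/path/to/ca.crt", "/path/to/client.crt", "/path/to/client.key")
# probe("https://MINKMS_HOST_1:7373", READY_PATH, context=ctx)
```

The opener parameter exists only so the function can be exercised without a network connection; in normal use, leave it at its default.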

Prometheus metrics

The /v1/health/metrics endpoint exposes metrics in OpenMetrics v1.0 format, compatible with Prometheus text format.
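
Because the output is line-oriented text, it can also be inspected without Prometheus. The sketch below is a deliberately small reader for that format, not a full OpenMetrics parser: it skips comment lines (including HELP and TYPE metadata and the closing "# EOF" marker), ignores timestamps and exemplars, and does not handle label values that contain a closing brace. The function name parse_samples is illustrative.

```python
import re

# Metric name, optional {labels}, then the sample value. Anything after the
# value (timestamp, exemplar) is ignored.
_SAMPLE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_samples(text):
    """Map each series (name plus label set, as written) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, metadata comment, or "# EOF"
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, labels, value = match.groups()
        try:
            samples[name + (labels or "")] = float(value)
        except ValueError:
            continue  # value is not a number; skip rather than fail
    return samples
```

Pass it the response body of a scrape. The keys are the series exactly as the endpoint writes them, so the names may carry a prefix or suffix that the tables on this page omit.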

Scrape configuration

Add the following job to your Prometheus configuration:

scrape_configs:
  - job_name: minkms
    scheme: https
    tls_config:
      ca_file: /path/to/ca.crt
      cert_file: /path/to/client.crt
      key_file: /path/to/client.key
    static_configs:
      - targets:
          - MINKMS_HOST_1:7373
          - MINKMS_HOST_2:7373
          - MINKMS_HOST_3:7373

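Replace the MINKMS_HOST_* targets with the host:port addresses of your MinIO KMS nodes, and point ca_file, cert_file, and key_file at your own CA certificate and client certificate and key.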

Available metrics

Client API:

Metric                | Type      | Description
http                  | counter   | Total number of client API requests
http_request          | counter   | Request bytes sent and received
http_request_inflight | gauge     | Client API requests currently being processed
http_request_duration | histogram | Time to process client API requests
http_request_size     | histogram | Request body sizes
http_response_size    | histogram | Response body sizes
http_request_canceled | counter   | Client API requests canceled by the client
http_request_timeout  | counter   | Client API requests that timed out

Internode RPC:

Metric                    | Type      | Description
rpc_http                  | counter   | Total number of internode API requests
rpc_http_request          | counter   | Internode request bytes sent and received
rpc_http_request_duration | histogram | Time to process internode API requests
rpc_http_request_size     | histogram | Internode request body sizes
rpc_http_response_size    | histogram | Internode response body sizes
rpc_http_request_canceled | counter   | Internode API requests canceled
rpc_http_request_timeout  | counter   | Internode API requests that timed out

Network connections:

Metric            | Type      | Description
net_conn          | counter   | Total established network connections
net_conn_bytes    | counter   | Bytes sent and received over the network
net_conn_duration | histogram | Time network connections remain open
net_conn_open     | gauge     | Currently open network connections

Consensus:

Metric                  | Type    | Description
consens_heartbeats      | counter | Heartbeats this node performed
consens_elections       | counter | Elections this node started
consens_leader_stepdown | counter | Times this node stepped down from leader role

Runtime:

Metric             | Type  | Description
runtime            | info  | Version and build information
runtime_cpu_time   | gauge | Total CPU time in seconds
runtime_gc_time    | gauge | Total GC time in seconds
runtime_gc_cycles  | gauge | Total GC cycles
runtime_mem        | gauge | Total memory in bytes
runtime_mem_heap   | gauge | Heap memory in bytes
runtime_goroutines | gauge | Number of active goroutines
runtime_threads    | gauge | Number of OS threads

Key metrics to monitor

What to watch     | Metric and trend                                         | What it indicates
API errors        | http_request_timeout or http_request_canceled increasing | Clients unable to reach KMS
Internode health  | rpc_http_request_timeout increasing                      | Cluster communication issues
Leader stability  | consens_elections increasing rapidly                     | Frequent leader elections indicate instability
Resource pressure | runtime_goroutines growing without bound                 | Possible goroutine leak
Memory            | runtime_mem_heap approaching system limits               | Memory pressure
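
Several of these conditions reduce to "a counter grew between two scrapes". The sketch below shows that comparison in isolation, assuming you already hold two snapshots as dictionaries mapping a series (metric name plus optional label set) to its value. The WATCHED names are copied from the tables on this page; the names your deployment actually exposes may carry a prefix or suffix, so treat the tuple as an assumption to adjust. In a Prometheus setup, alerting rules would normally do this job instead.

```python
# Counters from the table above whose growth signals a problem.
# Assumption: adjust these to the exact names your scrape output uses.
WATCHED = (
    "http_request_timeout",
    "http_request_canceled",
    "rpc_http_request_timeout",
    "consens_elections",
)


def counter_increases(previous, current, watched=WATCHED):
    """Return {series: delta} for watched counters that grew since the previous snapshot."""
    increases = {}
    for series, value in current.items():
        name = series.split("{", 1)[0]  # drop the label set, if any
        if name not in watched:
            continue
        before = previous.get(series, 0.0)
        # A counter that went down was reset (for example by a restart),
        # so everything it now holds has accumulated since the reset.
        delta = value - before if value >= before else value
        if delta > 0:
            increases[series] = delta
    return increases
```

A non-empty result means at least one watched counter moved during the interval. How large a delta deserves an alert depends on your traffic and scrape interval, which this sketch leaves to the caller.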