Healthcheck Probes

Each AIStor server process exposes unauthenticated endpoints for simple healthchecks that probe server uptime and deployment high availability. Each endpoint returns an HTTP status code indicating whether the underlying resource is healthy or satisfies read/write quorum. The server exposes no other data through these endpoints.

AIStor liveness

Use the following endpoint to test if the specified AIStor server is up and ready to serve requests:

curl -I https://aistor.example.net:9000/minio/health/live

Replace https://aistor.example.net:9000 with the DNS hostname and port of the server to check.

A response code of 200 OK indicates the server is online and functional. Any other HTTP status code indicates an issue with reaching the server, such as a transient network issue or potential downtime.

The healthcheck probe alone cannot determine whether the server is offline, only that the current host machine cannot reach it. Consider configuring a Prometheus alert using the minio_cluster_servers_offline_total metric to detect whether one or more AIStor servers are offline.
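As a minimal sketch, the status handling described above could be wrapped in two small shell helpers. The function names probe_status and interpret_live_status are illustrative and not part of AIStor. The sketch relies on curl's --write-out variable http_code, which prints 000 when curl receives no HTTP response, and on an assumed 5-second timeout.

```shell
# Hypothetical helper: print only the HTTP status code returned by a probe URL.
# curl prints 000 for http_code when it receives no HTTP response at all.
probe_status() {
  curl -s -o /dev/null -I --max-time 5 -w '%{http_code}' "$1"
}

# Hypothetical helper: translate a liveness status code into a short verdict.
interpret_live_status() {
  case "$1" in
    200) echo "online" ;;
    000) echo "unreachable from this host" ;;
    *)   echo "unexpected status $1" ;;
  esac
}
```

A call such as interpret_live_status "$(probe_status https://aistor.example.net:9000/minio/health/live)" then yields a one-line verdict. The "unreachable from this host" wording reflects that the probe reports reachability from the calling machine only.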

Cluster write quorum

Use the following endpoint to test if an AIStor deployment has write quorum:

curl -I https://aistor.example.net:9000/minio/health/cluster

Replace https://aistor.example.net:9000 with the DNS hostname and port of any server in the deployment to check. For clusters using a load balancer to manage incoming connections, specify the hostname for the load balancer.

A response code of 200 OK indicates that the deployment has sufficient servers online to meet write quorum. A response code of 503 Service Unavailable indicates the deployment does not currently have write quorum.

The healthcheck probe alone cannot determine whether an individual server is offline or whether the deployment is processing write operations normally. It reports only whether enough servers are online to meet write quorum requirements based on the configured erasure code parity.
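For scripted checks, the two documented status codes can be mapped onto exit statuses. The following is a sketch only: the function name has_write_quorum, the 5-second timeout, and the choice of exit statuses 0, 1, and 2 are assumptions made for illustration.

```shell
# Hypothetical helper: succeed only when the deployment reports write quorum.
# $1 is the base URL of any server (or the load balancer) in the deployment.
# Returns 0 for 200 OK, 1 for 503 Service Unavailable, and 2 for anything
# else, including 000, which curl prints when it gets no HTTP response.
has_write_quorum() {
  code=$(curl -s -o /dev/null -I --max-time 5 -w '%{http_code}' \
    "$1/minio/health/cluster")
  case "$code" in
    200) return 0 ;;
    503) return 1 ;;
    *)   return 2 ;;
  esac
}
```

Keeping "no quorum" (1) separate from "could not tell" (2) lets a calling script distinguish a confirmed quorum loss from a probe that never reached the deployment.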

Consider configuring a Prometheus alert using one of the following metrics to detect potential issues or errors on the cluster:

  • minio_cluster_servers_offline_total to alert if one or more servers are offline.
  • minio_server_drive_free_bytes to alert if the deployment is running low on free drive space.

Cluster read quorum

Use the following endpoint to test if an AIStor deployment has read quorum:

curl -I https://aistor.example.net:9000/minio/health/cluster/read

Replace https://aistor.example.net:9000 with the DNS hostname and port of a server in the deployment to check. For clusters using a load balancer to manage incoming connections, specify the hostname for the load balancer.

A response code of 200 OK indicates that the deployment has sufficient servers online to meet read quorum. A response code of 503 Service Unavailable indicates the deployment does not currently have read quorum.

The healthcheck probe alone cannot determine whether an individual server is offline or whether the deployment is processing read operations normally. It reports only whether enough servers are online to meet read quorum requirements based on the configured erasure code parity. Consider configuring a Prometheus alert using the minio_cluster_servers_offline_total metric to detect whether one or more servers are offline.
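The write and read quorum probes can also be read together. The sketch below takes the two status codes as arguments and prints a coarse summary. The function name quorum_state and the labels "read-write", "read-only", and "no-quorum" are illustrative choices, not AIStor terminology, and any combination not listed is reported as "unknown" rather than guessed at.

```shell
# Hypothetical helper: summarize deployment state from two probe status codes.
# $1 = status code from /minio/health/cluster       (write quorum probe)
# $2 = status code from /minio/health/cluster/read  (read quorum probe)
quorum_state() {
  case "$1:$2" in
    200:200) echo "read-write" ;;  # both quorums met
    503:200) echo "read-only" ;;   # read quorum met, write quorum lost
    503:503) echo "no-quorum" ;;   # neither quorum met
    *)       echo "unknown" ;;     # unexpected or unreachable
  esac
}
```

For example, quorum_state 503 200 prints "read-only", which corresponds to a deployment that reports read quorum but not write quorum.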

Cluster maintenance check

Use the following endpoint to test if an AIStor deployment can maintain both read and write quorum if the specified server is taken down for maintenance:

curl -I "https://aistor.example.net:9000/minio/health/cluster?maintenance=true"

Replace https://aistor.example.net:9000 with the DNS hostname and port of a server in the deployment to check. For clusters using a load balancer to manage incoming connections, specify the hostname for the load balancer.

A response code of 200 OK indicates that the deployment would still have sufficient servers online to meet read and write quorum with the server offline. A response code of 412 Precondition Failed indicates the deployment will lose quorum if the server goes offline.

The healthcheck probe alone cannot determine whether the server is offline. It reports only whether enough servers will remain online, after taking the server down for maintenance, to meet read and write quorum requirements based on the configured erasure code parity. Consider configuring a Prometheus alert using the minio_cluster_servers_offline_total metric to detect whether one or more servers are offline.
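A maintenance script might gate a shutdown on this probe. The following sketch assumes a hypothetical function name safe_for_maintenance, a 5-second timeout, and illustrative message text. Only the 200 and 412 status codes come from the description above.

```shell
# Hypothetical helper: report whether a server can be taken down for maintenance.
# $1 is the base URL of the server to check.
# Returns 0 when quorum is retained, 1 when it would be lost, 2 otherwise.
safe_for_maintenance() {
  # The URL is quoted so the shell does not treat "?" as a glob character.
  code=$(curl -s -o /dev/null -I --max-time 5 -w '%{http_code}' \
    "$1/minio/health/cluster?maintenance=true")
  case "$code" in
    200) echo "safe: quorum is retained" ;;
    412) echo "unsafe: deployment would lose quorum"; return 1 ;;
    *)   echo "unknown: unexpected status $code"; return 2 ;;
  esac
}
```

Treating every status other than 200 as a reason not to proceed is the conservative choice here, since an unreachable endpoint gives no evidence that quorum would survive the shutdown.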

All rights reserved 2024-Present, MinIO, Inc.