Healthcheck Probes
Each AIStor server process exposes unauthenticated endpoints for simple healthchecks that probe server uptime and deployment high availability. These endpoints return an HTTP status code indicating whether the underlying resource is healthy or satisfies read/write quorum. The server exposes no other data through these endpoints.
AIStor liveness
Use the following endpoint to test if the specified AIStor server is up and ready to serve requests:
curl -I https://aistor.example.net:9000/minio/health/live
Replace https://aistor.example.net:9000 with the DNS hostname and port of the server to check.
A response code of 200 OK indicates the server is online and functional.
Any other HTTP status code indicates an issue reaching the server, such as a transient network failure or potential downtime.
Use this endpoint with load balancer healthcheck probes to ensure that client operations route only to healthy nodes.
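For example, a minimal external probe script might look like the following sketch. The hostname, timeout, and exit codes are illustrative assumptions, not part of AIStor; most load balancers implement an equivalent check natively.

```sh
#!/usr/bin/env sh
# Sketch of a liveness probe: succeed only if the endpoint returns HTTP 200.
# ENDPOINT, the 5-second timeout, and the exit codes are assumed example values.
ENDPOINT="https://aistor.example.net:9000/minio/health/live"

STATUS=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$ENDPOINT")

if [ "$STATUS" = "200" ]; then
  echo "live: $ENDPOINT returned 200"
  exit 0
else
  echo "not live: $ENDPOINT returned HTTP $STATUS" >&2
  exit 1
fi
```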
A failed healthcheck probe alone does not indicate that the server is offline, only that the probing host could not reach it.
Consider configuring a Prometheus alert using the minio_cluster_servers_offline_total metric to detect whether one or more AIStor servers are offline.
Cluster write readiness
Use the following endpoint to check if the local AIStor node views the cluster as ‘ready’ to process write operations:
curl -I https://aistor.example.net:9000/minio/health/cluster
Replace https://aistor.example.net:9000 with the DNS hostname and port of any server in the deployment to check.
For clusters using a load balancer to manage incoming connections, specify the hostname of the load balancer.
The target node queries its peers for their current drive health and status. A ‘ready’ response indicates that enough peer nodes responded as fully initialized with sufficient healthy drives to support write quorum.
This endpoint alone cannot determine the uptime status of the target node or its peers. For detecting and alerting on node downtime, configure Prometheus alerts using one of the following V3 metrics to detect potential issues or errors on the cluster:
- `minio_cluster_servers_offline_total` to alert if one or more servers are offline.
- `minio_server_drive_free_bytes` to alert if the deployment is running low on free drive space.
Distributed readiness check
Use the distributed=true query parameter to verify cluster health from all nodes’ perspectives:
curl -I "https://aistor.example.net:9000/minio/health/cluster?distributed=true"
A distributed readiness check performs a fan-out call to all peer nodes requesting they each perform their own health check. The response then indicates whether all peer nodes agree on cluster readiness, instead of relying on only the single local node’s view.
Combine distributed=true with maintenance=true to verify if a specific node can be safely taken offline while ensuring all other nodes see the cluster as healthy:
curl -I "https://aistor.example.net:9000/minio/health/cluster?distributed=true&maintenance=true"
Response codes and headers
The endpoint returns one of the following HTTP codes:
| HTTP Code | Description |
|---|---|
| 200 OK | Sufficient online nodes and healthy drives for write operations. |
| 503 Service Unavailable | Insufficient online nodes or healthy drives for write operations. |
The response includes the following headers:
| Header | Description |
|---|---|
| X-Minio-Write-Quorum | Number of drives required to satisfy write quorum |
| X-Minio-Storage-Class-Defaults | true if using default storage class settings |
| X-Minio-Healing-Drives | Number of drives currently healing (only present if greater than 0) |
| X-Minio-Server-Status | Reason for failure or degraded state (see Understanding 503 responses) |
The result of the probe alone does not determine the health of the target server or the cluster’s ability to process operations. It indicates only the targeted node’s local view of cluster health and status. For example, if a network partition exists between the node and a peer, the returned status reports that peer as ‘offline’ even if the peer is otherwise healthy and servicing requests. Similarly, the node may not have fully initialized at the time of the healthcheck while the remainder of the cluster is healthy and operational.
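As an illustrative sketch, the status line and quorum-related headers can be pulled from a single response with curl and grep. The hostname is the documentation example, and the parsing approach is an assumption, not an official tool:

```sh
# Sketch: capture the write-readiness status code and quorum-related headers.
URL="https://aistor.example.net:9000/minio/health/cluster"

# -s silences the progress meter; -i includes response headers in the output.
RESPONSE=$(curl -s -i "$URL")

echo "$RESPONSE" | head -n 1                          # e.g. HTTP/1.1 200 OK or HTTP/1.1 503 Service Unavailable
echo "$RESPONSE" | grep -i '^X-Minio-Write-Quorum'    # drives required for write quorum
echo "$RESPONSE" | grep -i '^X-Minio-Server-Status'   # reason, present on failed or degraded responses
```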
Cluster read readiness
Use the following endpoint to check if the local AIStor node views the cluster as ‘ready’ to process read operations:
curl -I https://aistor.example.net:9000/minio/health/cluster/read
Replace https://aistor.example.net:9000 with the DNS hostname and port of a server in the deployment to check.
For clusters using a load balancer to manage incoming connections, specify the hostname of the load balancer.
The target node queries its peers for their current drive health and status. A ‘ready’ response indicates that enough peer nodes responded as fully initialized with healthy drives to support read quorum.
This endpoint alone cannot determine the uptime status of any given peer node. For detecting and alerting on node downtime, configure Prometheus alerts using one of the following metrics to detect potential issues or errors on the cluster:
- `minio_cluster_servers_offline_total` to alert if one or more servers are offline.
- `minio_server_drive_free_bytes` to alert if the deployment is running low on free drive space.
Response codes and headers
The endpoint returns one of the following HTTP codes:
| HTTP Code | Description |
|---|---|
| 200 OK | Sufficient online nodes and healthy drives for read operations. |
| 503 Service Unavailable | Insufficient online nodes or healthy drives for read operations. |
The response includes the following headers:
| Header | Description |
|---|---|
| X-Minio-Read-Quorum | Number of drives required to satisfy read quorum |
| X-Minio-Storage-Class-Defaults | true if using default storage class settings |
| X-Minio-Healing-Drives | Number of drives currently healing (only present if greater than 0) |
| X-Minio-Server-Status | Reason for failure or degraded state (see Understanding 503 responses) |
The result of the probe alone does not determine the health of the target server or the cluster’s ability to process operations. It indicates only the targeted node’s local view of cluster health and status. For example, if a network partition exists between the node and a peer, the returned status reports that peer as ‘offline’ even if the peer is otherwise healthy and servicing requests. Similarly, the node may not have fully initialized at the time of the healthcheck while the remainder of the cluster is healthy and operational.
Cluster maintenance check
Use the following endpoint to test if an AIStor deployment can maintain both read and write quorum if the target node is taken down for maintenance:
curl -I "https://aistor.example.net:9000/minio/health/cluster?maintenance=true"
Replace https://aistor.example.net:9000 with the DNS hostname and port of a server in the deployment to check.
Response codes
| HTTP Code | Description |
|---|---|
| 200 OK | Deployment can maintain quorum if this server goes offline. |
| 412 Precondition Failed | Returned only when maintenance=true. Indicates the deployment will lose quorum if this server goes offline. |
| 503 Service Unavailable | Server is unavailable (check the X-Minio-Server-Status header for the reason). |
The response alone does not indicate the health or availability of the node. It only indicates whether the cluster can tolerate taking the node offline for maintenance operations.
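A pre-maintenance gate in a rolling update script might branch on these codes as in the following sketch. The hostname and the decision to abort on any non-200 response are assumptions:

```sh
# Sketch: proceed with maintenance on this node only if the rest of the
# deployment can keep quorum without it.
NODE="https://aistor.example.net:9000"

CODE=$(curl -s -o /dev/null -w '%{http_code}' "$NODE/minio/health/cluster?maintenance=true")

case "$CODE" in
  200) echo "safe to take $NODE offline for maintenance" ;;
  412) echo "quorum would be lost; do not take $NODE offline" >&2; exit 1 ;;
  *)   echo "node unavailable or degraded (HTTP $CODE); check X-Minio-Server-Status" >&2; exit 1 ;;
esac
```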
Understanding 503 responses
The cluster healthcheck endpoints return 503 Service Unavailable if the target server could not verify the health status of the cluster.
This includes scenarios in which the target server has not fully initialized or has encountered an error preventing startup.
The X-Minio-Server-Status response header contains the specific reason for the failure.
Some failures may clear given sufficient time for the server to start up or retry startup operations.
If the status persists beyond the normal startup time, check the server logs for errors.
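Because some of these states clear once initialization finishes, a startup gate might poll the endpoint and surface the reported status while waiting, as in the following sketch; the polling interval and overall timeout are arbitrary assumptions:

```sh
# Sketch: wait up to roughly two minutes for the cluster healthcheck to return 200,
# printing the X-Minio-Server-Status reason while it still returns 503.
URL="https://aistor.example.net:9000/minio/health/cluster"

for attempt in $(seq 1 24); do
  HEADERS=$(curl -s -D - -o /dev/null "$URL")          # -D - writes response headers to stdout
  CODE=$(echo "$HEADERS" | head -n 1 | awk '{print $2}')
  if [ "$CODE" = "200" ]; then
    echo "cluster ready"
    exit 0
  fi
  echo "attempt $attempt: HTTP ${CODE:-none}"
  echo "$HEADERS" | grep -i '^X-Minio-Server-Status' || echo "no status header (possible quorum failure)"
  sleep 5
done

echo "cluster did not become ready; check server logs" >&2
exit 1
```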
The following table lists status values returned as part of the X-Minio-Server-Status header:
| Status | Cause | Resolution |
|---|---|---|
| offline | Server is starting up, restarting, or failed to start due to configuration errors. | Wait for startup to complete. If the status persists, check server logs and verify drive mounts. |
| bucket-metadata-offline | Server is loading bucket metadata, or drives containing metadata are unavailable. | Wait for metadata loading to complete. Check drive health and server logs for metadata errors. |
| iam-offline | Server is loading IAM policies, or IAM data is being synchronized across the cluster. | Wait for IAM initialization. Check server logs for IAM-related errors. |
| restarting | Server is restarting. | Wait for the server to restart and complete initialization. Check server logs for errors. |
| license-offline | Server failed the license check. | Ensure the license is up-to-date, valid, and associated with an active SUBNET account. |
| license-readonly | License has expired. | Update the license for the node. |
| grid-offline | Server networking layer is not initialized, or the internal RPC port is blocked by a firewall. | Wait for startup to complete. Verify firewall rules allow inter-node traffic. |
| grid-none-online | Network partition, all other nodes offline, or DNS resolution failures. | Verify network connectivity between nodes. Check that peer nodes are running and DNS resolves correctly. |
For readiness queries with the distributed flag, the header supports the following additional values:
| Status | Cause | Resolution |
|---|---|---|
| peer-unreachable:{HOST} | The target server could not connect to the specified {HOST}. | Check network connectivity between the target and the peer. |
| peer-no-response:{HOST} | The target server did not receive a response from the specified {HOST}. | Check logs of the remote peer for errors or issues which would prevent a timely response. |
| peer-unhealthy:{HOST} | The {HOST} reported itself as unhealthy. | Check logs of the remote peer to determine the source of the health issues. |
| distributed-fanout-failed:{ERROR} | The RPC fanout query to peer nodes failed with {ERROR}. | Check network connectivity between the target and its peers. |
Quorum failures
When a 503 response has no X-Minio-Server-Status header, the cluster does not have sufficient drives online to meet quorum requirements.
This occurs when too many drives are offline due to hardware failures, maintenance, or unavailable mount points.
Use mc admin info to identify offline drives.
The X-Minio-Healing-Drives response header indicates if healing is in progress on replacement drives.
The number of drives required for quorum depends on the erasure code parity configuration.
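As a quick illustration, assuming a deployment alias named myaistor has already been registered with mc alias set:

```sh
# Show per-server and per-drive status, including offline drives, for the
# deployment behind the placeholder alias "myaistor".
mc admin info myaistor

# The X-Minio-Healing-Drives header appears only while replacement drives are healing.
curl -sI https://aistor.example.net:9000/minio/health/cluster | grep -i '^X-Minio-Healing-Drives'
```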