Node Maintenance

Cordon is not always necessary
MinIO AIStor automatically and correctly removes offline nodes from the cluster. Normally no additional steps are required when taking a node offline for maintenance: stop the service, perform the maintenance, and then restart the service. Node cordoning is available as an option for deployments with specific requirements.

MinIO AIStor allows you to temporarily remove nodes from active service for planned maintenance operations. Cordoning lets administrators gracefully take nodes offline without disrupting cluster operations. A cordoned node finishes its in-flight operations and then marks itself as unavailable for new operations.

Use cordon only when you have a specific requirement that cordon satisfies. For example, you want the cordoned node to drain external connections, or a single node is hanging or slow because of infrastructure issues and you want the rest of the cluster to disregard it.

The following diagram illustrates the node state transitions during the maintenance workflow:

[Figure: node state transitions during maintenance. The diagram shows the Online, Draining, and Cordoned states, with transitions triggered by mc admin cordon, mc admin cordon --no-drain, and mc admin uncordon.]

Cordon a node

The mc admin cordon command removes a node from active service. By default, the command initiates a graceful drain of existing connections before fully cordoning the node.

Replace ALIAS with your MinIO AIStor cluster alias and NODE with the target node address (for example, node1.example.com:9000):

mc admin cordon ALIAS NODE

Run cordon from a different node
Do not run mc admin cordon while your mc alias is pointed directly at the node you are cordoning. Point the alias at the cluster through a load balancer, or at a node other than the one being cordoned.
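A script can enforce this rule with a small guard that compares the alias endpoint with the target node before cordoning. The following is a minimal POSIX shell sketch and is not part of mc. The function name safe_cordon is hypothetical. The caller passes the alias URL explicitly (for example, copied from mc alias list). The guard compares only host names, so it ignores ports and does not handle IPv6 literals.

```sh
#!/bin/sh
# Hypothetical guard: refuse to cordon NODE when ALIAS_URL targets the same host.
# Usage: safe_cordon ALIAS ALIAS_URL NODE
safe_cordon() {
  local alias="$1" alias_url="$2" node="$3"
  local alias_host node_host
  # Drop the scheme (http:// or https://), any path, and any port from the URL.
  alias_host="${alias_url#*://}"
  alias_host="${alias_host%%/*}"
  alias_host="${alias_host%%:*}"
  # Drop the port from the node address (for example, node1.example.com:9000).
  node_host="${node%%:*}"
  if [ "$alias_host" = "$node_host" ]; then
    echo "refusing to cordon $node: alias $alias points directly at it" >&2
    return 1
  fi
  mc admin cordon "$alias" "$node"
}
```

For example, safe_cordon myminio https://lb.example.com node1.example.com:9000 cordons the node through a load balancer alias. The same call with https://node1.example.com:9000 as the alias URL is refused.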

Connection management when cordoning a node

When you cordon a node, MinIO AIStor performs a graceful drain of existing connections:

  • The node enters a draining state.
  • The health endpoint returns HTTP 503, preventing new client requests from routing to this node.
  • MinIO AIStor waits up to two minutes to allow existing connections to complete.
Correct draining behavior requires a load balancer
The draining node does not refuse connections. It relies on the load balancer to stop directing traffic to it. Configure the load balancer to monitor each node's health check and to promptly remove any node that fails the check from the rotation.
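To see what the load balancer sees during a drain, you can poll a node's health URL until it reports HTTP 503. This POSIX shell sketch assumes you pass the same health check URL that your load balancer probes. The function name and its retry defaults (30 attempts, 2 seconds apart) are illustrative and are not part of MinIO AIStor.

```sh
#!/bin/sh
# Hypothetical helper: poll URL until it returns HTTP 503 (the draining signal).
# Usage: wait_for_drain_signal URL [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_drain_signal() {
  local url="$1" attempts="${2:-30}" interval="${3:-2}"
  local i=0 code=""
  while [ "$i" -lt "$attempts" ]; do
    # Print only the status code. curl reports 000 and exits non-zero on
    # connection errors, so ignore its exit status and keep retrying.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || true
    if [ "$code" = "503" ]; then
      echo "draining: $url returned 503"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for 503 from $url (last status: $code)" >&2
  return 1
}
```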

After draining completes, the node transitions to a fully cordoned state. MinIO AIStor disconnects all grid connections.

To cordon a node immediately, you can skip the drain phase using the --no-drain flag:

mc admin cordon --no-drain ALIAS NODE
Immediate cordon
Using --no-drain immediately terminates all in-progress requests to the node. Use this option only when you need to quickly isolate a node and can accept potential request failures.

Monitor node status

Use mc admin info to view the status of nodes in your cluster, including cordoned and draining nodes:

mc admin info ALIAS

Nodes display one of the following states:

State      Description
Online     Node is operational and serving requests.
Draining   Node is completing existing requests before cordoning.
Cordoned   Node is offline for maintenance.
Offline    Node is not responding.

Uncordon a node

Use the mc admin uncordon command to return a cordoned node to active service:

mc admin uncordon ALIAS NODE

You must also manually restart the MinIO AIStor process on the cordoned node, for example by running sudo systemctl restart minio on that node. Restarting alone is not enough. The cordon state is persisted and re-applied on every startup, so the node stays cordoned until you explicitly run mc admin uncordon against the cluster, using an alias that points at the load balancer or at a node other than the cordoned one. You can restart and uncordon in either order, but both steps are required.

If you run mc admin uncordon before restarting the node, the node continues to display as Offline in mc admin info until the process is restarted and the grid connection is re-established.
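You can script the two steps together. This POSIX shell sketch restarts the process first and then uncordons, which is one of the two valid orders. It assumes SSH access to the node host, a systemd service named minio as in the example above, and an mc alias that targets the load balancer or another node. The function name return_to_service is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: restart a cordoned node, then clear its cordon state.
# Usage: return_to_service ALIAS NODE SSH_HOST
return_to_service() {
  local alias="$1" node="$2" ssh_host="$3"
  # Step 1: restart the MinIO AIStor process on the cordoned node itself.
  ssh "$ssh_host" sudo systemctl restart minio || return 1
  # Step 2: uncordon through an alias that does NOT point at the cordoned node.
  mc admin uncordon "$alias" "$node"
}
```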

Behavior

State persistence

MinIO AIStor persists the cordon state to storage. If a draining or cordoned node restarts before being uncordoned, it automatically re-enters the cordoned state. A draining node that restarts transitions directly to the fully cordoned state.

This behavior ensures that nodes do not accidentally rejoin the cluster during maintenance windows.

Quorum protection

Before allowing a cordon or drain operation, MinIO AIStor validates that the operation does not cause the cluster to lose quorum. If cordoning the node would reduce the cluster below the minimum number of nodes required for read and write operations, the command fails with an error similar to the following:

cluster would lose quorum

For clusters operating near minimum quorum, verify the impact of taking a node offline before cordoning. Use mc admin info to review current cluster health and capacity.
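In automation, it can help to distinguish a quorum refusal from other failures. This POSIX shell sketch wraps mc admin cordon and matches the error text shown above. The wrapper name and its exit codes (2 for a quorum refusal, 1 for any other error) are hypothetical conventions. The exact wording of the error may also vary between releases.

```sh
#!/bin/sh
# Hypothetical wrapper around mc admin cordon.
# Exit codes: 0 = cordoned, 2 = refused to protect quorum, 1 = other error.
cordon_or_explain() {
  local alias="$1" node="$2"
  local output
  # Capture stdout and stderr together so the error text can be inspected.
  if output=$(mc admin cordon "$alias" "$node" 2>&1); then
    printf '%s\n' "$output"
    return 0
  fi
  if printf '%s\n' "$output" | grep -qF "would lose quorum"; then
    echo "skipped $node: cordoning it would break quorum" >&2
    return 2
  fi
  printf '%s\n' "$output" >&2
  return 1
}
```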

Maintenance mode health checks

MinIO AIStor health endpoints support a maintenance query parameter for load balancer integration. When you call the cluster health endpoint with ?maintenance=true, MinIO AIStor checks whether taking the node that serves the request offline would compromise high availability:

GET /minio/health/cluster?maintenance=true
  • Returns HTTP 200 if the node can safely be taken offline without losing write quorum.
  • Returns HTTP 412 if removing the node would cause the cluster to lose quorum.

Use this endpoint in rolling maintenance scripts to verify each node can safely go offline before cordoning it.
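A rolling maintenance script might gate each cordon on this check. In this POSIX shell sketch, the request goes directly to the node being evaluated rather than through the load balancer, because the endpoint answers for the node that serves the request. The https scheme default and the function names are assumptions.

```sh
#!/bin/sh
# Hypothetical helpers for a rolling maintenance script.
# NODE is host:port, for example node1.example.com:9000.
can_go_offline() {
  local node="$1" scheme="${2:-https}" code
  # Send the request straight to NODE so the answer is about NODE itself.
  # On connection errors curl prints 000, which falls through to the * case.
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    "${scheme}://${node}/minio/health/cluster?maintenance=true") || true
  case "$code" in
    200) return 0 ;;  # write quorum holds without this node
    412) return 1 ;;  # taking this node offline would lose quorum
    *) echo "unexpected status $code from $node" >&2; return 2 ;;
  esac
}

check_then_cordon() {
  local alias="$1" node="$2"
  if can_go_offline "$node"; then
    mc admin cordon "$alias" "$node"
  else
    echo "not cordoning $node" >&2
    return 1
  fi
}
```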

Most Recently Failed (MRF) queue

MinIO AIStor maintains a Most Recently Failed (MRF) queue that tracks operations that could not be completed due to transient failures such as a drive being temporarily unavailable. The MRF system periodically reprocesses these failed operations, ensuring data consistency is restored once the underlying issue resolves.

The MRF queue works alongside healing to maintain data integrity across the cluster. Configure heal background workers to control MRF processing concurrency.

Kubernetes considerations

MinIO AIStor cordoning on Kubernetes
In Kubernetes environments, MinIO AIStor cordoning applies to the Pod running the associated workload. It does not affect any other Pods or services running on the Kubernetes worker node, and it ensures the scheduler does not reschedule that Pod during ongoing maintenance operations.

When running MinIO AIStor on Kubernetes, the cordon workflow requires additional considerations for Pod lifecycle management.

MinIO AIStor cordon vs Kubernetes cordon

The mc admin cordon command operates at the MinIO AIStor application layer, not the Kubernetes node layer. It removes a MinIO AIStor Pod from cluster participation while the Pod continues running. This differs from kubectl cordon, which prevents new Pods from being scheduled on a Kubernetes node.

For MinIO AIStor maintenance, use mc admin cordon to gracefully stop activity on a Pod on the MinIO AIStor cluster before performing maintenance on the underlying infrastructure.
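The following POSIX shell sketch shows one possible ordering. It cordons the MinIO AIStor Pod at the application layer first, then marks the Kubernetes worker unschedulable with kubectl cordon. The function name and its arguments are hypothetical. POD_ADDRESS is the node address MinIO AIStor uses for the Pod, and WORKER is the name of the Kubernetes node that hosts it.

```sh
#!/bin/sh
# Hypothetical helper: application-layer cordon first, then Kubernetes cordon.
# Usage: prepare_worker_maintenance ALIAS POD_ADDRESS WORKER
prepare_worker_maintenance() {
  local alias="$1" pod_address="$2" worker="$3"
  # Stop the Pod from participating in the MinIO AIStor cluster.
  mc admin cordon "$alias" "$pod_address" || return 1
  # Prevent new Pods from being scheduled onto the worker node.
  kubectl cordon "$worker"
}
```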

Maintenance workflows

Pod identity
StatefulSets maintain stable network identities for Pods. When a Pod restarts, it retains the same hostname and PersistentVolumeClaims, so the node address used with mc admin cordon and mc admin uncordon remains the same.
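Because the identity is stable, you can derive the node address from the StatefulSet name and the Pod ordinal. This POSIX shell sketch follows the standard Kubernetes DNS pattern for StatefulSet Pods behind a headless Service. It assumes the default cluster.local cluster domain and port 9000. The example names (myminio-pool-0, myminio-hl, minio) are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: stable DNS address of a StatefulSet Pod.
# Pattern: <statefulset>-<ordinal>.<headless-service>.<namespace>.svc.cluster.local:<port>
statefulset_pod_address() {
  local statefulset="$1" ordinal="$2" service="$3" namespace="$4" port="${5:-9000}"
  printf '%s-%s.%s.%s.svc.cluster.local:%s\n' \
    "$statefulset" "$ordinal" "$service" "$namespace" "$port"
}
```

For example, mc admin cordon ALIAS "$(statefulset_pod_address myminio-pool-0 2 myminio-hl minio)" targets the same Pod before and after it restarts.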