Key Manager recovery on Kubernetes
AIStor Key Manager supports recovery from both single-node failures and total cluster failure.
Single node failure and recovery
For single-node failures, Key Manager requires at least one healthy node remaining in the cluster. You can restore any number of failed nodes from a single healthy node, as long as that node remains accessible until the cluster is fully recovered.
On Kubernetes, a failed node typically presents as a pod that fails to start or that has lost state due to underlying issues with its Persistent Volume. To restore the pod, modify the Helm chart to change the replica configuration and remove the downed pods.
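Before changing the chart, it can help to confirm which pod is failing and whether its Persistent Volume is implicated. The following is a minimal diagnostic sketch using standard kubectl commands; POD-NAME is a placeholder for the affected pod, and the namespace placeholder matches the examples below.

    # List the Key Manager pods and their current status.
    kubectl get pods -n KEY-MANAGER-NAMESPACE -o wide

    # Inspect the failing pod's events for scheduling or volume mount errors.
    kubectl describe pod POD-NAME -n KEY-MANAGER-NAMESPACE

    # Review the PersistentVolumeClaims and PersistentVolumes backing the pods.
    kubectl get pvc -n KEY-MANAGER-NAMESPACE
    kubectl get pv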
- Validate the current Helm chart configuration.

  Use the helm get values RELEASE command to retrieve the user-specified values.yaml applied to the chart. You can alternatively reference the actual file if it is saved or stored in an accessible location.

  Check the keyManager.replicas field:

    keyManager:
      # Other configurations omitted
      replicas: 3
- Modify the Helm chart to scale down the replica set.

  Modify the replicas value to reflect only the pods still online or healthy in the replica set. Use kubectl get all -n KEY-MANAGER-NAMESPACE to validate the status before proceeding.

    keyManager:
      replicas: 2
- Update the Helm chart.

  Use the helm upgrade command to apply the modified configuration to the release:

    helm upgrade RELEASE minio/aistor-keymanager \
      -n KEY-MANAGER-NAMESPACE \
      -f aistor-keymanager-values.yaml

  Use kubectl get all -n KEY-MANAGER-NAMESPACE to validate the status of the pods after updating the chart. Only the healthy pods should remain online and accessible.

  Use minkms stat to ensure the cluster state reflects only the currently healthy nodes.
- Restore the unhealthy worker nodes.

  Perform the necessary operations to repair the worker nodes and associated storage infrastructure so that Kubernetes can successfully schedule and run Key Manager pods on those nodes.

  Check and clean any Persistent Volumes previously used by the Key Manager pods so that they contain no data. Depending on your configured storage class and choice of CSI driver, you may need to take additional steps to clean and present the Persistent Volumes for use. A PersistentVolumeClaim cleanup sketch follows this procedure.
- Scale the replica set to normal size.

  Restore values.yaml to the previous replicas value and update the chart:

    helm upgrade RELEASE minio/aistor-keymanager \
      -n KEY-MANAGER-NAMESPACE \
      -f aistor-keymanager-values.yaml

  Use kubectl get all -n KEY-MANAGER-NAMESPACE to validate the status of the pods after updating the chart. All pods should come back online and become accessible.

  Use minkms stat to ensure the cluster state reflects all nodes as healthy.
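If replacement pods keep binding to old, possibly corrupted volumes, you may need to remove their claims before scaling back up. The following is a minimal cleanup sketch; it assumes the Key Manager pods run as a StatefulSet whose claims carry an app.kubernetes.io/name=aistor-keymanager label, so adjust the selector and claim names to match your chart and storage class.

    # List the PersistentVolumeClaims used by the Key Manager pods.
    # The label selector is an assumption; match it to your chart's labels.
    kubectl get pvc -n KEY-MANAGER-NAMESPACE \
      -l app.kubernetes.io/name=aistor-keymanager

    # Delete the claims that belonged to the failed pods so that replacement
    # pods start with empty volumes. Replace PVC-NAME with each affected claim.
    kubectl delete pvc PVC-NAME -n KEY-MANAGER-NAMESPACE

    # Confirm the released PersistentVolumes are removed or reusable. With a
    # Delete reclaim policy the PV is removed automatically; with Retain you
    # must clean or delete it manually.
    kubectl get pv

Whether deleting a claim also removes the underlying data depends on the reclaim policy of your storage class, so verify the PersistentVolume state before allowing Kubernetes to reschedule the pods.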
Total cluster failure and recovery
You can rebuild a Key Manager cluster from a backup in the event of hardware failure, disaster, or other business continuity events. Key Manager requires creating a new single-node cluster to which you restore the backup snapshot. Once the node successfully starts up and resumes operations, you can scale the cluster back up to the target size.
On Kubernetes, you must first bring up a Key Manager deployment with a single replica. You can then restore the cluster state and scale up to full size. Ensure that the Kubernetes cluster has available worker nodes and associated storage to schedule all required Key Manager pods.
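As a pre-check, you can confirm that schedulable worker nodes, the expected storage class, and any leftover volumes are in the state you expect; the commands below are standard kubectl queries and use the same namespace placeholder as the rest of this procedure.

    # Confirm worker nodes are Ready and schedulable.
    kubectl get nodes

    # Confirm the storage class the Key Manager pods request is available.
    kubectl get storageclass

    # Review existing PersistentVolumes and claims in the Key Manager namespace.
    kubectl get pv
    kubectl get pvc -n KEY-MANAGER-NAMESPACE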
- Validate the current Helm chart configuration.

  Use the helm get values RELEASE command to retrieve the user-specified values.yaml applied to the chart. You can alternatively reference the actual file if it is saved or stored in an accessible location.

  Check the keyManager.replicas field:

    keyManager:
      # Other configurations omitted
      replicas: 3
- Modify the Helm chart to scale down the replica set to 0.

  Modify the replicas value to 0 to delete all pods and their state. Use kubectl get all -n KEY-MANAGER-NAMESPACE to validate the status before proceeding.

    keyManager:
      replicas: 0
- Update the Helm chart.

  Use the helm upgrade command to apply the modified configuration to the release:

    helm upgrade RELEASE minio/aistor-keymanager \
      -n KEY-MANAGER-NAMESPACE \
      -f aistor-keymanager-values.yaml

  Use kubectl get all -n KEY-MANAGER-NAMESPACE to validate the status of the pods after updating the chart. No pods should remain online.
- Restore the unhealthy worker nodes.

  Perform the necessary operations to repair the worker nodes and associated storage infrastructure so that Kubernetes can successfully schedule and run Key Manager pods on those nodes.

  Check and clean any Persistent Volumes previously used by the Key Manager pods so that they contain no data. Depending on your configured storage class and choice of CSI driver, you may need to take additional steps to clean and present the Persistent Volumes for use. The PersistentVolumeClaim cleanup sketch after the single-node recovery procedure applies here as well.
- Scale the replica set to 1.

  Change the keyManager.replicas field to 1 and update the chart:

    helm upgrade RELEASE minio/aistor-keymanager \
      -n KEY-MANAGER-NAMESPACE \
      -f aistor-keymanager-values.yaml

  Use kubectl get all -n KEY-MANAGER-NAMESPACE to validate the status of the pods after updating the chart. Only the single new pod should be online and accessible.

  Use minkms stat to ensure the cluster state reflects only that node.
- Restore the backup snapshot.

  Use the minkms restore command to restore the backup snapshot to the new node. You can use the inline CLI help minkms help restore for additional usage and guidance.

  The following example targets a new host keymanager1.example.net and restores from a snapshot BACKUP-FILE. The example assumes the Key Manager cluster includes an ingress, route, or similar configuration that exposes the node or service at the specified hostname:

    minkms restore https://keymanager1.example.net:7373 --api-key ROOT-API-KEY BACKUP-FILE

  Once the restore completes, verify the state of the cluster by running the following commands:

  - minkms stat to validate node status.
  - minkms ls-enclave to validate all expected enclaves.
  - minkms ls-key to validate all expected cryptographic keys per enclave.
  - minkms ls-policy to validate all expected policies.
- Scale the replica set to normal size.

  Restore values.yaml to the previous replicas value and update the chart (an alternative that overrides the replica count inline is sketched after this procedure):

    helm upgrade RELEASE minio/aistor-keymanager \
      -n KEY-MANAGER-NAMESPACE \
      -f aistor-keymanager-values.yaml

  Use kubectl get all -n KEY-MANAGER-NAMESPACE to validate the status of the pods after updating the chart. All pods should come back online and become accessible.

  Use minkms stat to ensure the cluster state reflects all nodes as healthy.
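If you prefer not to edit values.yaml for each scaling step, one alternative is to override the replica count inline. This is a minimal sketch, assuming the chart exposes the keyManager.replicas value shown above and that reusing the current release values is acceptable in your environment; the target count of 3 is only an example.

    # Scale the release back to its normal size without editing values.yaml.
    # --reuse-values keeps the other settings from the current release, and
    # --set overrides only the replica count.
    helm upgrade RELEASE minio/aistor-keymanager \
      -n KEY-MANAGER-NAMESPACE \
      --reuse-values \
      --set keyManager.replicas=3

    # Confirm the pods and cluster state after the change.
    kubectl get all -n KEY-MANAGER-NAMESPACE

Keeping values.yaml as the source of truth remains the safer default, since inline overrides can drift from the stored file.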