# Key Manager recovery on Kubernetes
AIStor Key Manager supports recovery from both single-node failures and total cluster failure.
## Single node failure and recovery
For single-node failures, Key Manager requires at least one healthy node remaining in the cluster. You can restore any number of failed nodes from a single healthy node, so long as that node remains accessible until the cluster fully recovers.
On Kubernetes, a failed node typically presents as a pod that fails to start or that has lost state due to underlying issues with its Persistent Volume. To restore the pod, you must modify the Helm chart to change the replica configuration and remove the downed pods.
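Before modifying the chart, confirm which pod is unhealthy and why. The commands below are standard kubectl; replace the placeholders with your own pod name and namespace:

```sh
# List Key Manager pods and their states; a failed node typically shows as
# Pending, CrashLoopBackOff, or a pod stuck in Terminating
kubectl get pods -n KEY-MANAGER-NAMESPACE -o wide

# Inspect events for a suspect pod to confirm a Persistent Volume issue
kubectl describe pod POD-NAME -n KEY-MANAGER-NAMESPACE
```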
- Validate the current Helm chart configuration.

  Use the `helm get values RELEASE` command to retrieve the user-specified `values.yaml` applied to the chart. You can alternatively reference the actual file if saved or stored in an accessible location.

  Check the `keyManager.replicas` field:

  ```yaml
  keyManager:
    # Other configurations omitted
    replicas: 3
  ```
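  For illustration, assuming a release named `keymanager` in the namespace `aistor` (both names hypothetical), you could capture the applied values to a local file:

  ```sh
  # -o yaml prints only the user-supplied values, suitable for editing and reuse
  helm get values keymanager -n aistor -o yaml > aistor-keymanager-values.yaml
  ```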
- Modify the Helm chart to scale down the replica set.

  Modify the `replicas` value to reflect only the pods still online or healthy in the replica set. Use `kubectl get all -n KEY-MANAGER-NAMESPACE` to validate the status before proceeding.

  ```yaml
  keyManager:
    replicas: 2
  ```
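  If you script this change rather than editing the file by hand, one option (an assumption about your tooling, not a requirement of the chart) is the Go-based `yq` v4:

  ```sh
  # Update the replica count in place; requires mikefarah/yq v4 or later
  yq -i '.keyManager.replicas = 2' aistor-keymanager-values.yaml
  ```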
- Update the Helm chart.

  Use the `helm upgrade` command to apply the modified configuration to the release.

  ```sh
  helm upgrade RELEASE minio/aistor-keymanager \
    -n KEY-MANAGER-NAMESPACE \
    -f aistor-keymanager-values.yaml
  ```

  Use `kubectl get all -n KEY-MANAGER-NAMESPACE` to validate the status of the pods after updating the chart. Only the healthy pods should remain online and accessible.

  Use `minkms stat` to ensure the cluster state reflects only the currently healthy nodes.
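  If the chart manages the pods through a StatefulSet (an assumption; confirm with `kubectl get statefulset -n KEY-MANAGER-NAMESPACE`), you can also watch the scale-down converge:

  ```sh
  # Replace STATEFULSET-NAME with the workload name reported in the namespace
  kubectl rollout status statefulset/STATEFULSET-NAME -n KEY-MANAGER-NAMESPACE
  ```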
- Restore the unhealthy worker nodes.

  Perform the necessary operations to repair the worker nodes and associated storage infrastructure such that Kubernetes can successfully schedule and run Key Manager pods on those nodes.

  Check and clean any Persistent Volumes previously used by the Key Manager pods such that they contain no data. Depending on your configured storage class and choice of CSI driver, you may need to take additional steps to clean and present the Persistent Volumes for use.
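  A hedged sketch of the volume cleanup, assuming the chart provisions one PersistentVolumeClaim per pod (the claim and volume names below are placeholders):

  ```sh
  # Identify claims left behind by the failed pods
  kubectl get pvc -n KEY-MANAGER-NAMESPACE

  # Delete a stale claim so the rescheduled pod provisions a fresh volume
  kubectl delete pvc PVC-NAME -n KEY-MANAGER-NAMESPACE

  # A backing PersistentVolume with reclaimPolicy: Retain stays in the
  # Released state and must be scrubbed or removed manually
  kubectl get pv
  kubectl delete pv PV-NAME
  ```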
- Scale the replica set to normal size.

  Restore `values.yaml` to the previous values and update the chart:

  ```sh
  helm upgrade RELEASE minio/aistor-keymanager \
    -n KEY-MANAGER-NAMESPACE \
    -f aistor-keymanager-values.yaml
  ```

  Use `kubectl get all -n KEY-MANAGER-NAMESPACE` to validate the status of the pods after updating the chart. All pods should come online and be accessible.

  Use `minkms stat` to ensure the cluster state reflects all nodes in the cluster.
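  To confirm the scaled-up pods reach readiness before returning the cluster to service, a sketch assuming the chart applies the standard `app.kubernetes.io/name` label (verify the actual labels with `kubectl get pods --show-labels`):

  ```sh
  kubectl wait pod \
    -l app.kubernetes.io/name=aistor-keymanager \
    -n KEY-MANAGER-NAMESPACE \
    --for=condition=Ready --timeout=300s
  ```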
## Total cluster failure and recovery
You can rebuild a Key Manager cluster from a backup in the event of hardware failure, disaster, or other business continuity events. Key Manager requires creating a new single-node cluster to which you restore the backup snapshot. Once the node successfully starts up and resumes operations, you can scale the cluster back up to the target size.
In Kubernetes you must first deploy a new Key Manager cluster with a single replica. You can then restore the cluster state and scale up to full size. You must ensure that the Kubernetes cluster has available worker nodes and associated storage to schedule all required Key Manager pods.
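A quick capacity check before redeploying helps confirm the cluster can schedule every required pod; both commands are standard kubectl:

```sh
# Confirm enough schedulable worker nodes are available
kubectl get nodes -o wide

# Confirm the storage class backing the Key Manager volumes exists
kubectl get storageclass
```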
- Validate the current Helm chart configuration.

  Use the `helm get values RELEASE` command to retrieve the user-specified `values.yaml` applied to the chart. You can alternatively reference the actual file if saved or stored in an accessible location.

  Check the `keyManager.replicas` field:

  ```yaml
  keyManager:
    # Other configurations omitted
    replicas: 3
  ```
- Modify the Helm chart to scale down the replica set to 0.

  Modify the `replicas` value to `0` to delete all pods and their state. Use `kubectl get all -n KEY-MANAGER-NAMESPACE` to validate the status before proceeding.

  ```yaml
  keyManager:
    replicas: 0
  ```
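  For a temporary change such as this, you can alternatively override the field at upgrade time with Helm's `--set` flag rather than editing `values.yaml` (shown as a sketch; the file-based flow remains the canonical record):

  ```sh
  helm upgrade RELEASE minio/aistor-keymanager \
    -n KEY-MANAGER-NAMESPACE \
    -f aistor-keymanager-values.yaml \
    --set keyManager.replicas=0
  ```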
- Update the Helm chart.

  Use the `helm upgrade` command to apply the modified configuration to the release.

  ```sh
  helm upgrade RELEASE minio/aistor-keymanager \
    -n KEY-MANAGER-NAMESPACE \
    -f aistor-keymanager-values.yaml
  ```

  Use `kubectl get all -n KEY-MANAGER-NAMESPACE` to validate the status of the pods after updating the chart. No pods should remain online.
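  You can double-check that the namespace is fully drained before touching the underlying storage:

  ```sh
  # Expect "No resources found in KEY-MANAGER-NAMESPACE namespace."
  kubectl get pods -n KEY-MANAGER-NAMESPACE
  ```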
- Restore the unhealthy worker nodes.

  Perform the necessary operations to repair the worker nodes and associated storage infrastructure such that Kubernetes can successfully schedule and run Key Manager pods on those nodes.

  Check and clean any Persistent Volumes previously used by the Key Manager pods such that they contain no data. Depending on your configured storage class and choice of CSI driver, you may need to take additional steps to clean and present the Persistent Volumes for use.
- Scale the replica set to 1.

  Change the `keyManager.replicas` field to `1` and update the chart:

  ```sh
  helm upgrade RELEASE minio/aistor-keymanager \
    -n KEY-MANAGER-NAMESPACE \
    -f aistor-keymanager-values.yaml
  ```

  Use `kubectl get all -n KEY-MANAGER-NAMESPACE` to validate the status of the pods after updating the chart. A single pod should come online and be accessible.

  Use `minkms stat` to ensure the cluster state reflects the single node.
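  Before restoring the snapshot, confirm that exactly one pod is running and that it started cleanly:

  ```sh
  # Expect a single pod in the Running state
  kubectl get pods -n KEY-MANAGER-NAMESPACE

  # Review startup logs for errors; replace POD-NAME with the pod shown above
  kubectl logs POD-NAME -n KEY-MANAGER-NAMESPACE
  ```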
- Restore the backup snapshot.

  Use the `minkms restore` command to restore the backup snapshot to the new node. You can use the inline CLI help `minkms help restore` for additional usage and guidance.

  The following example targets a new host `keymanager1.example.net` and restores from a snapshot `BACKUP-FILE`. The example assumes the Key Manager cluster includes an ingress, route, or similar configuration that exposes the node or service at the specified hostname:

  ```sh
  minkms restore https://keymanager1.example.net:7373 --api-key ROOT-API-KEY BACKUP-FILE
  ```

  Once the restore completes, verify the state of the cluster by running the following commands:

  - `minkms stat` to validate node status.
  - `minkms ls-enclave` to validate all expected enclaves.
  - `minkms ls-key` to validate all expected cryptographic keys per enclave.
  - `minkms ls-policy` to validate all expected policies.
- Scale the replica set to normal size.

  Restore `values.yaml` to the previous values and update the chart:

  ```sh
  helm upgrade RELEASE minio/aistor-keymanager \
    -n KEY-MANAGER-NAMESPACE \
    -f aistor-keymanager-values.yaml
  ```

  Use `kubectl get all -n KEY-MANAGER-NAMESPACE` to validate the status of the pods after updating the chart. All pods should come online and be accessible.

  Use `minkms stat` to ensure the cluster state reflects all nodes in the cluster.