# Key Manager recovery on Linux
AIStor Key Manager supports recovery from both single-node failures and total cluster failure.
## Single node failure and recovery
For single-node failures, Key Manager requires at least one healthy node remaining in the cluster. You can restore any number of failed nodes from a single healthy node, as long as that node remains accessible until the cluster fully recovers.
This procedure requires running commands as the root user.
The following steps remove a failed node from the cluster:
1. **Shut down the failed node and delete all state**

   If the node has completely failed with data loss, you can skip to the next step.

   Removing state requires deleting all data at the Key Manager storage path. You do not have to remove any configuration files, certificates, or other resources used by the `minkms` process.
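   For illustration, the shutdown and cleanup might look like the following sketch. It assumes `minkms` runs as a systemd service named `minkms` and stores its state at `/var/lib/minkms`; both names are placeholders, so substitute the unit name and storage path from your deployment.

   ```sh
   # Stop the minkms process on the failed node
   # (assumes a systemd unit named "minkms")
   systemctl stop minkms

   # Delete all state at the Key Manager storage path
   # (/var/lib/minkms is a placeholder; use the path from your configuration)
   rm -rf /var/lib/minkms/*
   ```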
2. **Remove the failed node from the cluster**

   Use the `minkms edit` command to force-remove the node from the cluster. You can run `minkms ls` against a healthy node in the cluster to retrieve the list of node IDs, as sketched at the end of this step.

   The following command removes a node with `NODE-ID` from a cluster where the specified host `keymanager1.example.net` remains healthy and available to process operations:

   ```sh
   export MINIO_KMS_API_KEY=k1:ROOT_API_KEY
   minkms edit https://keymanager1.example.net:7373 --rm NODE-ID
   ```

   Do not remove more than one failed node at a time with the `minkms edit` command.
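   A `minkms ls` call to find the failed node's ID might look like this sketch. The `--api-key` usage mirrors the other commands in this guide, and the exact arguments may differ, so check `minkms help ls`:

   ```sh
   # List cluster nodes and their IDs via a healthy node
   minkms ls https://keymanager1.example.net:7373 --api-key k1:ROOT_API_KEY
   ```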
3. **Restart the failed node with fresh state**

   Restart the `minkms` process using a copy of the configuration file from an existing healthy node.

   If the node lost all data, you can follow the installation procedure to reinstall `minkms` and prepare the process to run.
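   As a sketch of that restart, assuming the configuration lives at a hypothetical path `/etc/minkms/config.yml` and `minkms` runs as a systemd service, copying the file and restarting might look like:

   ```sh
   # Copy the configuration file from a healthy node
   # (/etc/minkms/config.yml is a placeholder; use your deployment's path)
   scp keymanager1.example.net:/etc/minkms/config.yml /etc/minkms/config.yml

   # Start the minkms process with the fresh (empty) state
   systemctl start minkms
   ```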
4. **Add the node back to the cluster**

   Use the `minkms add` command to re-join the node to the cluster. The following example re-adds the recovered node at `keymanager2.example.net` to the cluster using the healthy node `keymanager1.example.net`:

   ```sh
   minkms add https://keymanager1.example.net:7373 --api-key k1:ROOT_API_KEY https://keymanager2.example.net:7373
   ```
5. **Monitor the cluster state**

   Use the `minkms stat` command to monitor the cluster state and ensure the node rejoins successfully.
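   For example, a status check might look like the following sketch, where the URL and API key follow the same placeholder pattern as the commands above:

   ```sh
   # Check cluster status via any reachable node
   minkms stat https://keymanager1.example.net:7373 --api-key k1:ROOT_API_KEY
   ```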
## Total cluster failure and recovery
You can rebuild a Key Manager cluster from a backup in the event of hardware failure, disaster, or other business continuity event. Key Manager requires creating a new single-node cluster to which you restore the backup snapshot. Once the node successfully starts up and resumes operations, you can scale the cluster back up to the target size.
This procedure requires running commands as the root user.
The following steps rebuild a cluster from a backup snapshot:
1. **Start up `minkms` on a new host**

   Follow the installation procedure to install and start `minkms` on a new host. Ensure that you use the same configuration options as the original cluster, including the same HSM keys used when creating the backup.

   Once the `minkms` node is online and available, proceed to the next step.
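   Assuming the installation leaves `minkms` managed by a systemd unit named `minkms` (a placeholder), a quick availability check before proceeding might look like:

   ```sh
   # Start the service and confirm it is running before restoring the backup
   systemctl start minkms
   systemctl status minkms
   ```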
2. **Restore the backup snapshot**

   Use the `minkms restore` command to restore the backup snapshot to the new node. You can use the inline CLI help `minkms help restore` for additional usage and guidance.

   The following example targets a new host `keymanager1.example.net` and restores from a snapshot `BACKUP-FILE`:

   ```sh
   minkms restore https://keymanager1.example.net:7373 --api-key k1:ROOT_API_KEY BACKUP-FILE
   ```
   Once the restore completes, verify the state of the cluster by running the following commands:

   - `minkms stat` to validate node status.
   - `minkms ls-enclave` to validate all expected enclaves.
   - `minkms ls-key` to validate all expected cryptographic keys per enclave.
   - `minkms ls-policy` to validate all expected policies.
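   As a sketch, the first two checks might look like the following, where the URL and API key follow the placeholder pattern above; the exact arguments may differ, so check `minkms help <command>` for each:

   ```sh
   # Validate node status after the restore
   minkms stat https://keymanager1.example.net:7373 --api-key k1:ROOT_API_KEY

   # Validate that all expected enclaves are present
   minkms ls-enclave https://keymanager1.example.net:7373 --api-key k1:ROOT_API_KEY
   ```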
3. **Scale the cluster back to production size**

   Follow the scaling procedure to add new nodes to the cluster. Ensure each new node has a clean state before adding it to the existing cluster.

   For example, the following command adds a node at `keymanager2.example.net` to the cluster using the healthy node `keymanager1.example.net`:

   ```sh
   minkms add https://keymanager1.example.net:7373 --api-key k1:ROOT_API_KEY https://keymanager2.example.net:7373
   ```