Healing
Healing is AIStor’s ability to restore an object that has been damaged, corrupted, or partially lost. Damage or loss can have many causes, including but not limited to:
- drive-level errors or failure
- OS or filesystem errors or failure
- bit rot
Healing and Erasure Coding
Whether AIStor can restore a damaged object depends on the following variables:
- Total drives in the erasure set the object is part of
- Available drives that contain intact parts of the object
- Parity settings for the erasure set
The parity setting specifies how many of the shards AIStor creates when writing an object are parity (recovery) shards. The remaining shards hold the object's data. AIStor distributes shards randomly across drives in the erasure set so that no one drive contains only parity shards or only data shards.
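The following Python sketch shows one way to spread an object's shards so that, across many objects, every drive holds data shards for some objects and parity shards for others. It assumes a deterministic rotation whose starting drive comes from a CRC-32 hash of the object name. The rotation, the hash, and the function name shard_layout are illustrative assumptions, not AIStor's actual placement code.

```python
import zlib


def shard_layout(object_name: str, drives: int, parity: int) -> list[str]:
    """Hypothetical placement: label each drive as holding a data or parity shard.

    Assumes one shard per drive and a per-object rotation offset derived from
    a CRC-32 hash of the name (an assumption, not AIStor's documented scheme).
    """
    data = drives - parity
    start = zlib.crc32(object_name.encode()) % drives  # per-object offset
    layout = [""] * drives
    for shard in range(drives):
        drive = (start + shard) % drives  # rotate shard positions by the offset
        layout[drive] = "data" if shard < data else "parity"
    return layout
```

For example, shard_layout("photos/cat.jpg", 8, 2) returns one label per drive in an 8-drive set with parity 2. In this sketch the offset depends only on the object name, so the same name always produces the same layout, while different names shift parity onto different drives.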
AIStor can use any combination of data and parity shards to reconstruct an object, as long as no more shards are lost than the parity setting allows. In other words, at least as many shards as there are data shards must remain intact. AIStor can then heal the missing data or parity shards while returning the object to the calling client.
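The recovery rule above reduces to simple arithmetic. This minimal sketch assumes a standard erasure code in which any set of intact shards equal in number to the data shards is enough to rebuild the object. The ErasureSet class and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureSet:
    """Hypothetical model of one erasure set and its parity setting."""
    drives: int  # total drives in the erasure set
    parity: int  # parity shards written per object

    @property
    def data_shards(self) -> int:
        # One shard per drive; whatever is not parity holds data
        return self.drives - self.parity

    def can_reconstruct(self, intact_shards: int) -> bool:
        # Any `data_shards` intact shards, whether data or parity, are enough
        return intact_shards >= self.data_shards

    def tolerable_losses(self) -> int:
        # Up to `parity` shards can be lost or damaged without losing the object
        return self.parity
```

With 16 drives and parity 4, for example, each object has 12 data shards and survives the loss of any 4 of its 16 shards.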
Use mc admin object info to summarize the current state of an object on disk.
This command outputs a summary of all of the shards of an object (also called “parts”), including any that are missing or damaged.
When does AIStor heal an object?
Healing during GET and HEAD requests
AIStor automatically checks the consistency of an object’s data shards each time you request an object with a GET or HEAD operation.
For versioned buckets, AIStor also checks for consistency during PUT operations.
If all of the data shards are found intact, AIStor serves the object from the data shards without inspecting the corresponding parity shards.
If the object has missing or damaged data shards, AIStor uses the available parity shards to heal the object before serving it as part of the operation. The total number of lost or damaged shards, data and parity combined, must not exceed the parity setting. Otherwise, the object cannot be recovered. If any parity shard is lost or damaged, AIStor restores it as well, provided enough intact shards remain to reconstruct the object.
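To make the read path concrete, here is a toy sketch with a single XOR parity shard. It is a deliberate simplification: AIStor uses erasure coding with a configurable number of parity shards, not plain XOR. The functions encode and get_and_heal are hypothetical, and None marks a lost or damaged shard.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_shards: list[bytes]) -> list[Optional[bytes]]:
    """Append one XOR parity shard to equal-length data shards (toy code)."""
    return list(data_shards) + [reduce(xor_bytes, data_shards)]


def get_and_heal(shards: list[Optional[bytes]], k: int) -> tuple[bytes, list[int]]:
    """Serve an object from its k data shards, healing lost shards first.

    `shards` holds k data shards followed by one parity shard; None marks a
    lost or damaged shard. Returns the object bytes and the healed indexes.
    """
    if all(shards[i] is not None for i in range(k)):
        # Fast path: every data shard is intact, so parity is never read
        return b"".join(shards[:k]), []
    lost = [i for i, s in enumerate(shards) if s is None]
    if len(lost) > len(shards) - k:
        raise ValueError("more shards lost than the parity setting can cover")
    for i in lost:
        # With one XOR parity shard, XOR of all other shards rebuilds shard i
        shards[i] = reduce(xor_bytes, (s for j, s in enumerate(shards) if j != i))
    return b"".join(shards[:k]), lost
```

For example, after encode([b"ab", b"cd", b"ef"]) and losing shard 1, get_and_heal(shards, 3) rebuilds b"cd" in place and serves the full object. Losing two shards raises an error because one parity shard cannot cover two losses.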
Healing with the object scanner
AIStor uses an object scanner to perform a number of tasks related to objects. One of these tasks checks the integrity of objects and heals any it finds damaged or corrupted.
On each scanning pass, AIStor uses a hash of the object name to select one out of every 1,024 objects to check.
If an object is found to have lost shards, AIStor heals the object from the available shards. By default, the scanner does not check for bit rot corruption, because this is an expensive operation and the risk of bit rot affecting multiple drives is low.
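One way a name hash can select one object in 1,024 per pass is sketched below. It assumes CRC-32 as the hash. It also assumes the selected slice rotates with a scan-cycle counter, so that every object is checked once every 1,024 passes. Both are illustrative assumptions, not documented AIStor behavior, and selected_this_pass is a hypothetical name.

```python
import zlib

HEAL_SELECT_ONE_IN = 1024  # one out of every 1,024 objects per pass


def selected_this_pass(object_name: str, scan_cycle: int) -> bool:
    """Hypothetical selector: True if this object is checked on this pass.

    CRC-32 stands in for whatever name hash AIStor uses; rotating the chosen
    residue with the cycle counter is an assumption of this sketch.
    """
    bucket = zlib.crc32(object_name.encode()) % HEAL_SELECT_ONE_IN
    return bucket == scan_cycle % HEAL_SELECT_ONE_IN
```

Because the hash is deterministic, the same object always lands in the same slice. In this sketch, each pass touches roughly 1/1,024 of the namespace, which keeps per-pass cost low.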
Consult with MinIO Engineers before manually starting a healing process on a deployment.
Tracking healing status
AIStor provides healing metrics under the /cluster/erasure-set path to monitor the status of healing processes on a deployment.
Bit Rot Protection
Bit rot is silent data corruption caused by random changes at the storage media level. For data drives, it is typically the result of decay of the electrical charge or magnetic orientation that represents the data. The causes range from a small current spike during a power outage to a random cosmic ray that flips bits. The resulting bit rot can cause subtle errors or corruption on the data medium without triggering alerts from monitoring tools or hardware.
AIStor’s optimized implementation of the HighwayHash algorithm lets it detect and heal corrupted objects on the fly. Integrity is ensured from end to end by computing a hash on WRITE and verifying it on READ, from the application, across the network, and to the memory or drive. The implementation is designed for speed and can achieve hashing speeds over 10 GB/sec on a single core on Intel CPUs.
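The sketch below illustrates the general technique of per-block checksums. A hash is computed for each block when it is written, stored alongside the block, and recomputed on every read. Python's standard library does not include HighwayHash, so BLAKE2b (hashlib.blake2b) stands in for it here. The block size and function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; real shard blocks are far larger


def _digest(block: bytes) -> bytes:
    # BLAKE2b stands in for HighwayHash, which hashlib does not provide
    return hashlib.blake2b(block, digest_size=32).digest()


def write_with_checksums(data: bytes) -> list[tuple[bytes, bytes]]:
    """Split data into blocks and store each block with its write-time hash."""
    return [
        (data[i:i + BLOCK_SIZE], _digest(data[i:i + BLOCK_SIZE]))
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def find_bitrot(stored: list[tuple[bytes, bytes]]) -> list[int]:
    """Re-hash each block at read time; return indexes whose hash no longer matches."""
    return [n for n, (block, digest) in enumerate(stored) if _digest(block) != digest]
```

Flipping a single bit in one stored block is enough for find_bitrot to report that block's index. A damaged block found this way can then be treated as a lost shard and rebuilt from the remaining shards.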
You can run a bit rot scan on a specific object with mc admin object info and the --bitrot flag.