MinIO Erasure Coding is a data redundancy and availability feature that allows MinIO deployments to automatically reconstruct objects on-the-fly despite the loss of multiple drives or nodes in the cluster. Erasure Coding provides object-level healing with less overhead than adjacent technologies such as RAID or replication.
MinIO splits each new object into data and parity blocks, where parity blocks support reconstruction of missing or corrupted data blocks. MinIO writes these blocks to a single erasure set in the deployment. Since erasure set drives are striped across the deployment, a given node typically contains only a portion of data or parity blocks for each object. MinIO can therefore tolerate the loss of multiple drives or nodes in the deployment depending on the configured parity and deployment topology.
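The data/parity idea above can be illustrated with a deliberately simplified sketch. It uses a single XOR parity block, which is not the Reed-Solomon coding MinIO uses and can rebuild only one lost block. The function names are hypothetical and chosen only for this example.

```python
from functools import reduce
from operator import xor


def xor_parity(blocks):
    """XOR equal-length blocks column by column into one block."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def split_with_parity(payload, data_blocks):
    """Split payload into equal data blocks plus one XOR parity block."""
    size = -(-len(payload) // data_blocks)  # ceiling division
    padded = payload.ljust(size * data_blocks, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(data_blocks)]
    return blocks + [xor_parity(blocks)]


def rebuild(blocks):
    """Restore at most one missing block (marked None) from the survivors."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one lost block")
    restored = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        restored[missing[0]] = xor_parity(survivors)
    return restored


payload = b"hello erasure coding"
stripe = split_with_parity(payload, data_blocks=4)
stripe[2] = None  # simulate a failed drive holding the third data block
recovered = b"".join(rebuild(stripe)[:4])
```

Here `recovered` equals `payload` even though one block was lost. Reed-Solomon generalizes the same idea to several parity blocks, so that several drives can fail at once.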
At maximum parity, MinIO can tolerate the loss of just under half the drives per erasure set (N/2-1, where N is the number of drives in the set) and still perform read and write operations. MinIO defaults to 4 parity blocks per object, which tolerates the loss of 4 drives per erasure set. For more complete information on selecting erasure code parity, see Erasure Code Parity (EC:N).
Erasure coding requires a minimum of 4 drives and is only available with distributed MinIO deployments. Erasure coding is a core requirement for the following MinIO features:
Use the MinIO Erasure Code Calculator when planning and designing your MinIO deployment to explore the effect of erasure code settings on your intended topology.
An Erasure Set is a set of drives in a MinIO deployment that support Erasure Coding. MinIO randomly and uniformly distributes object data and parity blocks across the drives in the Erasure Set with no overlap: each unique object has no more than one data or parity block per drive in the set.
MinIO calculates the number and size of Erasure Sets by dividing the total number of drives in the Server Pool into sets consisting of between 4 and 16 drives each. MinIO considers two factors when selecting the Erasure Set size:
The Greatest Common Divisor (GCD) of the total drives.
The number of minio server nodes in the Server Pool.
For an even number of nodes, MinIO uses the GCD to calculate the Erasure Set size and ensure the minimum number of Erasure Sets possible. For an odd number of nodes, MinIO selects a common denominator that results in an odd number of Erasure Sets to facilitate more uniform distribution of erasure set drives among nodes in the Server Pool.
For example, consider a Server Pool consisting of 4 nodes with 8 drives each for a total of 32 drives. The GCD of 16 produces 2 Erasure Sets of 16 drives each with uniform distribution of erasure set drives across all 4 nodes.
Now consider a Server Pool consisting of 5 nodes with 8 drives each for a total of 40 drives. Using the GCD, MinIO would create 4 erasure sets with 10 drives each. However, this would distribute drives unevenly, with one node contributing more drives to the Erasure Sets than the others. MinIO instead creates 5 erasure sets with 8 drives each to ensure uniform distribution of Erasure Set drives across nodes.
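The two examples above can be reproduced with a short sketch of the selection rule as this section describes it. It is a simplification for illustration, not MinIO's actual implementation, and the function name is hypothetical. It assumes "GCD" means the largest divisor of the drive count between 4 and 16.

```python
def erasure_set_layout(nodes, drives_per_node):
    """Return (set_size, set_count) following the rule described in the text."""
    total = nodes * drives_per_node
    # Candidate set sizes: divisors of the total drive count, largest first,
    # limited to the 4..16 range named in the text.
    candidates = [size for size in range(16, 3, -1) if total % size == 0]
    if nodes % 2 == 1:
        # Odd node count: keep only sizes that yield an odd number of sets.
        candidates = [size for size in candidates if (total // size) % 2 == 1]
    if not candidates:
        raise ValueError("no erasure set size between 4 and 16 fits this pool")
    size = candidates[0]
    return size, total // size


print(erasure_set_layout(4, 8))  # (16, 2): 2 sets of 16 drives
print(erasure_set_layout(5, 8))  # (8, 5): 5 sets of 8 drives
```

Picking the largest qualifying divisor keeps the number of erasure sets as small as possible, which matches the stated goal for pools with an even node count.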
MinIO generally recommends maintaining an even number of nodes in a Server Pool to facilitate simplified human calculation of the number and size of Erasure Sets in the Server Pool.
MinIO uses a Reed-Solomon algorithm to split objects into data and parity blocks based on the Erasure Set size in the deployment. For a given erasure set of size M, MinIO splits objects into N parity blocks and M-N data blocks.
MinIO uses the EC:N notation to refer to the number of parity blocks (N) in the deployment. MinIO defaults to EC:4 or 4 parity blocks per object. MinIO uses the same EC:N value for all erasure sets and server pools in the deployment.
MinIO can tolerate the loss of up to N drives per erasure set and continue performing read and write operations (“quorum”). If N is equal to exactly 1/2 the drives in the erasure set, MinIO write quorum requires N+1 drives to avoid data inconsistency (“split-brain”).
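The quorum rule can be written down as a small calculation. This is a sketch derived from the description above, with M as the erasure set size and N as the parity. The function name is hypothetical.

```python
def quorum(set_size, parity):
    """Return (read_quorum, write_quorum) in drives for one erasure set."""
    if not 0 <= parity * 2 <= set_size:
        raise ValueError("parity must be between 0 and half the set size")
    read_quorum = set_size - parity
    write_quorum = read_quorum
    if parity * 2 == set_size:
        # At exactly half parity, writes need one extra drive (N+1)
        # so that two disjoint halves can never both accept a write.
        write_quorum += 1
    return read_quorum, write_quorum


print(quorum(16, 4))  # (12, 12)
print(quorum(16, 8))  # (8, 9)
```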
Setting the parity for a deployment is a balance between availability and total usable storage. Higher parity values increase resiliency to drive or node failure at the cost of usable storage, while lower parity provides maximum storage with reduced tolerance for drive/node failures. Use the MinIO Erasure Code Calculator to explore the effect of parity on your planned cluster deployment.
The following table lists the outcome of varying erasure code parity levels on a MinIO deployment consisting of 1 node and 16 1TB drives:
Minimum Drives for Read Operations
Minimum Drives for Write Operations
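The trade-off for the 16-drive example can be computed directly from the quorum rule. The sketch below makes several assumptions: the parity levels shown are illustrative choices, usable capacity is taken as raw capacity multiplied by the data-block share, and filesystem or metadata overhead is ignored. The function name is hypothetical.

```python
def parity_table(drives=16, drive_tb=1.0, parities=(4, 6, 8)):
    """Build rows of (parity, usable TB, storage ratio, read min, write min)."""
    rows = []
    for parity in parities:
        data = drives - parity
        read_min = data
        # Writes need one extra drive when parity is exactly half the set.
        write_min = data + 1 if parity * 2 == drives else data
        rows.append((f"EC:{parity}", data * drive_tb, data / drives,
                     read_min, write_min))
    return rows


for name, usable, ratio, read_min, write_min in parity_table():
    print(f"{name:<5} {usable:>5.1f} TB  {ratio:.3f}  "
          f"read>={read_min}  write>={write_min}")
```

Each step up in parity gives up usable capacity in exchange for tolerating more failed drives.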
MinIO supports storage classes with Erasure Coding to allow applications to specify per-object parity. Each storage class specifies an EC:N parity setting to apply to objects created with that class.
MinIO storage classes are distinct from Amazon Web Services storage classes. MinIO storage classes define parity settings per object, while AWS storage classes define storage tiers per object.
MinIO provides the following two storage classes:
The STANDARD storage class is the default class for all objects.
You can configure the STANDARD storage class parity using either:
The MINIO_STORAGE_CLASS_STANDARD environment variable, or
The mc admin config command to modify the storage class configuration.
Starting with RELEASE.2021-01-30T00-20-58Z, MinIO sets the default STANDARD storage class parity based on the number of volumes in the Erasure Set:
Erasure Set Size
Default Parity (EC:N)
5 or Fewer
6 - 7
8 or more
The maximum value is half of the total drives in the Erasure Set.
The minimum value is
STANDARD parity must be greater than or equal to REDUCED_REDUNDANCY parity.
STANDARD parity must be greater than 2.
The REDUCED_REDUNDANCY storage class allows creating objects with lower parity than STANDARD.
You can configure the REDUCED_REDUNDANCY storage class parity using either:
The MINIO_STORAGE_CLASS_RRS environment variable, or
The mc admin config command to modify the storage class configuration.
The default value is
REDUCED_REDUNDANCY parity must be less than or equal to STANDARD parity.
REDUCED_REDUNDANCY must be less than half of the total drives in the Erasure Set.
REDUCED_REDUNDANCY is not supported for MinIO deployments with 4 or fewer drives.
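The constraints listed for the two storage classes can be checked before applying a configuration. The following is a rough sketch, not MinIO's own validation logic, and its function names are hypothetical. It assumes the environment variables hold values in EC:N form and that the deployment consists of a single erasure set. It covers only the limits stated in full above.

```python
import os


def parse_parity(value):
    """Parse an 'EC:N' string into the integer N."""
    prefix, _, count = value.partition(":")
    if prefix != "EC" or not count.isdecimal():
        raise ValueError(f"expected EC:N, got {value!r}")
    return int(count)


def check_storage_classes(set_size, standard, rrs=None):
    """Return a list of violated constraints (empty when all checks pass)."""
    problems = []
    if standard * 2 > set_size:
        problems.append("STANDARD parity exceeds half of the erasure set")
    if rrs is not None:
        if set_size <= 4:  # assumes one erasure set == whole deployment
            problems.append("REDUCED_REDUNDANCY needs more than 4 drives")
        if rrs > standard:
            problems.append("REDUCED_REDUNDANCY parity exceeds STANDARD")
        if rrs * 2 >= set_size:
            problems.append("REDUCED_REDUNDANCY parity is not below half")
    return problems


def check_environment(set_size, env=os.environ):
    """Validate the two storage class variables named in the text."""
    # EC:4 mirrors the default parity stated earlier; RRS is skipped if unset.
    standard = parse_parity(env.get("MINIO_STORAGE_CLASS_STANDARD", "EC:4"))
    rrs_value = env.get("MINIO_STORAGE_CLASS_RRS")
    rrs = parse_parity(rrs_value) if rrs_value else None
    return check_storage_classes(set_size, standard, rrs)
```

Calling `check_environment(16)` on a host inspects the variables currently set. Passing a plain dictionary as `env` allows testing a planned configuration without exporting anything.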
MinIO references the x-amz-storage-class header in request metadata for determining which storage class to assign an object. The specific syntax or method for setting headers depends on your preferred method for interfacing with the MinIO server.
For the mc command line tool, certain commands include a specific option for setting the storage class. For example, the mc cp command has the --storage-class option for specifying the storage class to assign to the object being copied.
For MinIO SDKs, the S3Client object has specific methods for setting request headers. For example, the S3Client.PutObject method takes a PutObjectOptions data structure as a parameter. The PutObjectOptions data structure includes the StorageClass option for specifying the storage class to assign to the object being created.
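Whichever client is used, the result is the same header on the upload request. The sketch below uses only the Python standard library to show where the header goes. The endpoint, bucket, and object key are hypothetical, and the request is built but not sent. A real upload also needs request signing, which the SDKs and mc handle.

```python
from urllib.request import Request

STORAGE_CLASSES = {"STANDARD", "REDUCED_REDUNDANCY"}


def storage_class_headers(storage_class):
    """Return the request header that selects a MinIO storage class."""
    if storage_class not in STORAGE_CLASSES:
        raise ValueError(f"unknown storage class: {storage_class!r}")
    return {"x-amz-storage-class": storage_class}


# Hypothetical endpoint, bucket, and key; constructed only, never sent.
request = Request(
    "http://minio.example.net:9000/mybucket/report.csv",
    data=b"col1,col2\n",
    headers=storage_class_headers("REDUCED_REDUNDANCY"),
    method="PUT",
)
```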
Silent data corruption or bitrot is a serious problem faced by disk drives resulting in data getting corrupted without the user’s knowledge. The reasons are manifold (ageing drives, current spikes, bugs in disk firmware, phantom writes, misdirected reads/writes, driver errors, accidental overwrites) but the result is the same - compromised data.
MinIO’s optimized implementation of the HighwayHash algorithm ensures that it will never read corrupted data - it captures and heals corrupted objects on the fly. Integrity is ensured from end to end by computing a hash on WRITE and verifying it on READ, from the application, across the network, and to the memory/drive. The implementation is designed for speed and can achieve hashing speeds over 10 GB/sec on a single core on Intel CPUs.
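The detection half of this mechanism can be sketched as a checksum stored with each block and re-checked whenever the block is read. This example substitutes BLAKE2b from the Python standard library for HighwayHash, which the standard library does not provide, and uses a plain dictionary as a stand-in for a drive. The function names are hypothetical.

```python
import hashlib


def checksum(data):
    """Hash a block (BLAKE2b here, standing in for HighwayHash)."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def write_block(store, key, data):
    """Store the block together with the checksum computed at write time."""
    store[key] = (data, checksum(data))


def read_block(store, key):
    """Return the block only if it still matches its stored checksum."""
    data, expected = store[key]
    if checksum(data) != expected:
        raise IOError(f"bitrot detected in block {key!r}")
    return data


drive = {}
write_block(drive, "object/part.1", b"payload bytes")
assert read_block(drive, "object/part.1") == b"payload bytes"
```

In an erasure-coded deployment, a block that fails this check can be treated like a missing block and rebuilt from the remaining data and parity blocks.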