Erasure Coding
AIStor Server implements Erasure Coding as a core component in providing data redundancy and availability.
AIStor groups drives in each server pool into one or more Erasure Sets of the same size.
The above example deployment consists of 4 nodes with 4 drives each. AIStor initializes with a single erasure set consisting of all 16 drives across all four nodes.
AIStor determines the optimal number and size of erasure sets when initializing a server pool. You cannot modify these settings after this initial setup.
For each write operation, AIStor partitions the object into data and parity shards.
Data shards contain a portion of a given object. Parity shards contain a mathematical representation of the object used for rebuilding data shards. Use `mc admin object info` to output a summary of a specific object’s shards (also called “parts”) on disk.
Erasure set stripe size dictates the maximum possible parity of the deployment. The formula for determining the number of data and parity shards to generate is:
N (ERASURE SET SIZE) = K (DATA) + M (PARITY)
The above example deployment has an erasure set of 16 drives. This can support parity between `EC:0` and 1/2 the erasure set drives, or `EC:8`.
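With the default `EC:4` parity, the formula for this deployment works out to:

16 (ERASURE SET SIZE) = 12 (DATA) + 4 (PARITY)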
You can set the parity value between 0 and 1/2 the Erasure Set size.
AIStor uses a Reed-Solomon erasure coding implementation and partitions the object for distribution across an erasure set. The example deployment above has an erasure set size of 16 and a parity of `EC:4`.
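As an illustration of Reed-Solomon sharding in general (not AIStor's internal code), the following Go sketch splits an object into the shard counts from the example deployment using the open-source `github.com/klauspost/reedsolomon` library:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// Example deployment: erasure set size N=16 with EC:4 parity,
	// so K=12 data shards and M=4 parity shards per object.
	enc, err := reedsolomon.New(12, 4)
	if err != nil {
		log.Fatal(err)
	}

	object := bytes.Repeat([]byte("example-object-data"), 1000)

	// Split partitions the object into 12 data shards and allocates
	// 4 empty parity shards, padding the last data shard if needed.
	shards, err := enc.Split(object)
	if err != nil {
		log.Fatal(err)
	}

	// Encode fills the 4 parity shards with Reed-Solomon parity
	// computed from the 12 data shards.
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("object striped into %d shards of %d bytes each\n",
		len(shards), len(shards[0]))
}
```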
Objects written with a given parity setting do not automatically update if you change the parity value later.
AIStor requires a minimum of `K` shards of any type to read an object. The value `K` here constitutes the read quorum for the deployment. The erasure set must therefore contain at least `K` healthy drives to support read operations.
This deployment has one offline node, resulting in only 12 remaining healthy drives. The object was written with `EC:4` with a read quorum of `K=12`. This object therefore maintains read quorum and AIStor can reconstruct it for read operations.
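A minimal sketch of the read-quorum arithmetic described above (the function names are illustrative, not AIStor APIs):

```go
package main

import "fmt"

// readQuorum returns K, the minimum number of shards (and therefore
// healthy drives) required to read an object from an erasure set of
// size n written with EC:m parity.
func readQuorum(n, m int) int {
	return n - m // any K of the N shards suffice to reconstruct the object
}

func main() {
	n, m := 16, 4 // erasure set size and parity from the example
	healthy := 12 // one 4-drive node offline

	k := readQuorum(n, m)
	fmt.Printf("read quorum K=%d, healthy=%d, readable=%v\n",
		k, healthy, healthy >= k)
	// Prints: read quorum K=12, healthy=12, readable=true
}
```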
AIStor cannot reconstruct an object that has lost read quorum. Such objects may be recovered through other means such as replication resynchronization.
AIStor requires a minimum of `K` erasure set drives to write an object. The value `K` here constitutes the write quorum for the deployment. The erasure set must therefore have at least `K` available drives online to support write operations.
This deployment has one offline node, resulting in only 12 remaining healthy drives. A client writes an object with `EC:4` parity settings where the erasure set has a write quorum of `K=12`. This erasure set maintains write quorum and AIStor can use it for write operations.
If parity `EC:M` is exactly 1/2 the erasure set size, the write quorum is `K+1`. This prevents a split-brain scenario, such as one where a network issue isolates exactly half the erasure set drives from the other half.
This deployment has two nodes offline due to a transient network failure. A client writes an object with `EC:8` parity settings where the erasure set has a write quorum of `K+1 = 9`. This erasure set has lost write quorum and AIStor cannot use it for write operations.
The `K+1` logic ensures that a client cannot write the same object twice, once to each “half” of the erasure set.
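A sketch of the write-quorum rule, including the `K+1` adjustment (again, an illustrative function rather than an AIStor API):

```go
package main

import "fmt"

// writeQuorum returns the minimum number of online drives required to
// write to an erasure set of size n with EC:m parity: K drives normally,
// bumped to K+1 when parity is exactly half the set, so two isolated
// halves can never both reach quorum.
func writeQuorum(n, m int) int {
	k := n - m
	if n%2 == 0 && m == n/2 {
		return k + 1
	}
	return k
}

func main() {
	fmt.Println(writeQuorum(16, 4)) // 12: EC:4, write quorum K=12
	fmt.Println(writeQuorum(16, 8)) // 9:  EC:8, write quorum K+1=9
}
```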
For an object maintaining read quorum, AIStor can use any data or parity shard to heal damaged shards.
An object with `EC:4` parity lost four data shards out of 12 due to drive failures. Since the object has maintained read quorum, AIStor can heal those lost data shards using the available parity shards.
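Continuing the earlier `klauspost/reedsolomon` sketch, the following illustrates the healing mechanics: four of the 12 data shards are dropped, then rebuilt from the surviving shards. This demonstrates Reed-Solomon reconstruction in general, not AIStor's internal healing code:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	enc, err := reedsolomon.New(12, 4) // K=12 data, M=4 parity
	if err != nil {
		log.Fatal(err)
	}

	object := bytes.Repeat([]byte("example-object-data"), 1000)
	shards, err := enc.Split(object)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate four failed drives by discarding four data shards.
	// 12 of 16 shards survive, so read quorum (K=12) still holds.
	for _, i := range []int{0, 3, 5, 9} {
		shards[i] = nil
	}

	// Reconstruct rebuilds every missing shard from the survivors.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}

	ok, err := enc.Verify(shards)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("healed, parity verifies:", ok) // true
}
```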
Use the AIStor Erasure Coding Calculator to explore the possible erasure set size and distributions for your planned topology. Where possible, use an even number of nodes and drives per node to simplify topology planning and conceptualization of drive/erasure-set distribution.
Erasure parity and storage efficiency
Setting the parity for a deployment is a balance between availability and total usable storage. Higher parity values increase resiliency to drive or node failure at the cost of usable storage, while lower parity provides maximum storage with reduced tolerance for drive/node failures. Use the AIStor Erasure Code Calculator to explore the effect of parity on your planned cluster deployment.
The following table lists the outcome of varying erasure code parity levels on an AIStor deployment consisting of 1 node and 16 drives of 1 TiB each:
| Parity | Total Storage | Storage Ratio | Minimum Drives for Read Operations | Minimum Drives for Write Operations |
|---|---|---|---|---|
| `EC:4` (Default) | 12 TiB | 0.750 | 12 | 12 |
| `EC:6` | 10 TiB | 0.625 | 10 | 10 |
| `EC:8` | 8 TiB | 0.500 | 8 | 9 |
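The table values follow directly from the stripe arithmetic: usable storage is the `K` data drives, and the storage ratio is `K/N`. A small sketch that reproduces them, using the set size and drive capacity from the example deployment:

```go
package main

import "fmt"

func main() {
	const (
		n        = 16  // drives in the erasure set
		driveTiB = 1.0 // capacity per drive in TiB
	)
	for _, m := range []int{4, 6, 8} {
		k := n - m                       // data shards = usable drive count
		usable := float64(k) * driveTiB  // usable capacity in TiB
		ratio := float64(k) / float64(n) // storage efficiency
		fmt.Printf("EC:%d usable=%.0f TiB ratio=%.3f\n", m, usable, ratio)
	}
}
```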
Summarizing the erasure storage of an object
The AIStor Client can summarize the current status of an object and all of its shards (or “parts”) across its erasure set.
Use `mc admin object info` to output a summary of the object and the status of each of its shards, including:
- Part number
- Pool number
- Node
- Erasure set
- Drive
- Filename
- Size