Erasure Coding
AIStor Server implements Erasure Coding as a core component in providing data redundancy and availability.
AIStor groups drives in each server pool into one or more Erasure Sets of the same size.
The above example deployment consists of 4 nodes with 4 drives each. AIStor initializes with a single erasure set consisting of all 16 drives across all four nodes.
AIStor determines the optimal number and size of erasure sets when initializing a server pool. You cannot modify these settings after this initial setup.
For each write operation, AIStor partitions the object into data and parity shards.
Data shards contain a portion of a given object. Parity shards contain a mathematical representation of the object used for rebuilding data shards. Use `mc admin object info` to output a summary of a specific object’s shards (also called “parts”) on disk.
Erasure set stripe size dictates the maximum possible parity of the deployment. The formula for determining the number of data and parity shards to generate is:
N (ERASURE SET SIZE) = K (DATA) + M (PARITY)
The above example deployment has an erasure set of 16 drives. This can support parity between `EC:0` and 1/2 the erasure set drives, or `EC:8`.
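With the default `EC:4` parity, the formula for this deployment works out to:

16 (ERASURE SET SIZE) = 12 (DATA) + 4 (PARITY)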
You can set the parity value between 0 and 1/2 the Erasure Set size.
AIStor uses a Reed-Solomon erasure coding implementation and partitions the object for distribution across an erasure set. The example deployment above has an erasure set size of 16 and a parity of `EC:4`.
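As an illustration of Reed-Solomon sharding in general (not AIStor's internal code), the following Go sketch splits an object into the shard counts from the example deployment using the open-source `github.com/klauspost/reedsolomon` library:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// Example deployment: erasure set size N=16 with EC:4 parity,
	// so K=12 data shards and M=4 parity shards per object.
	enc, err := reedsolomon.New(12, 4)
	if err != nil {
		log.Fatal(err)
	}

	object := bytes.Repeat([]byte("example-object-data"), 1000)

	// Split partitions the object into 12 data shards and allocates
	// 4 empty parity shards, padding the last data shard if needed.
	shards, err := enc.Split(object)
	if err != nil {
		log.Fatal(err)
	}

	// Encode fills the 4 parity shards with Reed-Solomon parity
	// computed from the 12 data shards.
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("object striped into %d shards of %d bytes each\n",
		len(shards), len(shards[0]))
}
```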
Objects written with a given parity setting do not automatically update if you change the parity value later.
AIStor requires a minimum of `K` shards of any type to read an object. The value `K` here constitutes the read quorum for the deployment. The erasure set must therefore contain at least `K` healthy drives to support read operations.
This deployment has one offline node, resulting in only 12 remaining healthy drives. The object was written with `EC:4` with a read quorum of `K=12`. This object therefore maintains read quorum and AIStor can reconstruct it for read operations.
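A minimal sketch of the read-quorum arithmetic described above (the function names are illustrative, not AIStor APIs):

```go
package main

import "fmt"

// readQuorum returns K, the minimum number of shards (and therefore
// healthy drives) required to read an object from an erasure set of
// size n written with EC:m parity.
func readQuorum(n, m int) int {
	return n - m // any K of the N shards suffice to reconstruct the object
}

func main() {
	n, m := 16, 4 // erasure set size and parity from the example
	healthy := 12 // one 4-drive node offline

	k := readQuorum(n, m)
	fmt.Printf("read quorum K=%d, healthy=%d, readable=%v\n",
		k, healthy, healthy >= k)
	// Prints: read quorum K=12, healthy=12, readable=true
}
```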
AIStor cannot reconstruct an object that has lost read quorum. Such objects may be recovered through other means such as replication resynchronization.
AIStor requires a minimum of `K` erasure set drives to write an object. The value `K` here constitutes the write quorum for the deployment. The erasure set must therefore have at least `K` available drives online to support write operations.
This deployment has one offline node, resulting in only 12 remaining healthy drives. A client writes an object with `EC:4` parity settings where the erasure set has a write quorum of `K=12`. This erasure set maintains write quorum and AIStor can use it for write operations.
If parity `EC:M` is exactly 1/2 the erasure set size, the write quorum is `K+1`. This prevents a split-brain scenario, such as one where a network issue isolates exactly half the erasure set drives from the other half.
This deployment has two nodes offline due to a transient network failure. A client writes an object with `EC:8` parity settings where the erasure set has a write quorum of `K+1 = 9`. This erasure set has lost write quorum and AIStor cannot use it for write operations.
The `K+1` logic ensures that a client cannot write the same object twice, once to each “half” of the erasure set.
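A sketch of the write-quorum rule, including the `K+1` adjustment (again, an illustrative function rather than an AIStor API):

```go
package main

import "fmt"

// writeQuorum returns the minimum number of online drives required to
// write to an erasure set of size n with EC:m parity: K drives normally,
// bumped to K+1 when parity is exactly half the set, so two isolated
// halves can never both reach quorum.
func writeQuorum(n, m int) int {
	k := n - m
	if n%2 == 0 && m == n/2 {
		return k + 1
	}
	return k
}

func main() {
	fmt.Println(writeQuorum(16, 4)) // 12: EC:4, write quorum K=12
	fmt.Println(writeQuorum(16, 8)) // 9:  EC:8, write quorum K+1=9
}
```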
For an object maintaining read quorum, AIStor can use any data or parity shard to heal damaged shards.
An object with `EC:4` parity lost four data shards out of 12 due to drive failures. Since the object has maintained read quorum, AIStor can heal those lost data shards using the available parity shards.
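Continuing the earlier `klauspost/reedsolomon` sketch, the following illustrates the healing mechanics: four of the 12 data shards are dropped, then rebuilt from the surviving shards. This demonstrates Reed-Solomon reconstruction in general, not AIStor's internal healing code:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	enc, err := reedsolomon.New(12, 4) // K=12 data, M=4 parity
	if err != nil {
		log.Fatal(err)
	}

	object := bytes.Repeat([]byte("example-object-data"), 1000)
	shards, err := enc.Split(object)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate four failed drives by discarding four data shards.
	// 12 of 16 shards survive, so read quorum (K=12) still holds.
	for _, i := range []int{0, 3, 5, 9} {
		shards[i] = nil
	}

	// Reconstruct rebuilds every missing shard from the survivors.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}

	ok, err := enc.Verify(shards)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("healed, parity verifies:", ok) // true
}
```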
Use the AIStor Erasure Coding Calculator to explore the possible erasure set size and distributions for your planned topology. Where possible, use an even number of nodes and drives per node to simplify topology planning and conceptualization of drive/erasure-set distribution.
Erasure parity and storage efficiency
Setting the parity for a deployment is a balance between availability and total usable storage. Higher parity values increase resiliency to drive or node failure at the cost of usable storage, while lower parity provides maximum storage with reduced tolerance for drive/node failures. Use the AIStor Erasure Code Calculator to explore the effect of parity on your planned cluster deployment.
The following table lists the outcome of varying erasure code parity levels on an AIStor deployment consisting of 1 node and 16 drives of 1 TiB each:
| Parity | Total Storage | Storage Ratio | Minimum Drives for Read Operations | Minimum Drives for Write Operations |
|---|---|---|---|---|
| `EC:4` (Default) | 12 TiB | 0.750 | 12 | 12 |
| `EC:6` | 10 TiB | 0.625 | 10 | 10 |
| `EC:8` | 8 TiB | 0.500 | 8 | 9 |
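The table values follow directly from the stripe arithmetic: usable storage is the `K` data drives, and the storage ratio is `K/N`. A small sketch that reproduces them, using the set size and drive capacity from the example deployment:

```go
package main

import "fmt"

func main() {
	const (
		n        = 16  // drives in the erasure set
		driveTiB = 1.0 // capacity per drive in TiB
	)
	for _, m := range []int{4, 6, 8} {
		k := n - m                       // data shards = usable drive count
		usable := float64(k) * driveTiB  // usable capacity in TiB
		ratio := float64(k) / float64(n) // storage efficiency
		fmt.Printf("EC:%d usable=%.0f TiB ratio=%.3f\n", m, usable, ratio)
	}
}
```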
Summarizing the erasure storage of an object
The AIStor Client can summarize the current status of an object and all of its shards (or “parts”) across its erasure set.
Use `mc admin object info` to output a summary of the object and the status of each of its shards, including:
- Part number
- Pool number
- Node
- Erasure set
- Drive
- Filename
- Size