Replication
AIStor provides multiple replication strategies to ensure data availability, support disaster recovery, enable geographic distribution, and facilitate data migration between clusters. Each replication method serves different use cases and operates at different levels, from entire cluster to individual objects.
Replication methods
AIStor supports four distinct replication and synchronization approaches:
| Method | Scope | Timing | Primary Use Case |
|---|---|---|---|
| Site Replication | Entire cluster | Continuous, real-time | Business continuity, disaster recovery, global distribution |
| Bucket Replication | Individual buckets | Continuous, automatic | Cross-region data sync, active-active configurations |
| Batch Replication | Selected objects | One-time or scheduled | Data migration, selective sync, recovery operations |
| mc mirror | Directories/buckets | One-time or continuous | File sync, backups, client-side control |
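The table can also be read as a small lookup. The following Python sketch encodes the four rows above and filters them by timing. The `Method` dataclass, its field names, and the `candidates` helper are illustrative assumptions for this page, not part of any AIStor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    scope: str
    continuous: bool  # can keep syncing without being re-run
    one_time: bool    # can run as a single pass


# Rows transcribed from the comparison table above.
METHODS = [
    Method("Site Replication", "Entire cluster", continuous=True, one_time=False),
    Method("Bucket Replication", "Individual buckets", continuous=True, one_time=False),
    Method("Batch Replication", "Selected objects", continuous=False, one_time=True),
    Method("mc mirror", "Directories/buckets", continuous=True, one_time=True),
]


def candidates(*, continuous=None, one_time=None):
    """Return method names whose timing matches the given requirements.

    A value of None means "no requirement" for that property.
    """
    return [
        m.name
        for m in METHODS
        if (continuous is None or m.continuous == continuous)
        and (one_time is None or m.one_time == one_time)
    ]


print(candidates(continuous=True))
print(candidates(one_time=True))
```

Timing is only one axis. Scope and the considerations listed for each method below still decide the final choice.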
Site replication
Site replication creates a cluster of peer sites from multiple independent AIStor clusters. This method provides the most comprehensive replication coverage and supports complete disaster recovery scenarios.
Key differentiators
Site replication offers the following advantages over AIStor's other replication and synchronization methods:
- Deployment-wide scope: Synchronize configs and data across multiple clusters with a single command.
- Transparent failover: Applications can seamlessly switch between peer sites.
- Rapid recovery: Use healthy peers to completely restore a site after disaster.
- Geolocal reads and writes: Clients can use least-latency logic to select an appropriate peer site.
Considerations
Site replication requires all clusters to use the same identity provider, AIStor version, and encryption configuration.
Bucket replication
Bucket replication provides automatic, server-side synchronization between specific buckets across separate AIStor clusters.
Key differentiators
- Granular control: Configure replication rules per bucket with filters and priorities.
- Flexible topologies: Supports one-to-many, many-to-one, and active-active patterns.
- Independent clusters: Source and target can have different configurations.
Considerations
Bucket replication requires separate setup for each origin-to-destination bucket pair. To create two-way synchronization, repeat the setup in the opposite direction on the other cluster.
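Because each rule is one-way, the number of rules to configure grows with the topology. A minimal Python sketch of that bookkeeping follows. The "cluster/bucket" strings, the site names, and the `rules_needed` helper are hypothetical; the sketch only models which directions need setup, not how AIStor stores or applies the rules.

```python
def rules_needed(links, two_way=False):
    """Expand bucket links into the one-way rules that must be configured.

    links: iterable of (origin, destination) pairs, each a "cluster/bucket" string.
    two_way: when True, add the reverse rule for every link.
    """
    rules = []
    for origin, destination in links:
        rules.append((origin, destination))
        if two_way:
            rules.append((destination, origin))
    # Drop duplicates while keeping the order in which rules were first seen.
    return list(dict.fromkeys(rules))


# Hypothetical active-active pair: one link expands to two rules.
for origin, destination in rules_needed([("east/data", "west/data")], two_way=True):
    print(f"configure on {origin}: replicate to {destination}")
```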
Batch replication
Batch replication uses a server-side job system for one-time or scheduled bulk data transfers.
Key differentiators
- Job-based execution: Define, monitor, and manage replication as discrete jobs.
- Cross-platform support: Works between AIStor and any S3-compatible storage.
- Precise filtering: YAML configuration allows complex selection criteria using the cluster’s resources.
Considerations
Batch replication does not run continuously. You must either run the job manually when desired or use job scheduling to specify when to perform the replication.
mc mirror
mc mirror provides client-side synchronization similar to rsync, offering immediate control and flexibility for selecting objects to sync to a destination.
Key differentiators
- Client-side execution: Runs from your local machine or CI/CD pipeline.
- Filesystem support: Can sync local directories to object storage.
- Client-side filtering: Fine-grained controls to filter objects to sync using the client’s resources.
Considerations
mc mirror runs on the local client, which limits performance to the resources the client has available.
Only the current version of each object synchronizes, and without extensive metadata.
If you need a complete object history, use another method.
Common replication scenarios
Hub-and-spoke topology
This type of infrastructure setup casts one cluster as a central hub site, with the remaining clusters sending objects to or receiving objects from the central location. If all the spokes need the same information, set up site replication between all clusters. If the spokes need to share only subsets of the central data, set up bucket replication.
Active-active dual-site
For two clusters that need the same set of data, use either site or bucket replication. Site replication automatically keeps both sites in sync with each other. Bucket replication allows you to specify which buckets to sync. For two-way syncing with bucket replication, set up replication on each site with the other site as the target.
Edge-to-core pattern
This scenario involves transferring data from clusters at the far edges of your network back to a central data center. Use batch replication to move the data when the edge clusters have network access to the central data center. Bucket replication can also keep data in sync between edge and core clusters.
Multi-region disaster recovery
Site replication across geographically dispersed data centers provides protection if a data center in one region goes down. For greater distances, use asynchronous replication to better accommodate higher latencies.
For more specifics on such a scenario, see the page on recovering after a site failure.
Development and backup patterns
When you are looking to back up local development data, use mc mirror.
The command offers flags to help control what syncs between the local and backup locations, allowing you to exclude temporary folders or replace existing artifacts with the latest versions.
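To make the idea concrete, here is a minimal Python sketch of a one-way, client-side sync between two local directories, with exclude patterns and an optional overwrite. It only illustrates the concept. It is not how mc mirror is implemented, and the `mirror` function and its parameters are assumptions made for this example.

```python
import fnmatch
import shutil
from pathlib import Path


def mirror(source, target, exclude=(), overwrite=False):
    """Copy files from source to target, one way, and return the copied paths.

    exclude: glob patterns matched against each file's path relative to source.
    overwrite: when False, files that already exist in target are left alone.
    """
    source, target = Path(source), Path(target)
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(source).as_posix()
        # Skip anything matching an exclude pattern, such as "tmp/*".
        if any(fnmatch.fnmatch(relative, pattern) for pattern in exclude):
            continue
        destination = target / relative
        if destination.exists() and not overwrite:
            continue
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination)
        copied.append(relative)
    return copied
```

Calling `mirror("project", "backup", exclude=["tmp/*"], overwrite=True)` would copy everything except the temporary folder and replace older artifacts in the backup. For real workloads, consult the mc mirror reference for the actual flag names and behavior.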
Considerations
Network and latency
Replication performance depends heavily on network bandwidth and latency between locations. High latency between sites can result in replication lag, particularly for synchronous replication modes. Plan your network topology and replication architecture accordingly.
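A back-of-the-envelope check can help with that planning: if new data arrives faster than the link between sites can carry it, unreplicated data accumulates. The Python sketch below does that arithmetic under simplifying assumptions (constant write rate, constant usable bandwidth, no protocol overhead or compression). The function names and example figures are illustrative only and do not describe measured AIStor behavior.

```python
def backlog_gb(write_rate_mb_s, link_mb_s, hours):
    """Estimate unreplicated data (in GB) after a period of sustained writes.

    write_rate_mb_s: average rate of new data at the source, in MB/s.
    link_mb_s: usable replication throughput between sites, in MB/s.
    hours: length of the period.
    Returns 0.0 when the link keeps up with the write rate.
    """
    surplus_mb_s = max(0.0, write_rate_mb_s - link_mb_s)
    return surplus_mb_s * hours * 3600 / 1000  # 1 GB = 1000 MB


def catch_up_hours(backlog, link_mb_s, write_rate_mb_s=0.0):
    """Estimate hours needed to drain a backlog (in GB) at a given write rate."""
    spare_mb_s = link_mb_s - write_rate_mb_s
    if spare_mb_s <= 0:
        raise ValueError("link has no spare capacity; backlog will not drain")
    return backlog * 1000 / spare_mb_s / 3600


# Made-up figures: 150 MB/s of writes over a 100 MB/s link for 2 hours.
print(backlog_gb(150, 100, 2))       # 360.0
print(catch_up_hours(360, 100, 50))  # 2.0
```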
Access control
All replication methods require appropriate read and write permissions for the source and destination locations.