Scanner
AIStor Server uses the built-in scanner to check objects for healing and to take any scheduled object actions. Such actions may include:
- calculate data usage on drives
- evaluate and apply configured lifecycle management or object retention rules
- perform bucket or site replication
- check objects for missing or corrupted data or parity shards and perform healing
The scanner performs these functions at two levels: cluster and bucket. At the cluster level, the scanner splits all buckets into groups and scans one group of buckets at a time. The scanner starts with any new buckets added since the last scan, then randomizes the scanning of other buckets. The scanner completes checks on all bucket groups before starting over with a new set of scans.
At the bucket level, the scanner groups items in buckets and scans selected items from that bucket. The scanner selects objects for a scan based on a hash of the object name. Over a span of 16 scans, MinIO AIStor checks every object in the namespace. MinIO AIStor fully scans any prefixes known to be new since the last scan.
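The hash-based selection above can be sketched as follows. This is an illustrative model only: the text states that selection uses a hash of the object name and that every object is covered over a span of 16 scans, but the specific hash function and partitioning shown here are assumptions.

```python
# Illustrative sketch: select objects per scan cycle by hashing the object
# name, so that every object is visited exactly once per 16 cycles.
# The use of SHA-256 and of the first digest byte are assumptions.
import hashlib

SCAN_SPAN = 16  # the documented span: every object is scanned once per 16 cycles

def selected_this_cycle(object_name: str, cycle: int) -> bool:
    digest = hashlib.sha256(object_name.encode()).digest()
    return digest[0] % SCAN_SPAN == cycle % SCAN_SPAN

# Over any 16 consecutive cycles, each object is selected exactly once.
names = [f"bucket/obj-{i}" for i in range(1000)]
hits = {n: sum(selected_this_cycle(n, c) for c in range(SCAN_SPAN)) for n in names}
assert all(v == 1 for v in hits.values())
```

Because the hash of a given name is stable, each object always falls in the same scan cycle, which spreads scanning load evenly across the span.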
The scanner waits for 30 seconds after completing the scan of a bucket before proceeding to the next bucket.
Starting with RELEASE.2025-05-14T05-01-13Z, when the scanner encounters an empty bucket, it immediately begins scanning the next bucket.
The scanner deletes dangling prefixes. A prefix is considered dangling if more than (N+1)/2 drives in the erasure set do not contain the prefix, or if the prefix is empty.
When the scanner encounters a prefix (a path that ends in /) that has no object data at the prefix or at any location under the prefix, the scanner removes the prefix.
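The drive-count rule above can be expressed as a small check. This is a sketch for illustration only; the function and variable names are hypothetical.

```python
# Illustrative check for the dangling-prefix rule: a prefix is treated as
# dangling when more than (N+1)/2 drives in the erasure set lack it.
def is_dangling(drives_missing_prefix: int, total_drives: int) -> bool:
    return drives_missing_prefix > (total_drives + 1) // 2

# In a 16-drive erasure set, (16 + 1) // 2 = 8, so the prefix is dangling
# once 9 or more drives are missing it.
print(is_dangling(9, 16))  # True
print(is_dangling(8, 16))  # False
```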
Scan length
Multiple factors impact the time it takes for a scan to complete.
Some of these factors include:
- Type of drives provided to the object store
- Throughput and available network
- Number and size of objects
- Other activity on the object store
For example, by default, MinIO AIStor pauses the scanner to make I/O operations available for read and write requests. This can lengthen the time it takes for a scan to complete.
Between scan operations, MinIO AIStor waits for a multiple of the time the previous operation took to complete.
By default, the value of this factor is 10.0, meaning the object store waits 10x the length of an operation after one scan completes before starting the next scan.
The value of this factor changes depending on the configured scanner speed setting.
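The wait behavior above amounts to a simple multiplication. The sketch below uses the documented default factor of 10.0; the function name is illustrative.

```python
# Sketch of the scanner's adaptive wait: the pause after each scan operation
# is a multiple of how long that operation took. 10.0 is the documented
# default factor; the faster the drives, the shorter the pauses.
def scanner_wait(last_op_seconds: float, multiplier: float = 10.0) -> float:
    return last_op_seconds * multiplier

# A scan operation that took 2 seconds is followed by a 20-second pause
# at the default factor.
print(scanner_wait(2))  # 20.0
```

Because the pause scales with operation duration, a busy or slow cluster automatically stretches out its scan cycle rather than competing with client traffic.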
Scanner performance
Many factors impact the scanner performance. Some of these factors include:
- available node resources
- size of the cluster
- number of erasure sets compared to the number of drives
- complexity of bucket hierarchy (objects and prefixes)
For example, a cluster that starts with 100 TB of data and then grows to 200 TB of data may require more time to scan the entire namespace of buckets and objects given the same hardware and workload. Likewise, a single erasure set of 16 drives takes longer to scan than the same number of drives split into two erasure sets of 8 drives each.
MinIO AIStor treats the scanner as a background task and pauses it in favor of completing read and write requests on the cluster. As the cluster or workload increases, scanner performance decreases as it yields more frequently to ensure priority of normal S3 operations.
You can adjust how MinIO AIStor balances the scanner performance with read/write operations using either the MINIO_SCANNER_SPEED environment variable or the scanner speed configuration setting.
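For example, the scanner speed can be set either way. The alias `myminio` below is a placeholder for your configured `mc` alias, and `slow` is one example value; consult the scanner configuration reference for the full set of supported values.

```shell
# Via environment variable, set before starting the server process:
export MINIO_SCANNER_SPEED=slow

# Or via the configuration setting on a running deployment:
mc admin config set myminio scanner speed=slow
```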
Scanner alerts
The scanner monitors each object and prefix during each scan cycle and emits alerts when counts exceed configured thresholds. These alerts appear as bucket notification events, audit log entries, and scanner error log entries. They do not block reads or writes.
Excess versions per object
The scanner checks the number of stored versions for each object and the total cumulative storage size of all those versions.
When the version count for an object reaches or exceeds the configured threshold, the scanner:
- Emits an s3:Scanner:ManyVersions bucket notification event.
- Writes a scanner:manyversions entry to the audit log with an x-minio-versions tag containing the version count.
When the total storage size of all versions for a single object reaches or exceeds 1 TiB (this value is fixed and not configurable), the scanner also:
- Emits an s3:Scanner:LargeVersions bucket notification event.
- Writes a scanner:largeversions entry to the audit log with x-minio-versions-count and x-minio-versions-size tags.
Objects that accumulate versions without a configured lifecycle management expiration policy are the most common cause of this alert.
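The two version alerts above can be modeled as a single check. This is an illustrative sketch: the version-count threshold is configurable and represented here as a parameter, the 1 TiB size threshold is fixed per the text, and the function name is hypothetical.

```python
# Sketch of the two version-based scanner alerts. The event names match the
# bucket notification events documented above; everything else is illustrative.
TIB = 1024 ** 4  # the fixed 1 TiB cumulative-size threshold

def version_alerts(version_count: int, versions_size_bytes: int,
                   count_threshold: int) -> list[str]:
    alerts = []
    if version_count >= count_threshold:
        alerts.append("s3:Scanner:ManyVersions")
    if versions_size_bytes >= TIB:
        alerts.append("s3:Scanner:LargeVersions")
    return alerts

# An object with 1,500 versions totaling 2 TiB triggers both alerts
# (assuming a hypothetical count threshold of 1,000).
print(version_alerts(1500, 2 * TIB, count_threshold=1000))
```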
Excess subfolders per prefix
The scanner counts total subfolders within each prefix during each scan cycle, including subfolders recorded in the scan cache from prior cycles and those discovered in the current cycle.
The scanner applies this threshold (T) at three escalating levels:
| Subfolder count | Default at T=50,000 | Behavior |
|---|---|---|
| Greater than T | > 50,000 | Emits s3:Scanner:BigPrefix bucket notification event; writes scanner:manyprefixes audit log entry with an x-minio-prefixes-total tag |
| Greater than or equal to T × 10 | ≥ 500,000 | Additionally logs a scanner error: too many folders in <prefix>: <count> |
| Greater than or equal to T × 100 | ≥ 5,000,000 | Additionally logs a scanner error and skips processing remaining subfolders in that prefix for the current scan cycle |
The scanner records the prefix path and a timestamp in the excessPaths metric whenever the subfolder count exceeds T.
When a prefix reaches the critical level (T × 100), the scanner skips the remaining subfolders in that prefix for the rest of the scan cycle. This can delay healing and lifecycle management for objects under those sub-paths until the following cycle.
Workloads that consistently trigger these alerts should be redesigned to distribute objects across a deeper prefix hierarchy rather than concentrating them under a single flat prefix.
Scanner metrics
AIStor Server provides a number of metrics related to the scanner.
Use mc admin scanner info to see the current status of the scanner and the time since the last full scan.
This can help in understanding the metrics provided by the scanner operation.
Scanner metrics, including usage metrics, reflect the last completed scan.
PUT or DELETE operations since the last scan do not update in the usage until the next scan of the affected bucket(s).
The output resembles the following:
Overall Statistics
------------------
Last full scan time: 0d0h14m; Estimated 2885.28/month
Current cycle: 70464; Started: 2024-04-19 20:02:34.568479139 +0000 UTC
Active drives: 2
Last Minute Statistics
----------------------
Objects Scanned: 620 objects; Avg: 124.929µs; Rate: 892800/day
Versions Scanned: 620 versions; Avg: 2.801µs; Rate: 892800/day
Versions Heal Checked: 0 versions; Avg: 0ms
Read Metadata: 621 objects; Avg: 88.416µs, Size:
ILM checks: 656 versions; Avg: 663ns
Check Replication: 656 versions; Avg: 1.061µs
Verify Deleted: 0 folders; Avg: 0ms
Yield: 3.086s total; Avg: 4.705ms/obj