# Bucket Inventory Reports

## Overview
Starting with RELEASE.2025-12-20T04-58-37Z, AIStor supports creating inventory reports on objects and related metadata in a bucket. The AIStor inventory feature provides a fully integrated solution with equivalent utility to the previously announced AIStor Catalog, while expanding functionality with improved scheduling, filtering, and integration options.
Each inventory configuration is bucket-scoped and user-defined, allowing you to create multiple inventory jobs per bucket with different filters, schedules, and output formats. You can schedule jobs to run once, hourly, daily, weekly, monthly, or yearly, with each execution creating a timestamped output folder.
Inventory reports include object metadata such as size, last modified date, storage class, encryption status, tags, and user metadata. You can filter objects by prefix, age, size, name patterns, tags, or custom metadata to generate targeted reports. The inventory system supports CSV, JSON, and Parquet output formats with optional compression, making it suitable for compliance reporting, data analytics, and integration with downstream systems.
Use the `mc inventory` commands to create and manage inventory jobs.
## Quick start
Before creating an inventory job, ensure you have the following:
- `s3:PutInventoryConfiguration` permission on the source bucket
- `s3:GetInventoryConfiguration` permission on the source bucket
- `s3:ListBucket` permission on the source bucket
- Write permissions on the destination bucket
- An alias configured for your AIStor deployment (for example, `myaistor`)
- **Generate configuration template**

  Use `mc inventory generate` to create a YAML configuration template that serves as the starting point for your inventory job:

  ```shell
  mc inventory generate ALIAS/SOURCE_BUCKET INVENTORY_JOB_ID > inventory-config.yaml
  ```

  Replace `ALIAS` with your AIStor alias, `SOURCE_BUCKET` with the bucket to inventory, and `INVENTORY_JOB_ID` with a unique identifier for this job.

  The command creates a YAML file with all available configuration options and comments explaining each field. See the configuration reference for more complete documentation on available fields.
- **Edit the job configuration**

  Open the generated `inventory-config.yaml` file and configure the job to reflect your desired outcome. The following example generates a daily CSV report on the current version of all objects in the `SOURCE_BUCKET` and outputs it as a CSV to the `inventory-reports/documents-inventory` bucket and prefix:

  ```yaml
  apiVersion: v1
  id: daily-report
  destination:
    bucket: inventory-reports
    prefix: documents-inventory/
    format: csv
    compression: on
  schedule: daily
  mode: fast
  versions: current
  ```

  Save your changes to the configuration file.
- **Create the inventory job**

  Use [`mc inventory put`](/enterprise/aistor-object-store/reference/cli/mc-inventory/mc-inventory-put/) to upload the configuration and add the inventory job for your AIStor deployment:

  ```shell
  mc inventory put ALIAS/SOURCE_BUCKET inventory-config.yaml
  ```

  AIStor validates the configuration and schedules the job according to your specified schedule. For one-time jobs (the default schedule), the job begins execution immediately. For recurring jobs, the first execution starts based on the schedule type.
- **Monitor job status**

  Use `mc inventory status` to track progress and completion of the job:

  ```shell
  mc inventory status ALIAS/SOURCE_BUCKET INVENTORY_JOB_ID
  ```

  The status output includes:

  - the job state
  - objects scanned
  - records written
  - execution time
  - errors encountered

  Add the `--watch` flag to the command to continuously monitor job progress in real time.
## Processing job output
Inventory jobs write output to the destination bucket in a structured folder hierarchy:

```
DESTINATION_BUCKET/
  PREFIX/
    SOURCE_BUCKET/
      INVENTORY_JOB_ID/
        YYYY-MM-DDTHH-MMZ/
          files/
            file-001.csv.zst
            file-002.csv.zst
          manifest.json
```
Each execution creates a timestamped folder containing data files and a manifest. The timestamp reflects when the job started. The manifest contains metadata useful to downstream consumers who want observability into the inventory process.
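For example, the following sketch uses the MinIO Python SDK (the `minio` package) to find the most recent run folder for a job. The endpoint, credentials, bucket, and prefix values are placeholders, not values from this deployment:

```python
# A sketch for discovering the most recent inventory run with the MinIO
# Python SDK. Endpoint, credentials, and names below are placeholders.
from minio import Minio

client = Minio(
    "aistor.example.net",
    access_key="ACCESS_KEY",
    secret_key="SECRET_KEY",
)

# A non-recursive listing returns the timestamped run folders as prefixes.
prefix = "documents-inventory/SOURCE_BUCKET/daily-report/"
runs = [
    obj.object_name
    for obj in client.list_objects("inventory-reports", prefix=prefix)
]

# YYYY-MM-DDTHH-MMZ folder names sort lexicographically by start time.
latest = sorted(runs)[-1]
print("Latest run:", latest)
```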
### Manifest file structure
The manifest file provides metadata about the inventory execution and lists all data files produced. It uses the AWS S3 Inventory manifest format with a MinIO extension that includes job status, objects scanned, and objects matched by filters. The manifest also includes MD5 checksums for data files.
```json
{
  "sourceBucket": "my-bucket",
  "destinationBucket": "dest-bucket",
  "version": "2016-11-30",
  "creationTimestamp": "1736943600",
  "fileFormat": "CSV (ZSTD compressed)",
  "fileSchema": "Bucket,Key,Size,LastModifiedDate,...",
  "files": [
    {"key": "prefix/bucket/job-id/2025-01-15T10-30Z/files/file-001.csv.zst", "size": 1024, "MD5checksum": "abc123"}
  ],
  "minioExtension": {
    "status": "completed",
    "scannedObjects": 12500,
    "matchedObjects": 8300,
    "partialResultsAvailable": false
  }
}
```
Use the manifest to programmatically discover and validate inventory output files.
The `minioExtension` object provides additional details about the inventory execution:
| Key | Description |
|---|---|
| `status` | Indicates the final state of the job. Possible values: `"completed"` (job finished successfully), `"canceled"` (job canceled using `mc inventory cancel`), `"suspended"` (job suspended using `mc inventory suspend`). |
| `scannedObjects` | Total count of objects examined by the inventory job. |
| `matchedObjects` | Count of objects matching the configured filters and included in the output. |
| `partialResultsAvailable` | Indicates whether the output files contain complete results. `true` indicates partial results from a canceled or suspended job; `false` indicates complete results from a completed job. |
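As a sketch of how a downstream consumer might apply these fields, the following assumes `manifest.json` and its data files were downloaded into the current directory (for example, with `mc cp`):

```python
# A minimal sketch of manifest-driven validation, assuming manifest.json
# and its data files sit in the current directory.
import hashlib
import json

with open("manifest.json") as f:
    manifest = json.load(f)

# Refuse to process partial output from a canceled or suspended run.
if manifest["minioExtension"]["partialResultsAvailable"]:
    raise SystemExit("Run produced partial results; skipping processing.")

# Verify each data file against the MD5 checksum in the manifest.
for entry in manifest["files"]:
    name = entry["key"].rsplit("/", 1)[-1]
    with open(name, "rb") as data:
        digest = hashlib.md5(data.read()).hexdigest()
    if digest != entry["MD5checksum"]:
        raise SystemExit(f"Checksum mismatch for {name}")

print(f"Validated {len(manifest['files'])} data files")
```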
### Working with data files
Data files contain one row per object for CSV and JSON formats, or use columnar storage for Parquet.
AIStor compresses the files by default using ZSTD compression.
CSV files include field names in the first row.
JSON files use JSON Lines format with one object per line.
Data files follow the naming pattern `file-NNN.{format}.{compression}`, where:

- `NNN` - a zero-padded sequence number (`001`, `002`, and so on)
- `{format}` - the configured output format (`csv`, `json`, or `parquet`)
- `{compression}` - the compression method (`zst` for ZSTD compression, or no extension when compression is disabled)

For example, `file-001.csv.zst` indicates a compressed CSV file, while `file-002.parquet` indicates an uncompressed Parquet file.
Process the data files using standard tools for the chosen format.
For example, use pandas for CSV and Parquet in Python, jq for JSON, or load the files into a data warehouse or analytics platform.
Parquet-formatted files typically integrate directly with tools like Apache Spark, Presto, and Trino.
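As a minimal illustration, the following sketch loads downloaded data files with pandas; it assumes the `zstandard` package is installed for ZSTD decompression:

```python
# A minimal sketch for loading inventory data files locally with pandas.
# Assumes the files were first downloaded (for example, with `mc cp`).
import pandas as pd

# CSV output: field names are in the first row.
df_csv = pd.read_csv("file-001.csv.zst", compression="zstd")

# JSON output: JSON Lines format, one object per line.
df_json = pd.read_json("file-001.json.zst", lines=True, compression="zstd")

# Parquet output: columnar, with a self-describing schema.
df_parquet = pd.read_parquet("file-002.parquet")

print(df_csv.columns.tolist())
print(len(df_csv), "objects in this data file")
```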
The default fields in the data files include bucket name, object key, object size, and last modified date. Some default fields only populate on buckets with versioning enabled. You can modify the report to include optional fields. See the output field list for more complete documentation.
## Scheduling jobs
The scheduler process manages all job states, including planning subsequent runs for recurring jobs, rescheduling failed jobs, and cleaning up after job execution. The scheduler runs on at most one node in the cluster at a time and maintains its state within the cluster, allowing another node to resume the process if the scheduler node fails.
AIStor schedules jobs based on the `schedule` field in the job configuration.
One-time jobs with `schedule: once` run once and either complete or fail, with no further runs.
Repeating jobs with any other `schedule` value run repeatedly until suspended or deleted.
The scheduler calculates the next run time for repeating jobs based on when the previous execution completed:
- `hourly`: one hour after the last completion
- `daily`: the next midnight (00:00 UTC) after the last completion
- `weekly`: the next Sunday at midnight (00:00 UTC) after the last completion
- `monthly`: the first Sunday of the next month at midnight (00:00 UTC) after the last completion
- `yearly`: the first Sunday of the next year at midnight (00:00 UTC) after the last completion
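To illustrate these semantics, the following sketch computes the next `daily` and `weekly` run times from a previous completion time. It illustrates the rules above; it is not AIStor's internal scheduler code:

```python
# A sketch of the schedule rules above, using only the standard library.
from datetime import datetime, timedelta, timezone

completed = datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc)  # a Wednesday

# daily: the next midnight (00:00 UTC) after the last completion.
next_daily = (completed + timedelta(days=1)).replace(
    hour=0, minute=0, second=0, microsecond=0
)

# weekly: the next Sunday at midnight (00:00 UTC) after the last completion.
days_until_sunday = (6 - completed.weekday()) % 7 or 7
next_weekly = (completed + timedelta(days=days_until_sunday)).replace(
    hour=0, minute=0, second=0, microsecond=0
)

print(next_daily.isoformat())   # 2025-01-16T00:00:00+00:00
print(next_weekly.isoformat())  # 2025-01-19T00:00:00+00:00
```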
The scheduler processes each job once regardless of the number of missed runs between the last and current run. For example, suspending a daily job on Monday and resuming it on Friday does not result in AIStor running the missed Tuesday-Thursday runs.
### Job states
Each inventory job state indicates its current phase in the scheduling and execution lifecycle. The scheduler transitions jobs between states based on execution progress, scheduling requirements, and control operations.
| State | Description |
|---|---|
| Pending | Waiting for processing by an executor. |
| Running | Actively scanning objects and writing output files. The job remains in this state until completion, failure, or a suspend/cancel operation. |
| Sleeping | Waiting for next scheduled run. This state only applies to repeating jobs. The scheduler transitions the job to Pending at the scheduled time. |
| Completed | Completed without errors. Terminal state for one-time jobs. Repeating jobs transition through Completed and return to Sleeping. |
| Failed | Exceeded the maximum number of retry attempts (3) after encountering errors. Terminal state for one-time jobs. |
| Errored | Failed run with remaining retry attempts. The scheduler marks the job for retry after a 10-minute delay. |
| Canceled | Stopped with the `mc inventory cancel` command. Terminal state for one-time jobs. For repeating jobs, the scheduler transitions the job to Pending at the next scheduled time. |
| Suspended | Paused with the `mc inventory suspend` command. The job remains in this state until explicitly resumed. |
Use `mc inventory status` to check a job’s current state and execution details.
## Executing jobs
The executor process manages jobs that the scheduler marks as ready to run. The executor runs on every node in the cluster, processing jobs in parallel to maximize throughput and minimize execution time.
The executor typically runs jobs within 15-30 minutes of the scheduler marking the job as pending.
A failed job may successfully write partial output prior to encountering the error.
Similarly, cancelling or suspending a job mid-run may produce partial output.
Check the `partialResultsAvailable` field in the job manifest to determine whether the output contains incomplete data.