Tables and Delta Sharing Observability

AIStor exposes metrics, audit-log entries, and alerting hooks for the AIStor Tables (Apache Iceberg REST Catalog) and AIStor Table Sharing (Delta Sharing) APIs. Use these signals to monitor request volume, latency, authentication, caching, and rate limiting for table and share workloads.

This page describes the available metrics, how table and share operations appear in audit logs, and example Prometheus alerting rules.

Metrics

Tables and Delta Sharing publish metrics on both the v3 and v2 metrics endpoints.

Version 3 metrics

The v3 endpoints provide the full catalog of request, error, authentication, caching, and rate-limiting metrics for these APIs.

  • Delta Sharing metrics are available under /minio/metrics/v3/delta-sharing. See Delta Sharing metrics for the complete reference.
  • Tables metrics are available under /minio/metrics/v3/tables. See Tables metrics for the complete reference.

A scrape job using the root /minio/metrics/v3 endpoint captures both sets.
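As a minimal sketch, a Prometheus scrape job for the root v3 endpoint might look like the following. The job name, target host and port, scheme, and token file path are placeholders. Whether the endpoint requires a bearer token depends on how metrics authentication is configured for your deployment.

```yaml
scrape_configs:
  - job_name: aistor-v3                  # hypothetical job name
    metrics_path: /minio/metrics/v3      # root v3 endpoint; includes the tables and delta-sharing paths
    scheme: https                        # assumption: TLS is enabled on the deployment
    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/aistor-token   # hypothetical path to a metrics bearer token
    static_configs:
      - targets: ["aistor.example.net:9000"]           # hypothetical host and port
```

To scrape only one API, point metrics_path at /minio/metrics/v3/tables or /minio/metrics/v3/delta-sharing instead.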

The following v3 metrics are the primary signals for monitoring these APIs:

| Name | Type | Description | Labels |
|------|------|-------------|--------|
| minio_delta_sharing_total | counter | Total number of Delta Sharing API requests processed. | name, type, server |
| minio_delta_sharing_errors_total | counter | Total number of Delta Sharing API requests that resulted in errors. | name, type, server |
| minio_delta_sharing_4xx_errors_total | counter | Total number of Delta Sharing API requests that resulted in 4xx errors. | name, type, server |
| minio_delta_sharing_5xx_errors_total | counter | Total number of Delta Sharing API requests that resulted in 5xx errors. | name, type, server |
| minio_delta_sharing_canceled_total | counter | Total number of Delta Sharing API requests canceled by the client. | name, type, server |
| minio_delta_sharing_inflight_total | gauge | Current number of Delta Sharing API requests actively being processed. | name, type, server |
| minio_delta_sharing_auth_success_total | counter | Total successful Delta Sharing authentications. | server |
| minio_delta_sharing_auth_failures_total | counter | Total failed Delta Sharing authentications. | server |
| minio_delta_sharing_oauth_tokens_issued_total | counter | Total OAuth tokens issued for Delta Sharing. | server |
| minio_delta_sharing_cache_hits_total | counter | Total Delta Sharing cache hits. | cache_type, server |
| minio_delta_sharing_cache_misses_total | counter | Total Delta Sharing cache misses. | cache_type, server |
| minio_delta_sharing_cache_size | gauge | Current size of the Delta Sharing cache. | cache_type, server |
| minio_delta_sharing_rate_limited_total | counter | Total number of rate-limited Delta Sharing requests. | server |
| minio_delta_sharing_requests_ttfb_seconds_distribution | counter | Histogram distribution of time to first byte for Delta Sharing requests. | api, le, server |
| minio_tables_total | counter | Total number of Tables API requests processed. | name, type, server |
| minio_tables_5xx_errors_total | counter | Total number of Tables API requests that resulted in 5xx errors. | name, type, server |
| minio_tables_canceled_total | counter | Total number of Tables API requests canceled by the client. | name, type, server |
| minio_tables_inflight_total | gauge | Current number of Tables API requests actively being processed. | name, type, server |
| minio_tables_requests_ttfb_seconds_distribution | counter | Histogram distribution of time to first byte for Tables requests. | api, le, server |

Authoritative catalog
The table above lists the most commonly used metrics. The complete and authoritative list, including warehouse, namespace, table, transaction, and recovery gauges, is in the Metrics v3 Reference.

For the per-operation counters and gauges, the name label carries the operation name and the type label is delta-sharing (for Delta Sharing) or the Tables API type. For the cache metrics, the cache_type label is either token or snapshot.
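The labels described above can be used to break the counters down by operation or by cache. The following recording rules are a sketch of that idea: one derives a per-operation request rate from the name label, the other a hit ratio per cache_type. The group name and the recorded series names are illustrative choices, not names that AIStor defines.

```yaml
groups:
  - name: delta-sharing-derived          # hypothetical group name
    rules:
      # Per-operation request rate, aggregated across servers.
      - record: delta_sharing:requests:rate5m          # hypothetical series name
        expr: sum by (name) (rate(minio_delta_sharing_total[5m]))

      # Cache hit ratio per cache type (token or snapshot).
      - record: delta_sharing:cache_hit_ratio:rate5m   # hypothetical series name
        expr: |
          sum by (cache_type) (rate(minio_delta_sharing_cache_hits_total[5m]))
          /
          (
            sum by (cache_type) (rate(minio_delta_sharing_cache_hits_total[5m]))
            + sum by (cache_type) (rate(minio_delta_sharing_cache_misses_total[5m]))
          )
```

When a cache sees no lookups in the window, the ratio evaluates to NaN rather than 0, so dashboards should treat missing or NaN values as "no traffic".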

Version 2 metrics

The v2 endpoints expose latency histograms for both APIs. These metrics do not use the minio_ prefix and carry only the api label, which holds the operation name (for example, LoadTable or QueryTable).

| Name | Type | Description | Labels |
|------|------|-------------|--------|
| tables_ttfb_seconds | histogram | Time taken by the Tables/Iceberg API from request read to response write. | api |
| delta_sharing_ttfb_seconds | histogram | Time taken by the Delta Sharing API from request read to response write. | api |

As standard Prometheus histograms, each exposes _bucket, _sum, and _count series. See Metrics version 2 for v2 scrape endpoints.
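Because these are standard histograms, the _bucket series can be passed to histogram_quantile() to estimate latency percentiles per operation. The sketch below records a 99th-percentile estimate for each API. It assumes a v2 endpoint is already being scraped; the group name and recorded series names are illustrative.

```yaml
groups:
  - name: tables-and-sharing-latency     # hypothetical group name
    rules:
      # Estimated p99 latency per Tables operation over the last 5 minutes.
      - record: tables:ttfb_seconds:p99_5m             # hypothetical series name
        expr: |
          histogram_quantile(
            0.99,
            sum by (api, le) (rate(tables_ttfb_seconds_bucket[5m]))
          )

      # Estimated p99 latency per Delta Sharing operation over the last 5 minutes.
      - record: delta_sharing:ttfb_seconds:p99_5m      # hypothetical series name
        expr: |
          histogram_quantile(
            0.99,
            sum by (api, le) (rate(delta_sharing_ttfb_seconds_bucket[5m]))
          )
```

The le label must be kept in the aggregation, since histogram_quantile() relies on it to interpolate between buckets.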

Audit logs

When audit logging is configured, AIStor records an audit event for each Tables and Delta Sharing API call.

  • Tables (Iceberg REST) operations are recorded under the tables subsystem. The operation name identifies the action, for example CreateWarehouse, CreateTable, LoadTable, or QueryTable.
  • Delta Sharing API calls are recorded for each operation, such as ListShares, GetShare, QueryTable, or OAuthToken. Each Delta Sharing audit entry includes the share token identifier in a token field, which you can use to attribute activity to a specific share recipient.

AIStor does not publish audit logs to any destination by default. Configure a Kafka or webhook target to receive these events.

Alerting

The following examples use the Prometheus alerting rule format, consistent with the other alerts in this documentation. Because all of the metrics used below are counters, the examples apply rate() over a 5-minute window, which yields a per-second rate of increase. Tune the thresholds and durations to match your workload baseline.
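The rules in this section are shown as individual rule entries. As a sketch of how such an entry sits inside a Prometheus rule file, the following wraps one rule in the groups and rules structure. The group name, alert name, and threshold are illustrative placeholders. The rule uses the minio_delta_sharing_inflight_total gauge to show the contrast with counters: a gauge is compared directly, without rate().

```yaml
groups:
  - name: aistor-tables-and-sharing      # hypothetical group name
    rules:
      - alert: DeltaSharingHighInflight  # hypothetical alert, not part of the examples below
        # Gauge: sum the current in-flight requests per server and compare directly.
        expr: sum by (server) (minio_delta_sharing_inflight_total) > 100   # placeholder threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Delta Sharing in-flight requests on {{ $labels.server }}: {{ $value }}"
```

Each counter-based example in this section can be added as another entry under rules in the same way.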

Delta Sharing authentication failures

This alert triggers if Delta Sharing authentication failures increase beyond the configured threshold. A sustained increase can indicate misconfigured recipients, expired share credentials, or unauthorized access attempts. The minio_delta_sharing_auth_failures_total metric carries only the server label, so the rule does not filter by operation.

alert: DeltaSharingAuthFailures
expr: rate(minio_delta_sharing_auth_failures_total[5m]) > 1
for: 2m
labels:
  severity: warning
annotations:
  summary: "Delta Sharing auth failures on {{ $labels.server }}: {{ $value | humanize }}/sec"
  impact: "Recipients may be unable to access shares, or unauthorized access is being attempted."
  action: "Check share credentials and recipient configuration. Review audit logs for the failing tokens."

This alert requires that the Prometheus scrape configuration capture the metrics provided by the following API endpoint(s):

  • /minio/metrics/v3/delta-sharing

A scrape job using the root /minio/metrics/v3 endpoint satisfies the above requirement.
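An absolute failure rate can be noisy on busy deployments and too quiet on idle ones. As a hedged alternative, the following sketch alerts on the share of authentication attempts that fail, using the success and failure counters listed earlier. The alert name, the 20% threshold, and the durations are placeholders to tune.

```yaml
alert: DeltaSharingAuthFailureRatio      # hypothetical alert name
expr: |
  sum by (server) (rate(minio_delta_sharing_auth_failures_total[5m]))
  /
  (
    sum by (server) (rate(minio_delta_sharing_auth_failures_total[5m]))
    + sum by (server) (rate(minio_delta_sharing_auth_success_total[5m]))
  )
  > 0.2
for: 5m
labels:
  severity: warning
annotations:
  summary: "Delta Sharing auth failure ratio on {{ $labels.server }}: {{ $value | humanizePercentage }}"
```

With no authentication traffic in the window, the division yields NaN, which does not satisfy the comparison, so the alert stays silent.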

Delta Sharing server error rate

This alert triggers if the rate of 5xx errors on the Delta Sharing API increases beyond the configured threshold. 5xx errors indicate that AIStor failed to process a Delta Sharing request. The minio_delta_sharing_5xx_errors_total metric carries the name and type labels, so you can attribute errors to a specific operation.

alert: DeltaSharingServerErrorRate
expr: rate(minio_delta_sharing_5xx_errors_total[5m]) > 1
for: 2m
labels:
  severity: critical
annotations:
  summary: "Delta Sharing 5xx errors for {{ $labels.name }} on {{ $labels.server }}: {{ $value | humanize }}/sec"
  impact: "Server-side failures cause share queries to fail for recipients."
  action: "Check {{ $labels.server }} logs. Investigate backend storage or resource exhaustion."

This alert requires that the Prometheus scrape configuration capture the metrics provided by the following API endpoint(s):

  • /minio/metrics/v3/delta-sharing

A scrape job using the root /minio/metrics/v3 endpoint satisfies the above requirement.
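Because both minio_delta_sharing_5xx_errors_total and minio_delta_sharing_total carry the name and server labels, the error rate can also be expressed as a fraction of requests per operation. The following is a sketch of that variant; the alert name and the 5% threshold are placeholders, and it assumes both counters use the same name values for a given operation.

```yaml
alert: DeltaSharingServerErrorRatio      # hypothetical alert name
expr: |
  sum by (name, server) (rate(minio_delta_sharing_5xx_errors_total[5m]))
  /
  sum by (name, server) (rate(minio_delta_sharing_total[5m]))
  > 0.05
for: 5m
labels:
  severity: critical
annotations:
  summary: "Delta Sharing 5xx ratio for {{ $labels.name }} on {{ $labels.server }}: {{ $value | humanizePercentage }}"
```

The same pattern applies to the Tables API by substituting minio_tables_5xx_errors_total and minio_tables_total.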

Delta Sharing requests rate limited

This alert triggers if Delta Sharing requests are rate limited above the configured threshold. A sustained increase indicates that recipients are exceeding configured request limits, which may degrade their experience. The minio_delta_sharing_rate_limited_total metric carries only the server label.

alert: DeltaSharingRateLimited
expr: rate(minio_delta_sharing_rate_limited_total[5m]) > 1
for: 5m
labels:
  severity: warning
annotations:
  summary: "Delta Sharing requests rate limited on {{ $labels.server }}: {{ $value | humanize }}/sec"
  impact: "Recipients are being throttled and may experience failed or delayed share queries."
  action: "Review rate-limit configuration and recipient request patterns."

This alert requires that the Prometheus scrape configuration capture the metrics provided by the following API endpoint(s):

  • /minio/metrics/v3/delta-sharing

A scrape job using the root /minio/metrics/v3 endpoint satisfies the above requirement.

Tables server error rate

This alert triggers if the rate of 5xx errors on the Tables (Iceberg REST) API increases beyond the configured threshold. The minio_tables_5xx_errors_total metric carries the name and type labels, so you can attribute errors to a specific operation such as CreateTable or LoadTable.

alert: TablesServerErrorRate
expr: rate(minio_tables_5xx_errors_total[5m]) > 1
for: 2m
labels:
  severity: critical
annotations:
  summary: "Tables 5xx errors for {{ $labels.name }} on {{ $labels.server }}: {{ $value | humanize }}/sec"
  impact: "Server-side failures cause catalog operations to fail for query engines."
  action: "Check {{ $labels.server }} logs. Investigate backend storage or transaction recovery."

This alert requires that the Prometheus scrape configuration capture the metrics provided by the following API endpoint(s):

  • /minio/metrics/v3/tables

A scrape job using the root /minio/metrics/v3 endpoint satisfies the above requirement.