Monitoring and alerting using InfluxDB

AIStor Server publishes cluster and node metrics using the Prometheus Data Model. InfluxDB supports scraping MinIO AIStor metrics data for monitoring and alerting.

The procedure on this page documents the following:

  • Configuring an InfluxDB service to scrape and display metrics from a MinIO AIStor deployment
  • Configuring an Alert on a MinIO AIStor metric

This tutorial uses metrics version 2. You can also use metrics version 3, which is recommended for new deployments. For more information about version 3, see Metrics and alerts.

For MinIO AIStor deployments on Kubernetes, this procedure assumes that all necessary network control components, such as Ingress or Load Balancers, are in place to facilitate access between the object store and the InfluxDB service.

Configure InfluxDB to collect and alert using MinIO AIStor metrics

IMPORTANT

This procedure specifically uses the InfluxDB UI to create a scraping endpoint.

The InfluxDB UI does not provide the same level of configuration as using Telegraf and the corresponding Prometheus plugin. Specifically:

  • You cannot enable authenticated access to the MinIO AIStor metrics endpoint via the InfluxDB UI
  • You cannot set a tag for collected metrics (e.g. url_tag) for uniquely identifying the metrics for a given deployment

The Telegraf Prometheus plugin also supports Kubernetes-specific features, such as scraping the minio service for a given object store.

Configuring Telegraf is out of scope for this procedure. You can use this procedure as general guidance for configuring Telegraf to scrape MinIO AIStor metrics.

  1. Configure public access to MinIO AIStor metrics

    Set the MINIO_PROMETHEUS_AUTH_TYPE environment variable to "public" for all nodes in the MinIO AIStor deployment. Then restart the deployment so that the change takes effect and the metrics endpoint allows public access.

    You can validate the change by attempting to curl the metrics endpoint:

    curl https://HOSTNAME/minio/v2/metrics/cluster
    

    Replace HOSTNAME with the URL of the load balancer or reverse proxy through which you access the deployment. You can alternatively specify any single node as HOSTNAME:PORT, specifying the Object Store API port in addition to the node hostname.

    The response body should include a list of collected metrics.
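    To make that check repeatable, the response can be piped through a small filter. The following is a minimal sketch, not part of the MinIO AIStor tooling: has_metric is a hypothetical helper name, and it assumes the endpoint returns the Prometheus text format with one sample per line.

```shell
#!/bin/sh
# has_metric NAME
# Hypothetical helper: reads Prometheus text-format output on stdin and
# succeeds if at least one sample line for the metric NAME is present.
# "# HELP" and "# TYPE" comment lines do not count as samples.
has_metric() {
    # A sample line starts with the metric name followed by "{" (labels)
    # or a space (no labels).
    grep -Eq "^$1[{ ]"
}

# Example against a live deployment (HOSTNAME is a placeholder, as above):
#   curl -fsS https://HOSTNAME/minio/v2/metrics/cluster \
#     | has_metric minio_cluster_nodes_offline_total && echo "metrics reachable"
```

    If the command prints nothing, either the request failed or the named metric is absent from the response.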

  2. Log into the InfluxDB UI and create a bucket

    Select the Organization under which you want to store MinIO AIStor metrics.

    Create a New Bucket in which to store metrics for the deployment.

  3. Create a new scraping source

    Create a new InfluxDB Scraper.

    Specify the full URL to the MinIO AIStor deployment, including the metrics endpoint:

    https://HOSTNAME/minio/v2/metrics/cluster
    

    Replace HOSTNAME with the URL of the load balancer or reverse proxy through which you access the deployment. You can alternatively specify any single node as HOSTNAME:PORT, specifying the Object Store API port in addition to the node hostname.

  4. Validate the data

    Use the Data Explorer to visualize the collected data.

    For example, you can set a filter on minio_cluster_capacity_usable_total_bytes and minio_cluster_capacity_usable_free_bytes to compare the total usable against total free space on the deployment.
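    As a cross-check for the values shown in InfluxDB, the same comparison can be computed directly from the metrics endpoint. The following is a rough sketch under stated assumptions: usable_used_percent is a hypothetical helper name, the input is the Prometheus text format, and sample lines carry no trailing timestamp, so the last field on each line is the value.

```shell
#!/bin/sh
# usable_used_percent
# Hypothetical helper: reads Prometheus text-format output on stdin and
# prints the percentage of usable capacity currently in use.
usable_used_percent() {
    awk '
        # Sum sample values per metric; any label sets are ignored.
        /^minio_cluster_capacity_usable_total_bytes[{ ]/ { total += $NF }
        /^minio_cluster_capacity_usable_free_bytes[{ ]/  { free  += $NF }
        END {
            # Fail if no usable-capacity sample was found.
            if (total <= 0) exit 1
            printf "%.1f\n", (total - free) / total * 100
        }
    '
}

# Example against a live deployment (HOSTNAME is a placeholder):
#   curl -fsS https://HOSTNAME/minio/v2/metrics/cluster | usable_used_percent
```

    The result should roughly agree with what the two series show in InfluxDB for the same point in time.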

  5. Configure a check

    Create a new Check on a metric.

    The following example check rules provide a baseline of alerts for a deployment. You can modify or otherwise use these examples for guidance in building your own checks.

    • Create a Threshold Check named MINIO_NODE_DOWN.

      Set the filter for the minio_cluster_nodes_offline_total key.

      Set the thresholds to WARN when the value is greater than 1.

    • Create a Threshold Check named MINIO_QUORUM_WARNING.

      Set the filter for the minio_cluster_drive_offline_total key.

      Set the thresholds to CRITICAL when the value is one less than your configured Erasure Code Parity setting.

      For example, a deployment using EC:4 should set this value to 3.

    Configure your Notification endpoints and Notification rules such that checks of each type trigger an appropriate response.
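The threshold value for the MINIO_QUORUM_WARNING check follows from simple arithmetic on the parity setting. The sketch below only illustrates that calculation; quorum_warning_threshold is a hypothetical helper name, and it takes the numeric part of the EC:N setting as its argument.

```shell
#!/bin/sh
# quorum_warning_threshold PARITY
# Hypothetical helper: prints the CRITICAL threshold for the
# MINIO_QUORUM_WARNING check, which is one less than the parity value.
quorum_warning_threshold() {
    # Reject empty or non-numeric input.
    case $1 in
        ''|*[!0-9]*)
            echo "parity must be a positive integer" >&2
            return 1
            ;;
    esac
    # A parity of 0 has no meaningful "one less" threshold.
    if [ "$1" -lt 1 ]; then
        echo "parity must be a positive integer" >&2
        return 1
    fi
    echo $(( $1 - 1 ))
}

# EC:4 gives a threshold of 3, matching the example above.
quorum_warning_threshold 4
```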