Share a table with Databricks

This page describes the end-to-end workflow for sharing a MinIO AIStor table with Databricks using the open Delta Sharing protocol. On the AIStor side, you create a share, generate an access token, and download a profile.share profile file. On the Databricks side, you create a provider from that profile and a catalog from the shared data, then query the table.

This procedure assumes you have an existing Delta or Iceberg (UniForm) table in a MinIO AIStor bucket. If you do not, follow the Quickstart to create one first.

Before you begin

Sharing with Databricks depends on three things being correct before any client connects:

  1. The Delta Sharing endpoint points at an externally reachable hostname.
  2. The Databricks client trusts the certificate authority (CA) that signed the AIStor endpoint’s TLS certificate.
  3. The shared tables have been compacted so the client receives accurate table statistics.

The following sections cover each requirement.

Set the Delta Sharing endpoint first

Configure MINIO_DELTA_SHARING_ENDPOINT before you create any token. MinIO AIStor embeds this value in the generated profile.share and in every presigned data-file URL it returns to clients. If you do not set it, MinIO AIStor falls back to its internally detected API endpoint, which is typically not reachable through an ingress, load balancer, or reverse proxy and which fails TLS verification from external clients.

Set the endpoint to the externally reachable hostname that Databricks uses to reach the cluster:

export MINIO_DELTA_SHARING_ENDPOINT="https://aistor-lb.example.net:9000"

MINIO_DELTA_SHARING_ENDPOINT is a server setting that takes effect after a restart, and its value is frozen into profile.share at the time you create the token. Set and apply the endpoint before creating the token. If you create the token first, the profile bakes in the internal endpoint and external clients cannot resolve or validate it.
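As a rough pre-flight check before restarting the server, the endpoint value can be screened for the mistakes described above. The following Python sketch is illustrative only: the function name endpoint_problems and the list of internal-looking suffixes are assumptions made for this example, not part of MinIO AIStor, and the heuristics do not replace an actual connection test from the Databricks side.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical suffixes that often indicate a cluster-internal DNS name.
# Adjust this tuple for your own environment.
INTERNAL_SUFFIXES = (".svc", ".cluster.local", ".internal", ".local")


def endpoint_problems(endpoint: str) -> list[str]:
    """Return likely problems with a Delta Sharing endpoint URL (empty if none found)."""
    problems = []
    parts = urlsplit(endpoint)
    host = parts.hostname or ""

    # External clients verify TLS, so this sketch expects an https URL.
    if parts.scheme != "https":
        problems.append("scheme is not https")
    if not host:
        problems.append("no hostname")
        return problems

    # Distinguish IP literals from DNS names.
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None

    if ip is not None:
        if ip.is_loopback or ip.is_private:
            problems.append("host is a loopback or private IP address")
    elif host == "localhost" or host.endswith(INTERNAL_SUFFIXES):
        problems.append("host looks like an internal name")
    return problems
```

For the example value used on this page, endpoint_problems("https://aistor-lb.example.net:9000") returns an empty list, while a value such as "https://10.0.0.5:9000" is flagged as a private address.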

For complete details on the endpoint setting and other Delta Sharing server settings, see the Delta Sharing settings reference.

Compact tables before sharing

Before you share tables, compact them: run OPTIMIZE or an equivalent compaction operation on Delta tables, and run compaction on Iceberg (UniForm) tables. Compaction consolidates small files and refreshes the table statistics (file counts and sizes) that MinIO AIStor reports to the client. Without current statistics, query planning in the client can stall on large tables while the client attempts to enumerate a large number of small files.

Trust the cluster’s certificate authority

The Delta Sharing client and its underlying pyarrow data-file reader must trust the CA that signed the AIStor endpoint’s TLS certificate. Self-signed or internal-CA certificates that the client does not trust are the most common cause of connection failures.

Add the cluster’s CA certificate to the trust store the client uses. On a Databricks cluster, install the CA certificate into the system trust store and ensure the Python environment uses it, for example through an init script that appends the CA to the bundle referenced by REQUESTS_CA_BUNDLE and SSL_CERT_FILE.
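The append step can be sketched in Python as follows. An init script on Databricks is normally a shell script, so treat this as an illustration of the logic rather than a drop-in script. The helper names append_ca and use_bundle are hypothetical, and the paths you pass in depend on your cluster image.

```python
import os
from pathlib import Path


def append_ca(ca_path: str, bundle_path: str) -> bool:
    """Append the PEM certificate at ca_path to bundle_path unless it is already there.

    Returns True if the bundle file was modified.
    """
    ca_pem = Path(ca_path).read_text().strip()
    bundle = Path(bundle_path)
    existing = bundle.read_text() if bundle.exists() else ""

    # Skip the write if this exact certificate text is already in the bundle.
    if ca_pem in existing:
        return False

    # Keep PEM blocks on their own lines.
    separator = "" if existing.endswith("\n") or not existing else "\n"
    bundle.write_text(existing + separator + ca_pem + "\n")
    return True


def use_bundle(bundle_path: str) -> None:
    """Point the two environment variables named above at the bundle.

    This only affects the current process and the children it starts.
    """
    os.environ["REQUESTS_CA_BUNDLE"] = bundle_path
    os.environ["SSL_CERT_FILE"] = bundle_path
```

Calling append_ca twice with the same certificate leaves the bundle unchanged the second time, which keeps the step safe to rerun on every cluster start.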

Create the share on AIStor

Use the mc table share create command to create the share.

The following example creates a share analytics-share with the schema HR and a Delta table employees:

mc table share create myaistor/analytics-share HR \
  "employees:delta:analytics:employees/" \
  --description "Analytics for current employees"

To add more tables to the share, see Managing shared tables.

Create a token and download the profile

Use the mc table share token create command to create an access token for the share, then save the profile to profile.share:

mc table share token create myaistor/analytics-share \
  --description "Access to analytics share for Databricks" \
  --json | jq -r '.profile' > profile.share

The resulting profile.share is the profile file Delta Sharing clients use to authenticate to the share:

{
  "shareCredentialsVersion": 1,
  "endpoint": "https://aistor-lb.example.net:9000/_delta-sharing/v1",
  "bearerToken": "dapi..."
}

MinIO AIStor issues dapi-prefixed bearer tokens. The endpoint value is the URL you set in MINIO_DELTA_SHARING_ENDPOINT, followed by the /_delta-sharing/v1 path.
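Before transferring the file, it can be worth confirming that the profile contains the three fields shown above and that its endpoint points at the external host rather than an internal one. The Python sketch below is an assumption-laden illustration: check_profile is a hypothetical helper, and it compares only the scheme, host, and port of the two URLs.

```python
import json
from urllib.parse import urlsplit

# Fields present in the example profile on this page.
REQUIRED_KEYS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def check_profile(profile_text: str, expected_endpoint: str) -> dict:
    """Parse profile.share content and verify it targets the expected endpoint.

    Raises ValueError if a required key is missing or the host does not match.
    """
    profile = json.loads(profile_text)

    missing = [key for key in REQUIRED_KEYS if key not in profile]
    if missing:
        raise ValueError(f"profile is missing keys: {missing}")

    actual = urlsplit(profile["endpoint"])
    expected = urlsplit(expected_endpoint)
    # Compare origin only; the profile endpoint carries an extra path.
    if (actual.scheme, actual.hostname, actual.port) != (
        expected.scheme,
        expected.hostname,
        expected.port,
    ):
        raise ValueError(
            f"profile endpoint {profile['endpoint']!r} does not match {expected_endpoint!r}"
        )
    return profile
```

A typical call reads the file and passes the external URL, for example check_profile(open("profile.share").read(), "https://aistor-lb.example.net:9000"). A ValueError here usually means the token was created before the endpoint setting took effect.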

The command above omits --expires, which creates a non-expiring token. Add --expires with a duration such as 90d to create a token that expires. See Token expiration for guidance on managing token lifetimes.

MinIO AIStor does not distribute profile.share to clients. Transfer the file to Databricks through a secure channel.

Configure the share in Databricks

Databricks consumes the share through the Delta Sharing open sharing model. The high-level flow is:

  1. Upload or register the profile.share profile in Databricks to create a provider from the share credentials.
  2. Create a catalog in Unity Catalog from the shared data the provider exposes.
  3. Query the resulting tables with SQL or a notebook.

The exact steps and screen labels in Databricks change over time, so follow the current Databricks documentation rather than fixed click paths.

If you use Databricks as the client, review the Databricks resource limits for Delta Sharing quotas and constraints. The 1,000,000-row cap that some queries hit is a Databricks client limit, not a MinIO AIStor limit.
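Independently of the Unity Catalog flow, the share can be smoke-tested with the open-source delta-sharing Python package, which addresses a table as <profile path>#<share>.<schema>.<table>. The sketch below assumes the package is installed (pip install delta-sharing) and that the share, schema, and table names match the analytics-share example on this page. The helper names table_url and preview are hypothetical.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address the client expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def preview(profile_path: str, share: str, schema: str, table: str, rows: int = 10):
    """List the tables visible to the token, then load a few rows as a pandas DataFrame."""
    # Imported here so table_url works even without the package installed.
    import delta_sharing

    client = delta_sharing.SharingClient(profile_path)
    for shared_table in client.list_all_tables():
        print(shared_table)

    # limit keeps the read small; this is a connectivity check, not a full scan.
    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table), limit=rows
    )
```

For the example share, preview("profile.share", "analytics-share", "HR", "employees") would attempt to read up to ten rows. A TLS error at this point generally points back to the certificate trust or endpoint requirements described earlier.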

Production deployment

When Databricks reaches AIStor through a reverse proxy or load balancer, that proxy must be configured to stream responses, and its hostname must match MINIO_DELTA_SHARING_ENDPOINT. See Reverse proxies, load balancers, and scaling for the required configuration.