MinIO, Object Storage, S3, Kubernetes, Data Lake, Infrastructure, DevOps, Open Source, Cloud Storage, Data Engineering

MinIO in Production — S3-Compatible Object Storage, Tiering, and Kubernetes Deployment

A practical guide to MinIO in production. It covers a single-node deployment with erasure coding across four XFS-formatted NVMe drives, configured through a systemd unit and MINIO_VOLUMES; a distributed server pool of four nodes and sixteen drives using the {1...4} expansion syntax and automatic erasure-set sizing; an Nginx load balancer for the API and console endpoints, with ip_hash sticky sessions for the web console; TLS with certificates placed in MinIO's certs directory; IAM policies created with mc admin policy create for read-only analysts and pipeline writers; lifecycle rules with mc ilm rule add for prefix-scoped expiry and noncurrent-version expiry; bucket versioning; bucket notifications to Kafka (MinIO also supports NATS, Redis, webhooks, and more); server-side tiering with mc ilm tier add to AWS S3, GCS, or Azure Blob without changing object paths, plus tier statistics and on-demand restore of cold objects; the MinIO Operator Helm chart and a Tenant resource with NVMe-backed PVCs, topologySpreadConstraints, a cert-manager TLS secret, and a config.env Secret setting MINIO_STORAGE_CLASS_STANDARD=EC:4; boto3 with a custom CA and multipart TransferConfig; Apache Spark S3A settings for path-style access, multipart size, and thread counts; Prometheus scraping of /minio/v2/metrics/cluster with alerts for offline drives, low disk space, high error rates, and replication lag; and a 10-point production checklist.

2026-06-24

Why Self-Hosted Object Storage

Amazon S3 defined the object storage API that the entire data ecosystem has standardized around. Every data lake framework — Apache Spark, Iceberg, Delta Lake, DuckDB, Trino, MLflow — speaks S3. The problem is that S3's convenience comes with unpredictable egress costs, data residency constraints that prevent certain industries from using public cloud, and network latency for on-premises compute reading terabytes of Parquet files across a WAN link.

MinIO addresses this by implementing the S3 API on hardware you control. Most applications that work with S3 work with MinIO after changing the endpoint (and, for some clients, enabling path-style addressing); no SDK swap or driver update is needed. You can deploy MinIO on bare-metal NVMe servers in your datacenter, on a Kubernetes cluster, or in a private cloud, and your data lake reads at local network and disk speed. When building a lakehouse on Delta Lake or Iceberg, MinIO gives you S3-compatible storage with LAN-level latency and no per-request or egress pricing, which changes the economics substantially at multi-petabyte scale.

Full S3 Compatibility

Implements S3 Signature Version 4 authentication, multipart upload, presigned URLs, bucket notifications, versioning, object lock, and lifecycle policies, so most S3-aware tools work as a drop-in.

Erasure Coding

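Built-in Reed-Solomon erasure coding tolerates drive and node failures without full data replication. An erasure set survives as many drive losses as it has parity shards: a 16-drive set at the default EC:4 keeps serving reads with up to 4 drives lost, and at the maximum EC:8 it tolerates 8.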
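The relationship between erasure-set size, parity, usable capacity, and failure tolerance is simple arithmetic. The sketch below assumes the default-parity table from MinIO's documentation (verify it against your release) and the rule that write quorum equals the number of data shards, plus one when parity is exactly half the set.

```python
from dataclasses import dataclass


def default_parity(set_size: int) -> int:
    """Default STANDARD parity for an erasure set of `set_size` drives.

    Assumption: mirrors the default table in MinIO's documentation
    (1 drive: EC:0, 2-3: EC:1, 4-5: EC:2, 6-7: EC:3, 8-16: EC:4).
    """
    if set_size < 1:
        raise ValueError("an erasure set needs at least one drive")
    if set_size == 1:
        return 0
    if set_size <= 3:
        return 1
    if set_size <= 5:
        return 2
    if set_size <= 7:
        return 3
    return 4


@dataclass(frozen=True)
class ErasureSet:
    drives: int
    parity: int

    @property
    def data(self) -> int:
        return self.drives - self.parity

    @property
    def usable_fraction(self) -> float:
        # Only data shards hold user bytes; parity shards are overhead.
        return self.data / self.drives

    @property
    def read_tolerance(self) -> int:
        # Any `data` shards can rebuild an object, so up to `parity` drives may be lost.
        return self.parity

    @property
    def write_tolerance(self) -> int:
        # Write quorum is `data` shards, raised to data + 1 when parity is exactly
        # half the set so two halves cannot accept conflicting writes.
        write_quorum = self.data + 1 if self.data == self.parity else self.data
        return self.drives - write_quorum


if __name__ == "__main__":
    for drives, parity in [(4, default_parity(4)), (16, default_parity(16)), (16, 8)]:
        s = ErasureSet(drives, parity)
        print(f"{drives} drives EC:{parity}: {s.usable_fraction:.0%} usable, "
              f"reads survive {s.read_tolerance}, writes survive {s.write_tolerance} lost drives")
```

For a 16-drive set the sketch reports 75% usable at EC:4 and 50% at EC:8, where writes tolerate one fewer lost drive than reads.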

Tiering

Server-side tiering transparently moves objects from local NVMe to S3, GCS, or Azure Blob once lifecycle transition rules based on object age match them, keeping hot data fast and cold data cheap.

Single-Node Deployment

The fastest path to a working MinIO instance is a single-node deployment with multiple drives. With four drives, MinIO forms one erasure set and applies its default parity of EC:2: each object is split into two data shards and two parity shards, so reads survive the loss of any two drives and writes survive the loss of one. The MinIO binary is a single static executable with no runtime dependencies.

# Download and install the MinIO server binary
curl -O https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
mv minio /usr/local/bin/

# Create one data directory per drive (assumes each XFS-formatted drive
# is already mounted at /mnt/disk1 through /mnt/disk4)
mkdir -p /mnt/disk{1,2,3,4}/minio-data

# Create a dedicated system user
useradd --system --home /var/lib/minio --shell /sbin/nologin minio
chown -R minio:minio /mnt/disk{1,2,3,4}/minio-data

# Create the MinIO environment file
cat > /etc/default/minio <<'EOF'
# Root credentials — change before deploying
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=changeme_use_a_long_secret

# Bind address — 9000 for API, 9001 for console
MINIO_ADDRESS=:9000
MINIO_CONSOLE_ADDRESS=:9001

# Volume paths — MinIO uses erasure coding across these four drives
MINIO_VOLUMES="/mnt/disk1/minio-data /mnt/disk2/minio-data /mnt/disk3/minio-data /mnt/disk4/minio-data"
EOF

# systemd unit file: /etc/systemd/system/minio.service
[Unit]
Description=MinIO
Documentation=https://docs.min.io
Wants=network-online.target
After=network-online.target

[Service]
User=minio
Group=minio
EnvironmentFile=/etc/default/minio
ExecStart=/usr/local/bin/minio server $MINIO_VOLUMES \
  --address $MINIO_ADDRESS \
  --console-address $MINIO_CONSOLE_ADDRESS

Restart=always
LimitNOFILE=65536
TasksMax=infinity
TimeoutStopSec=infinity
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

# Enable and start MinIO
systemctl daemon-reload
systemctl enable minio
systemctl start minio
systemctl status minio

# Verify the server is running
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9000/minio/health/live
# → prints 200 when the server is live

# Install the MinIO client (mc) for administration
curl -O https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
mv mc /usr/local/bin/

# Configure the mc alias
mc alias set local http://localhost:9000 minioadmin changeme_use_a_long_secret

# Verify connectivity and list buckets
mc admin info local
mc ls local/
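If provisioning scripts need to wait for the server before running mc commands, a small poller against the health endpoint is enough. This is a sketch using only the Python standard library; the `opener` parameter exists so the function can be tested without a server, and the timeout and interval defaults are arbitrary choices. Passing `path="/minio/health/cluster"` instead checks whether the cluster has write quorum.

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(
    base_url: str,
    path: str = "/minio/health/live",
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Poll a MinIO health endpoint until it returns HTTP 200 or timeout_s elapses."""
    url = base_url.rstrip("/") + path
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError (connection refused, 503, ...) subclass OSError
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    ok = wait_until_healthy("http://localhost:9000")
    print("MinIO is live" if ok else "MinIO did not become live in time")
```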

Note

MinIO recommends XFS-formatted drives for best performance; EXT4 has higher metadata overhead for small-object workloads. Each MinIO drive should map to a separate physical device: running all four "drives" on a single disk defeats erasure coding's fault tolerance.

Distributed Multi-Node Deployment

For production data lake deployments, a distributed MinIO cluster across multiple servers provides both high availability and more storage capacity. MinIO groups servers into server pools; each pool's drives are divided into one or more erasure sets, and every object is written to a single erasure set chosen by a deterministic hash of its name, so each pool tolerates failures independently. Adding a new pool expands capacity without reshuffling existing data: new objects are placed preferentially on the pool with the most free space.

# /etc/default/minio on EACH of the 4 nodes (minio-{1,2,3,4}.internal)
# All nodes must have identical environment files

MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=changeme_use_a_long_secret

MINIO_ADDRESS=:9000
MINIO_CONSOLE_ADDRESS=:9001

# Distributed pool: 4 nodes × 4 drives each = 16 drives total
# MinIO picks the erasure set size automatically (one 16-drive set here); the default
# parity for a 16-drive set is EC:4, i.e. 12 data + 4 parity shards per object
MINIO_VOLUMES="http://minio-{1...4}.internal:9000/mnt/disk{1...4}/minio-data"

# Site name for replication identification
MINIO_SITE_NAME=dc1-prod
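MinIO's `{x...y}` notation (three dots, unlike Bash's `{1..4}`) expands into one endpoint per node and drive. The sketch below reproduces that expansion for numeric ranges so you can check that a MINIO_VOLUMES value yields the drive list you expect; it is a simplified re-implementation for illustration, not MinIO's own parser.

```python
import itertools
import re

# Matches MinIO-style ellipsis ranges such as {1...4} (three dots, inclusive).
_RANGE = re.compile(r"\{(\d+)\.\.\.(\d+)\}")


def expand_volumes(pattern: str) -> list[str]:
    """Expand every {a...b} range in a MINIO_VOLUMES-style pattern.

    Simplified: numeric ranges only, no zero-padding, no alphabetic ranges.
    """
    parts = _RANGE.split(pattern)
    # re.split with two groups yields: literal, start, end, literal, start, end, ..., literal
    literals = parts[0::3]
    ranges = [range(int(lo), int(hi) + 1) for lo, hi in zip(parts[1::3], parts[2::3])]
    endpoints = []
    for combo in itertools.product(*ranges):
        text = literals[0]
        for value, literal in zip(combo, literals[1:]):
            text += str(value) + literal
        endpoints.append(text)
    return endpoints


if __name__ == "__main__":
    volumes = expand_volumes("http://minio-{1...4}.internal:9000/mnt/disk{1...4}/minio-data")
    print(len(volumes))  # 16 endpoints
    print(volumes[0])    # http://minio-1.internal:9000/mnt/disk1/minio-data
```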
# Start MinIO on all nodes at roughly the same time: the cluster does not come
# online until enough nodes are running to reach quorum. A backgrounded SSH loop works:

for host in minio-{1..4}.internal; do
  ssh "$host" "systemctl start minio" &
done
wait

# Verify all nodes joined the cluster
mc admin info local

# Illustrative, abridged output: all 4 nodes, 16 drives, one 16-drive erasure set
# ● minio-1.internal:9000
#   Uptime: 12s, Version: RELEASE.2026-06-01
# ● minio-2.internal:9000 ...
#
# Drives:      16 OK
# Erasure Set: 16 drives, parity EC:4 (12 data, 4 parity)
# Capacity:    28 TiB raw (~21 TiB usable at EC:4)

# Nginx load balancer config for the MinIO cluster
# Place in front of all nodes to distribute API and console traffic

upstream minio_api {
  least_conn;
  server minio-1.internal:9000;
  server minio-2.internal:9000;
  server minio-3.internal:9000;
  server minio-4.internal:9000;
  keepalive 64;
}

upstream minio_console {
  ip_hash;  # sticky sessions for the web console
  server minio-1.internal:9001;
  server minio-2.internal:9001;
  server minio-3.internal:9001;
  server minio-4.internal:9001;
}

server {
  listen 443 ssl http2;
  server_name minio.internal;

  ssl_certificate     /etc/ssl/certs/minio.crt;
  ssl_certificate_key /etc/ssl/private/minio.key;

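  # API endpoint (if the MinIO nodes serve TLS themselves, use proxy_pass https://minio_api)
  location / {
    proxy_pass         http://minio_api;
    proxy_http_version 1.1;                        # required for upstream keepalive
    proxy_set_header   Connection "";              # let nginx reuse upstream connections
    proxy_set_header   Host $http_host;            # SigV4 signs the Host header
    proxy_set_header   X-Real-IP $remote_addr;
    proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header   X-Forwarded-Proto $scheme;
    proxy_buffering    off;
    client_max_body_size 0;        # allow large object uploads
    proxy_request_buffering off;   # stream uploads directly
  }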
}

server {
  listen 9443 ssl http2;
  server_name minio.internal;

  ssl_certificate     /etc/ssl/certs/minio.crt;
  ssl_certificate_key /etc/ssl/private/minio.key;

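  # Console endpoint (the web console uses WebSockets)
  location / {
    proxy_pass         http://minio_console;
    proxy_http_version 1.1;                        # WebSocket upgrades need HTTP/1.1
    proxy_set_header   Upgrade $http_upgrade;
    proxy_set_header   Connection "upgrade";
    proxy_set_header   Host $http_host;
    proxy_set_header   X-Real-IP $remote_addr;
  }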
}

TLS and IAM Policies

MinIO supports TLS natively: place a certificate and key named public.crt and private.key in the certs directory (by default ${HOME}/.minio/certs/ for the user running the server) and the server switches to HTTPS on restart. For internal deployments, generate a certificate whose subjectAltName covers the load balancer name and every node hostname. Access control uses IAM policies with the same syntax as AWS IAM: JSON policy documents with Effect, Action, and Resource blocks. Policies are attached to users or groups, and each identity authenticates with an access key and secret key pair; service accounts created under a user inherit that user's policy.

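# Generate a self-signed certificate for internal use; list the load balancer name
# and every node hostname as subjectAltNames (-addext needs OpenSSL 1.1.1+)
openssl req -x509 -nodes -newkey rsa:4096 \
  -keyout /etc/ssl/private/minio.key \
  -out /etc/ssl/certs/minio.crt \
  -days 3650 \
  -subj "/CN=minio.internal" \
  -addext "subjectAltName=DNS:minio.internal,DNS:minio-1.internal,DNS:minio-2.internal,DNS:minio-3.internal,DNS:minio-4.internal"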

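# Copy certs into MinIO's default certs directory on each node: ${HOME}/.minio/certs
# for the minio user, whose home directory is /var/lib/minio. Restart to enable HTTPS.
mkdir -p /var/lib/minio/.minio/certs
cp /etc/ssl/certs/minio.crt /var/lib/minio/.minio/certs/public.crt
cp /etc/ssl/private/minio.key /var/lib/minio/.minio/certs/private.key
chown -R minio:minio /var/lib/minio
chmod 600 /var/lib/minio/.minio/certs/private.key
systemctl restart minio

# Create IAM policies for different access levels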

# 1. Read-only policy for analysts: can only read from the data-lake bucket
cat > /tmp/readonly-datalake.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::data-lake",
        "arn:aws:s3:::data-lake/*"
      ]
    }
  ]
}
EOF
mc admin policy create local readonly-datalake /tmp/readonly-datalake.json

# 2. Writer policy for pipeline accounts: put/delete only under the prefixes the
#    pipelines write (processed/ and reports/ are used by the boto3 and Spark examples)
cat > /tmp/pipeline-writer.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::data-lake/raw/*",
        "arn:aws:s3:::data-lake/staging/*",
        "arn:aws:s3:::data-lake/processed/*",
        "arn:aws:s3:::data-lake/reports/*",
        "arn:aws:s3:::data-lake"
      ]
    }
  ]
}
EOF
mc admin policy create local pipeline-writer /tmp/pipeline-writer.json

# 3. Create a dedicated IAM user for Spark jobs and attach the writer policy
mc admin user add local spark-service a_long_random_secret_key
mc admin policy attach local pipeline-writer --user spark-service

# 4. Create a read-only IAM user for analysts
mc admin user add local analyst-reader another_secret_key
mc admin policy attach local readonly-datalake --user analyst-reader

# Optional: issue extra access keys (service accounts) that inherit a user's policy
mc admin user svcacct add local spark-service
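A quick way to sanity-check prefix scoping is to evaluate a policy against sample requests. The evaluator below is deliberately simplified (an explicit Deny wins, then any matching Allow, otherwise the request is denied; Condition blocks, NotAction/NotResource, and group policies are ignored), and `WRITER_POLICY` is a hypothetical example limited to raw/ and staging/, not necessarily the policy you deploy.

```python
import json
from fnmatch import fnmatchcase
from typing import Iterable, List, Union

# Hypothetical policy for illustration: pipeline writes limited to raw/ and staging/.
WRITER_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::data-lake",
        "arn:aws:s3:::data-lake/raw/*",
        "arn:aws:s3:::data-lake/staging/*"
      ]
    }
  ]
}
""")


def _as_list(value: Union[str, dict, Iterable]) -> List:
    return [value] if isinstance(value, (str, dict)) else list(value)


def _statement_matches(stmt: dict, action: str, resource: str) -> bool:
    # Action names compare case-insensitively; resource ARNs case-sensitively.
    # fnmatch treats '[' specially, which is harmless for typical ARNs.
    action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt.get("Action", [])))
    resource_ok = any(fnmatchcase(resource, r) for r in _as_list(stmt.get("Resource", [])))
    return action_ok and resource_ok


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Simplified evaluation: explicit Deny wins, then any Allow, else implicit deny."""
    statements = _as_list(policy.get("Statement", []))
    if any(s.get("Effect") == "Deny" and _statement_matches(s, action, resource) for s in statements):
        return False
    return any(s.get("Effect") == "Allow" and _statement_matches(s, action, resource) for s in statements)


if __name__ == "__main__":
    print(is_allowed(WRITER_POLICY, "s3:PutObject", "arn:aws:s3:::data-lake/raw/events/a.json"))
    print(is_allowed(WRITER_POLICY, "s3:PutObject", "arn:aws:s3:::data-lake/processed/x.parquet"))
```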

Bucket Configuration and Lifecycle Policies

Lifecycle policies automate the movement or deletion of objects based on age or prefix. In a data lake context, lifecycle rules typically expire raw ingest data after a retention window while keeping processed Parquet files indefinitely, or transition objects to a tiered storage class after they stop being queried frequently.

# Create the buckets
mc mb local/data-lake
mc mb local/ml-artifacts
mc mb local/logs-archive

# Enable versioning (required for bucket replication; object-locked buckets get it
# automatically at creation). logs-archive needs it for the noncurrent rule below.
mc version enable local/data-lake
mc version enable local/ml-artifacts
mc version enable local/logs-archive

# Lifecycle rules for the logs-archive bucket, raw/ prefix only:
# - expire current objects after 90 days; on a versioned bucket this adds a
#   delete marker and the previous version becomes noncurrent
# - permanently delete noncurrent versions 30 days after they become noncurrent

mc ilm rule add local/logs-archive \
  --prefix "raw/" \
  --expire-days 90

mc ilm rule add local/logs-archive \
  --prefix "raw/" \
  --noncurrent-expire-days 30

# View the lifecycle configuration
mc ilm rule ls local/logs-archive
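Lifecycle days are not counted to the second. Assuming MinIO applies S3-style rounding (add days + 1 to the modification time, then truncate to midnight UTC, which is our reading of MinIO's lifecycle code), the earliest time a rule can act on an object can be computed as below; the background scanner applies the rule at or after that time.

```python
from datetime import datetime, timedelta, timezone


def expected_action_time(mod_time: datetime, days: int) -> datetime:
    """Earliest time a lifecycle rule of `days` can act on an object.

    Assumption: S3-style rounding (add days + 1, truncate to midnight UTC).
    `mod_time` must be timezone-aware.
    """
    if mod_time.tzinfo is None:
        raise ValueError("mod_time must be timezone-aware")
    if days == 0:
        return mod_time
    shifted = mod_time.astimezone(timezone.utc) + timedelta(days=days + 1)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    created = datetime(2025, 1, 15, 13, 45, tzinfo=timezone.utc)
    print(expected_action_time(created, 90))  # 2025-04-16 00:00:00+00:00
```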
# Allow anonymous (unauthenticated) downloads from one prefix, e.g. public datasets
mc anonymous set download local/data-lake/public

# Bucket notifications: MinIO can publish events to Kafka, NATS, Redis, webhooks, and more.
# First register the Kafka target and restart so its ARN exists
mc admin config set local notify_kafka:kafka-prod \
  brokers="kafka-1.internal:9092,kafka-2.internal:9092" \
  topic="minio-events" \
  tls="off" \
  queue_limit="10000"

mc admin service restart local

# Then subscribe the prefix to object-creation events (mc uses the short name "put")
mc event add local/data-lake arn:minio:sqs::kafka-prod:kafka \
  --event put \
  --prefix "raw/events/"
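Downstream consumers receive S3-style event records. The sketch below parses one notification payload with the standard library only; it assumes the record layout MinIO publishes (`Records[].eventName`, `s3.bucket.name`, `s3.object.key`, `s3.object.size`) and that object keys arrive URL-encoded with `+` for spaces, so verify it against a real message from your topic. With a Kafka client such as kafka-python, you would call `parse_minio_event(message.value)` for each consumed message.

```python
import json
from typing import List, NamedTuple
from urllib.parse import unquote_plus


class ObjectEvent(NamedTuple):
    event_name: str
    bucket: str
    key: str
    size: int


def parse_minio_event(payload: bytes) -> List[ObjectEvent]:
    """Extract (event, bucket, key, size) from one MinIO bucket-notification message."""
    message = json.loads(payload)
    events = []
    for record in message.get("Records", []):
        s3 = record["s3"]
        events.append(ObjectEvent(
            event_name=record["eventName"],
            bucket=s3["bucket"]["name"],
            key=unquote_plus(s3["object"]["key"]),  # assumed URL-encoded, '+' for spaces
            size=int(s3["object"].get("size", 0)),
        ))
    return events
```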

Note

Object lock (WORM — Write Once Read Many) requires enabling it at bucket creation time — it cannot be added to an existing bucket. Use mc mb --with-lock local/compliance-archive then configure retention rules. Object lock is typically used for regulatory compliance requirements such as SEC 17a-4, HIPAA, and CFTC Rule 1.31.

Server-Side Tiering to Public Cloud

MinIO's tiering feature moves objects from local NVMe storage to a remote tier (S3, GCS, or Azure Blob) once a lifecycle transition rule matches them by age. The object's S3 API path does not change: clients still read it from the same bucket and key. After transition, MinIO removes the object's data from local drives and keeps only its metadata, so cold data is billed at remote-storage prices. A GET on a transitioned object is served by fetching the data from the remote tier, at remote-tier latency; to keep a temporary local copy for repeated reads, restore the object for a set number of days. Treat the remote bucket as owned by MinIO: objects deleted or modified there directly cannot be recovered through MinIO.

# Create a remote tier pointing at AWS S3 for cold storage
mc ilm tier add s3 local COLD-S3-TIER \
  --bucket minio-cold-tier \
  --prefix minio-data/ \
  --access-key AKIAIOSFODNN7EXAMPLE \
  --secret-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
  --region us-east-1

# Verify the tier was created
mc ilm tier ls local

# Apply a transition rule: objects in data-lake/raw/ older than 30 days
# move to the COLD-S3-TIER remote tier
mc ilm rule add local/data-lake \
  --prefix "raw/" \
  --transition-days 30 \
  --transition-tier COLD-S3-TIER

# Apply another rule: processed/ prefix transitions after 180 days
mc ilm rule add local/data-lake \
  --prefix "processed/" \
  --transition-days 180 \
  --transition-tier COLD-S3-TIER

# List all lifecycle rules
mc ilm rule ls local/data-lake

# Check tiering statistics: how many objects have been transitioned
mc ilm tier info local COLD-S3-TIER

# Illustrative output (exact columns vary by mc version):
# ┌─────────────────────┬──────────────┬───────────────┬───────────────┐
# │ Tier Name           │ Tier Type    │ Objects       │ Size          │
# ├─────────────────────┼──────────────┼───────────────┼───────────────┤
# │ COLD-S3-TIER        │ S3           │ 2,847,392     │ 18.3 TiB      │
# └─────────────────────┴──────────────┴───────────────┴───────────────┘

# Restore a transitioned object: MinIO keeps a temporary local copy for 7 days
mc ilm restore local/data-lake/raw/events/2025-01-15/batch_001.parquet --days 7
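The same restore can be requested from Python. MinIO implements the S3 RestoreObject API for transitioned objects, so boto3's `restore_object` and the `Restore` field of `head_object` should work unchanged; the helper names below are hypothetical, and `restore_object` is expected to fail for objects that were never transitioned.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple

# x-amz-restore format: ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"
_RESTORE = re.compile(r'ongoing-request="(true|false)"(?:,\s*expiry-date="([^"]+)")?')


def parse_restore_header(value: Optional[str]) -> Tuple[bool, Optional[datetime]]:
    """Return (restore_in_progress, expiry) from head_object()["Restore"]."""
    if not value:
        return False, None  # no restore has been requested
    match = _RESTORE.search(value)
    if match is None:
        raise ValueError(f"unrecognised Restore value: {value!r}")
    expiry = parsedate_to_datetime(match.group(2)) if match.group(2) else None
    return match.group(1) == "true", expiry


def request_restore(s3, bucket: str, key: str, days: int = 7) -> Tuple[bool, Optional[datetime]]:
    """Request a temporary restore of a transitioned object unless one is active or done.

    `s3` is a boto3 S3 client pointed at MinIO. Returns the restore state seen
    before any new request was made.
    """
    ongoing, expiry = parse_restore_header(s3.head_object(Bucket=bucket, Key=key).get("Restore"))
    if not ongoing and expiry is None:
        s3.restore_object(Bucket=bucket, Key=key, RestoreRequest={"Days": days})
    return ongoing, expiry
```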

Kubernetes Deployment with the MinIO Operator

The MinIO Kubernetes Operator manages MinIO tenants as custom resources. Each Tenant resource defines a complete MinIO cluster: server pools, storage classes, TLS certificates, and exposed services. The operator handles pod scheduling, certificate rotation, and rolling upgrades. Because MinIO does its own erasure coding, back each volume with a local, non-replicated PVC rather than storage that replicates data again underneath; the StorageClass you request (NVMe-backed for hot data, HDD-backed for cold) then directly drives both performance and cost.

# Install the MinIO Operator via Helm
helm repo add minio-operator https://operator.min.io
helm repo update

helm install minio-operator minio-operator/operator \
  --namespace minio-operator \
  --create-namespace \
  --set operator.replicaCount=2

# Verify operator pods are running
kubectl get pods -n minio-operator

# minio-tenant.yaml: a production MinIO tenant with 4 servers × 4 PVCs
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: data-lake
  namespace: minio
spec:
  image: minio/minio:RELEASE.2026-06-01T01-46-58Z

  # Credentials stored in Kubernetes Secret
  configuration:
    name: minio-env-config

  # Pool definition — 4 server pods, 4 PVCs each = 16 volumes
  pools:
    - name: pool-0
      servers: 4
      volumesPerServer: 4
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          storageClassName: nvme-local     # NVMe-backed StorageClass
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 2Ti               # 2 TiB per drive

      resources:
        requests:
          cpu: "4"
          memory: 16Gi
        limits:
          cpu: "8"
          memory: 32Gi

      # Spread pods across nodes to maximize fault tolerance
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              v1.min.io/tenant: data-lake

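  # TLS: serve the certificate from Secret minio-tls (for example, issued by a
  # cert-manager Certificate) instead of an operator-generated one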
  requestAutoCert: false
  externalCertSecret:
    - name: minio-tls
      type: kubernetes.io/tls

  # Expose the API and the console via LoadBalancer Services
  exposeServices:
    minio: true
    console: true

  # The web console is embedded in the MinIO server pods and published by
  # exposeServices.console above; no separate console deployment is configured
---
# Secret with MinIO root credentials (config.env uses shell export syntax)
apiVersion: v1
kind: Secret
metadata:
  name: minio-env-config
  namespace: minio
type: Opaque
stringData:
  config.env: |
    export MINIO_ROOT_USER="minioadmin"
    export MINIO_ROOT_PASSWORD="changeme_use_a_long_secret"
    export MINIO_SITE_NAME="k8s-prod"
    export MINIO_STORAGE_CLASS_STANDARD="EC:4"
    export MINIO_PROMETHEUS_AUTH_TYPE="public"

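
# Create the namespace, then apply the Secret and the Tenant
# (saved together in minio-tenant.yaml, with the two documents separated by ---)
kubectl create namespace minio
kubectl apply -f minio-tenant.yaml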

# Watch tenant pods come up
kubectl get pods -n minio -w

# Get the service endpoint for the API
kubectl get svc -n minio
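Rather than committing a root password to the manifest, you can generate config.env and create the Secret from the file. This sketch assumes config.env uses the `export NAME="value"` lines shown above and refuses characters that would need escaping inside double quotes; `secrets.token_urlsafe` only produces letters, digits, `-` and `_`, so generated passwords always pass. The admin user name is a hypothetical placeholder.

```python
import re
import secrets

_ENV_NAME = re.compile(r"^[A-Z_][A-Z0-9_]*$")
_UNSAFE_CHARS = set('"\\$`\n')  # would need escaping inside a double-quoted shell value


def render_config_env(settings: dict) -> str:
    """Render a config.env body: one `export NAME="value"` line per setting."""
    lines = []
    for name, value in settings.items():
        if not _ENV_NAME.match(name):
            raise ValueError(f"not a valid environment variable name: {name!r}")
        if any(ch in _UNSAFE_CHARS for ch in value):
            raise ValueError(f"value for {name} contains characters that need escaping")
        lines.append(f'export {name}="{value}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = render_config_env({
        "MINIO_ROOT_USER": "minio-admin",  # hypothetical admin name
        "MINIO_ROOT_PASSWORD": secrets.token_urlsafe(32),
        "MINIO_SITE_NAME": "k8s-prod",
        "MINIO_STORAGE_CLASS_STANDARD": "EC:4",
        "MINIO_PROMETHEUS_AUTH_TYPE": "public",
    })
    with open("config.env", "w", encoding="utf-8") as fh:
        fh.write(body)
```

The Secret can then be created with `kubectl create secret generic minio-env-config -n minio --from-file=config.env=config.env` instead of applying the Secret document.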

Python and Spark Integration

Because MinIO implements the S3 API closely, boto3 and the AWS SDKs work without modification: point endpoint_url at your MinIO cluster instead of the AWS regional endpoint. The same applies to Apache Spark's Hadoop S3A connector: setting fs.s3a.endpoint (with path-style access enabled) makes Spark read Parquet from MinIO at local network speed. Apache Iceberg tables stored on MinIO use the same REST or Hive catalog configuration as on S3, making MinIO a transparent on-premises storage layer for lakehouse features such as time travel, schema evolution, and partition pruning.

# Python boto3 — connecting to MinIO as if it were S3
import boto3
from botocore.config import Config

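s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.internal:9000",
    # Hard-coded for brevity; load credentials from the environment or a secrets store
    aws_access_key_id="spark-service",
    aws_secret_access_key="a_long_random_secret_key",
    config=Config(
        signature_version="s3v4",
        s3={"addressing_style": "path"},  # MinIO serves path-style URLs unless MINIO_DOMAIN is set
        retries={"max_attempts": 3, "mode": "standard"},
    ),
    verify="/etc/ssl/certs/minio.crt",  # custom CA if using self-signed cert
)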

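# List objects under a prefix (the paginator follows continuation tokens past 1,000 keys)
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="data-lake", Prefix="raw/events/"):
    for obj in page.get("Contents", []):
        print(f"{obj['Key']}  {obj['Size'] / 1e6:.1f} MB")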

# Multipart upload for large files — streams without loading into memory
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,   # 100 MB threshold
    multipart_chunksize=50 * 1024 * 1024,    # 50 MB parts
    max_concurrency=10,
    use_threads=True,
)
s3.upload_file(
    "/data/large_dataset.parquet",
    "data-lake",
    "processed/large_dataset.parquet",
    Config=config,
)
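S3 multipart uploads are capped at 10,000 parts of 5 MiB to 5 GiB each (MinIO enforces the same limits), so a fixed 50 MiB part size only fits files up to roughly 488 GiB. boto3's transfer layer enlarges the part size automatically; the sketch below models a similar doubling strategy to show what part size and count a given file ends up with. It is an approximation for planning, not s3transfer's code.

```python
MIB = 1024**2
GIB = 1024**3
MIN_PART = 5 * MIB     # minimum part size (every part except the last)
MAX_PART = 5 * GIB     # maximum part size
MAX_PARTS = 10_000     # maximum parts per multipart upload


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def adjusted_chunksize(file_size: int, chunksize: int) -> tuple:
    """Return (part_size, part_count) after fitting `chunksize` to the part limits.

    Doubles the part size until the upload needs at most MAX_PARTS parts, then
    clamps it to [MIN_PART, MAX_PART].
    """
    size = max(chunksize, MIN_PART)
    while ceil_div(file_size, size) > MAX_PARTS:
        size *= 2
    size = min(size, MAX_PART)
    return size, max(1, ceil_div(file_size, size))


if __name__ == "__main__":
    part, count = adjusted_chunksize(1024**4, 50 * MIB)  # a 1 TiB file, 50 MiB parts requested
    print(part // MIB, "MiB x", count, "parts")
```

For a 1 TiB file with 50 MiB parts requested, it returns 200 MiB parts and 5,243 parts.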
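# Apache Spark configuration for MinIO as the S3A backend.
# Add to the SparkSession builder or spark-defaults.conf. Requires the hadoop-aws JAR
# matching your Spark build's Hadoop version, and the MinIO CA in the JVM truststore
# when using a self-signed certificate.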

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("minio-example")
    .config("spark.hadoop.fs.s3a.endpoint", "https://minio.internal:9000")
    .config("spark.hadoop.fs.s3a.access.key", "spark-service")
    .config("spark.hadoop.fs.s3a.secret.key", "a_long_random_secret_key")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")      # required for MinIO
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "true")
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
    # Performance tuning for large Parquet reads
    .config("spark.hadoop.fs.s3a.multipart.size", "67108864")     # 64 MB parts
    .config("spark.hadoop.fs.s3a.threads.max", "64")
    .config("spark.hadoop.fs.s3a.connection.maximum", "100")
    .config("spark.hadoop.fs.s3a.fast.upload", "true")            # Hadoop 2.x only; always on in 3.x
    .config("spark.hadoop.fs.s3a.block.size", "134217728")        # 128 MB block size
    .getOrCreate()
)

# Read Parquet from MinIO — identical to reading from s3://
df = spark.read.parquet("s3a://data-lake/processed/orders/")
df.createOrReplaceTempView("orders")

result = spark.sql("""
    SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(revenue_usd)                AS monthly_revenue,
        COUNT(*)                        AS order_count
    FROM orders
    WHERE order_date >= '2025-01-01'
    GROUP BY 1
    ORDER BY 1
""")
result.write.mode("overwrite").parquet("s3a://data-lake/reports/monthly_revenue/")
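Keeping the S3A settings in one dictionary makes them easy to reuse outside a notebook, for example as a properties file for spark-submit. The sketch below (the helper and `SECRET_MARKERS` are illustrative) renders spark-defaults.conf-style `key value` lines from the tuning values above and skips credential keys so they can come from a credentials provider instead of a file on disk.

```python
# S3A tuning values from the SparkSession example above
S3A_SETTINGS = {
    "spark.hadoop.fs.s3a.endpoint": "https://minio.internal:9000",
    "spark.hadoop.fs.s3a.path.style.access": "true",
    "spark.hadoop.fs.s3a.connection.ssl.enabled": "true",
    "spark.hadoop.fs.s3a.multipart.size": str(64 * 1024**2),
    "spark.hadoop.fs.s3a.threads.max": "64",
    "spark.hadoop.fs.s3a.connection.maximum": "100",
    "spark.hadoop.fs.s3a.block.size": str(128 * 1024**2),
}

# Keys containing any of these substrings are treated as credentials and skipped.
SECRET_MARKERS = ("access.key", "secret.key", "session.token")


def to_spark_defaults(settings: dict) -> str:
    """Render settings as spark-defaults.conf lines ("key value"), skipping credentials."""
    lines = [
        f"{key} {value}"
        for key, value in sorted(settings.items())
        if not any(marker in key for marker in SECRET_MARKERS)
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("s3a.conf", "w", encoding="utf-8") as fh:
        fh.write(to_spark_defaults(S3A_SETTINGS))
```

The resulting file can be passed with `spark-submit --properties-file s3a.conf` or appended to spark-defaults.conf.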

Monitoring with Prometheus and Grafana

MinIO exposes a Prometheus-compatible metrics endpoint at /minio/v2/metrics/cluster. The metrics cover disk capacity, object counts, API request rates (GET, PUT, DELETE by bucket), network throughput, erasure coding health, and replication lag. MinIO also ships a pre-built Grafana dashboard (ID 13502) that visualizes the most important cluster health indicators.

# prometheus.yml — scrape configuration for MinIO
scrape_configs:
  - job_name: "minio"
    metrics_path: "/minio/v2/metrics/cluster"
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/minio-ca.crt
    static_configs:
      - targets:
          - "minio.internal:9000"
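    # MinIO's default MINIO_PROMETHEUS_AUTH_TYPE is 'jwt' (the bare-metal env file above
    # does not change it). Generate a bearer token with `mc admin prometheus generate local`,
    # save it to the file below, and uncomment:
    # authorization:
    #   credentials_file: /etc/prometheus/minio-bearer-token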

# Key metrics to alert on:
# minio_cluster_capacity_usable_free_bytes  — alert when < 20% free
# minio_cluster_drive_offline_total         — alert when > 0
# minio_s3_requests_errors_total            — alert on sustained error rate
# minio_cluster_health_status               — alert on non-1 values

# PrometheusRule for MinIO cluster health alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: minio-alerts
  namespace: monitoring
spec:
  groups:
    - name: minio
      rules:
        - alert: MinIODrivesOffline
          expr: minio_cluster_drive_offline_total > 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "MinIO drives offline"
            description: "{{ $value }} drive(s) offline in the cluster"

        - alert: MinIOLowDiskSpace
          expr: |
            (minio_cluster_capacity_usable_free_bytes /
             minio_cluster_capacity_usable_total_bytes) < 0.20
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "MinIO disk space below 20%"
            description: "Usable free space is {{ $value | humanizePercentage }}"

        - alert: MinIOHighErrorRate
          expr: |
            sum by (api) (rate(minio_s3_requests_errors_total[5m])) > 10
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "MinIO S3 API error rate elevated"
            description: "{{ $value | humanize }} errors/sec on API {{ $labels.api }}"

        - alert: MinIOReplicationLag
          expr: minio_replication_pending_bytes > 5368709120  # 5 GB
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "MinIO replication lag exceeding 5 GB"
            description: "{{ $value | humanize1024 }}B pending replication"

Production Checklist

1

Use dedicated drives with XFS formatting for each MinIO volume. EXT4 introduces higher metadata overhead for small-object workloads. Each path in MINIO_VOLUMES must map to a separate physical drive; placing several volume paths on partitions of one disk defeats the purpose of erasure coding and degrades sequential throughput.

2

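Choose erasure-coding parity deliberately; EC:4 is a sensible production baseline and is MinIO's default for erasure sets of 8 or more drives. On a 16-drive erasure set, EC:4 stores 12 data and 4 parity shards per object, so 75% of raw capacity is usable and any 4 drives in the set can fail without losing read access; higher parity (up to half the set) trades capacity for more tolerance. Set it with 'mc admin config set local storage_class standard=EC:4' (or MINIO_STORAGE_CLASS_STANDARD) and verify with 'mc admin info local'.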

3

Rotate root credentials immediately after initial setup and switch to dedicated users or service accounts with scoped IAM policies. The root account should only be used for IAM administration, never in application code. Create separate credentials per pipeline, each with the minimum permissions required. Store credentials in Kubernetes Secrets or HashiCorp Vault, never in application code or version control.

4

Enable TLS on all MinIO endpoints before exposing them to any network. Over plaintext HTTP, every S3 request exposes its access key ID, signature, and payload, console logins send passwords in the clear, and captured signed requests can be replayed within their validity window. Use a proper CA-signed certificate in production, or cert-manager with Let's Encrypt for Kubernetes deployments.

5

Configure bucket versioning before enabling lifecycle tiering rules. Tiering requires versioning to be enabled, and versioning provides a safety net against accidental deletes. Set noncurrent version expiry to prevent unbounded version accumulation on frequently-overwritten objects.

6

Tune the S3A connector's multipart and threading settings when using Spark on MinIO. The default Hadoop S3A settings assume AWS S3 latency profiles. On a local network, increase fs.s3a.threads.max to 64 or more and set fs.s3a.multipart.size to 64 MB. On Hadoop 2.x, also enable fs.s3a.fast.upload so large writes are uploaded incrementally in blocks; Hadoop 3.x always uses the block uploader.

7

Deploy MinIO pods with topologySpreadConstraints across Kubernetes nodes. If all MinIO pods land on the same node (which Kubernetes may do by default on a small cluster), a single node failure takes down the entire cluster rather than just reducing capacity. Set maxSkew=1 on kubernetes.io/hostname to guarantee physical distribution.

8

Set up cross-site replication for disaster recovery. MinIO supports active-active bucket replication between two independent clusters. Configure replication from the primary datacenter cluster to a DR cluster with 'mc replicate add' and test failover quarterly by pointing applications at the DR endpoint and verifying read/write capability.

9

Monitor the minio_cluster_capacity_usable_free_bytes metric and alert when free capacity drops below 20%. Drive performance suffers as drives approach full, and MinIO stops placing new objects on a pool that runs out of space, so plan capacity additions before usage reaches 80%. A new server pool accepts new writes immediately (MinIO favors the pool with the most free space); rebalancing existing data with mc admin rebalance is optional.

10

Use object lock for compliance archive buckets and test WORM enforcement regularly. Object lock prevents deletion or overwrite for the configured retention period. In COMPLIANCE mode, verify that even root credentials cannot delete locked object versions; GOVERNANCE mode lets principals with s3:BypassGovernanceRetention override the lock. Document retention periods per bucket and record them in your data governance catalog.
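The 80% alert tells you when you are close; a linear projection tells you how long you have. This sketch assumes roughly linear growth estimated from history (for example, the change in used bytes over the last 30 days) and integer byte counts; real growth is rarely that smooth, so treat the result as a planning estimate.

```python
from typing import Optional

TIB = 2**40


def days_until_threshold(
    usable_bytes: int,
    used_bytes: int,
    daily_growth_bytes: int,
    threshold_pct: int = 80,
) -> Optional[int]:
    """Whole days of linear growth before used bytes reach threshold_pct of usable capacity.

    Returns 0 if usage is already at or above the threshold, and None if growth is
    zero or negative (the threshold is never reached under this model).
    """
    budget = usable_bytes * threshold_pct // 100
    if used_bytes >= budget:
        return 0
    if daily_growth_bytes <= 0:
        return None
    return (budget - used_bytes) // daily_growth_bytes


if __name__ == "__main__":
    # Hypothetical numbers: 100 TiB usable, 40 TiB used, growing 1 TiB per day
    print(days_until_threshold(100 * TIB, 40 * TIB, 1 * TIB))  # 40
```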

Running Spark or Iceberg workloads against AWS S3 with expensive egress bills, hitting data residency requirements that prevent using public cloud storage, or needing S3-compatible on-premises storage for your data lake without vendor lock-in?

We design and deploy MinIO production clusters — from single-node erasure-coded deployments on bare metal NVMe to distributed multi-node server pools behind Nginx load balancers, MinIO Kubernetes Operator Tenant CRD configuration with NVMe StorageClass PVCs and topology spread constraints, IAM policy design for service accounts per pipeline with least-privilege S3 action scoping, TLS setup with cert-manager or custom CA certificates, lifecycle tiering configuration to S3 or GCS for cold data cost reduction, bucket notification wiring to Kafka for event-driven pipeline triggers, Prometheus and Grafana alerting for cluster health and capacity, boto3 and Spark S3A connector performance tuning for local network throughput, cross-site active-active replication for disaster recovery, and object lock configuration for compliance archive buckets. Let’s talk.


DataSOps Consulting

Need help implementing this in production?

We build and operate data pipelines, AI systems, and observability stacks for engineering teams. Reach out for a free 30-minute architecture review.