The Elastic Stack — formerly the ELK Stack — is the most widely deployed open-source observability and search platform in the world. It powers log analytics at Netflix, real-time search at GitHub, security monitoring at government agencies, and APM pipelines at thousands of engineering teams. This guide covers every layer of the stack from first principles to production hardening, with practical examples you can run today.
Note
This is a pillar guide. It covers the full Elastic Stack breadth. For deep dives into specific areas, see our companion articles: Elasticsearch Read Optimization for query-level tuning, and OpenTelemetry in Practice for integrating OTel data into Elasticsearch.
What Is the Elastic Stack?
The Elastic Stack is a collection of four open-source projects maintained by Elastic, designed to work together as a unified pipeline for ingesting, storing, searching, and visualizing data at any scale:
- Elasticsearch — the distributed search and analytics engine at the core. Stores data as JSON documents, indexes them with an inverted index, and exposes a REST API for full-text search, structured queries, and aggregations.
- Logstash — a server-side data processing pipeline. Ingests data from multiple sources, transforms it through filter plugins (grok, mutate, date, GeoIP), and outputs it to Elasticsearch or other destinations.
- Kibana — the visualization and exploration UI. Provides Discover for ad-hoc log exploration, Dashboards for operational monitoring, Lens for drag-and-drop charts, and Alerting for threshold-based notifications.
- Beats — lightweight data shippers written in Go. Filebeat tails log files, Metricbeat collects system and service metrics, Packetbeat captures network traffic, and Heartbeat monitors uptime. All ship directly to Elasticsearch or Logstash.
Together they form a complete observability data platform — but each component can also be used independently or replaced by alternatives like Fluentd (instead of Logstash) or Grafana (instead of Kibana).
Elasticsearch Architecture Deep Dive
Nodes and Roles
An Elasticsearch cluster is a group of nodes that collectively hold your data and provide indexing and search capabilities. Each node has one or more roles that determine what work it performs:
- master — manages cluster state (index creation, node membership, shard allocation). Deploy 3 dedicated master-eligible nodes for production HA.
- data — holds shard data and handles CRUD, search, and aggregations. Subdivide into data_hot, data_warm, data_cold for ILM tiering.
- ingest — runs ingest pipelines (pre-indexing transformations). Use dedicated ingest nodes if you have heavy enrichment workloads.
- coordinating — routes requests and merges results. Every node can coordinate, but dedicated coordinating nodes reduce load on data nodes in large clusters.
- ml — runs machine learning jobs for anomaly detection and data frame analytics.
# elasticsearch.yml — dedicated hot data node
node.roles: [ data_hot ]
node.name: hot-01
cluster.name: prod-cluster
# JVM heap: half of available RAM, max 31GB
# -Xms16g -Xmx16g in jvm.options

Indices, Shards, and Replicas
An index is a logical namespace for a collection of documents. Elasticsearch divides each index into shards — independent Lucene instances that can be distributed across nodes. Shards are the unit of parallelism: queries fan out to all shards in parallel, and results are merged.
Each shard has a configurable number of replicas — exact copies that serve read requests and provide fault tolerance. If a primary shard's node fails, a replica is promoted to primary automatically.
Warning
The number of primary shards is fixed at index creation time. Over-sharding is the most common Elasticsearch anti-pattern — it degrades performance and inflates cluster state. Target 20–50 GB per shard for search workloads, up to 50–200 GB for logging where search is less intensive. Use force-merge and ILM rollover to control shard sizes over time.
# Create an index with explicit shard settings
PUT /my-index
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.routing.allocation.require._tier_preference": "data_hot"
},
"mappings": {
"properties": {
"timestamp": { "type": "date" },
"service": { "type": "keyword" },
"message": { "type": "text" },
"level": { "type": "keyword" },
"duration_ms": { "type": "integer" }
}
}
}

How Indexing Works
When a document is indexed, Elasticsearch routes it to a primary shard using a deterministic hash of the document ID (or a custom routing value). The primary shard writes the document to its in-memory buffer and the transaction log (translog). A refresh (every 1 second by default) turns the buffer into a new Lucene segment, making the document visible to searches. A flush performs a Lucene commit, fsyncing segments durably to disk, and truncates the translog. Background merges compact multiple small segments into fewer large ones, improving query performance.
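The routing step described above can be sketched in a few lines. This is illustrative only: Elasticsearch actually uses a murmur3 hash, and md5 here is just a deterministic stand-in to demonstrate the principle.

```python
# Illustrative stand-in for Elasticsearch's routing formula:
#   shard = murmur3(_routing) % number_of_primary_shards
# (ES uses murmur3; md5 here only demonstrates the deterministic mapping)
import hashlib

def shard_for(routing_value: str, num_primary_shards: int) -> int:
    digest = hashlib.md5(routing_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_primary_shards

# The same routing value always maps to the same shard:
assert shard_for("order-42", 3) == shard_for("order-42", 3)
# Changing the primary count remaps existing documents, which is why
# number_of_shards is immutable after index creation.
```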
Tip
For bulk indexing, increase refresh_interval to 30s or -1 (disable) and restore after loading. This eliminates refresh overhead and can increase throughput by 10×. Also set number_of_replicas: 0 during bulk load, then re-enable replicas after.
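The tip above translates into two update-settings calls around the bulk load (index name assumed):

```
# Before bulk load: disable refresh and replicas
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}

# After bulk load: restore defaults
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}
```

Re-enabling replicas after the load is cheaper than indexing with them, because replicas are rebuilt by copying completed segments rather than re-indexing each document.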
Query DSL — Searching and Filtering
Elasticsearch's Query DSL is a JSON-based language for expressing searches. The fundamental distinction is between query context (affects relevance scoring) and filter context (binary yes/no, cached). Filters are faster and should be used for exact-match conditions.
Bool Query — The Foundation
The bool query is the primary building block. It combines clauses with four operators:
GET /logs-*/_search
{
"query": {
"bool": {
"must": [
{ "match": { "message": "payment failed" } }
],
"filter": [
{ "term": { "level": "ERROR" } },
{ "range": { "timestamp": { "gte": "now-1h" } } },
{ "terms": { "service": ["checkout", "billing"] } }
],
"must_not": [
{ "term": { "env": "staging" } }
]
}
},
"sort": [{ "timestamp": "desc" }],
"size": 100
}

must clauses score and are required. filter clauses don't score and are cached — use them for all structured conditions. should boosts relevance but doesn't require a match (unless it's the only clause). must_not excludes matching documents.
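One subtlety worth showing: when a bool query contains must or filter clauses, should clauses become purely optional. To require at least one, set minimum_should_match explicitly. A sketch (index and field names assumed):

```
GET /logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "level": "ERROR" } }
      ],
      "should": [
        { "match": { "message": "timeout" } },
        { "match": { "message": "connection reset" } }
      ],
      "minimum_should_match": 1
    }
  }
}
```

Without minimum_should_match, this query would return all ERROR documents, merely scoring those mentioning timeouts higher.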
Aggregations
Aggregations provide analytics over your data — equivalent to SQL GROUP BY, COUNT, and AVG, but executing in parallel across shards. Bucket aggregations group documents; metric aggregations compute values over a group.
GET /logs-*/_search
{
"size": 0,
"query": {
"range": { "timestamp": { "gte": "now-24h" } }
},
"aggs": {
"errors_per_service": {
"terms": {
"field": "service",
"size": 20,
"order": { "_count": "desc" }
},
"aggs": {
"error_rate_over_time": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "1h"
}
},
"avg_duration": {
"avg": { "field": "duration_ms" }
},
"p99_duration": {
"percentiles": {
"field": "duration_ms",
"percents": [50, 90, 95, 99]
}
}
}
}
}
}

This single query returns, in one round trip: the top 20 services by error count, plus per-service hourly error trends, average response times, and p50/p90/p95/p99 latency percentiles — all filtered to the last 24 hours.
Index Lifecycle Management (ILM)
ILM automates the movement of indices through a lifecycle — hot → warm → cold → frozen → delete — based on age, size, or document count thresholds. This is the cornerstone of cost-efficient log storage: you keep recent data on fast SSD nodes, and older data on cheaper warm/cold nodes or object storage.
Defining an ILM Policy
PUT _ilm/policy/logs-policy
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_primary_shard_size": "50gb",
"max_age": "1d"
},
"set_priority": { "priority": 100 }
}
},
"warm": {
"min_age": "3d",
"actions": {
"shrink": { "number_of_shards": 1 },
"forcemerge": { "max_num_segments": 1 },
"set_priority": { "priority": 50 },
"allocate": {
"require": { "_tier_preference": "data_warm" }
}
}
},
"cold": {
"min_age": "30d",
"actions": {
"searchable_snapshot": {
"snapshot_repository": "s3-repo"
}
}
},
"delete": {
"min_age": "90d",
"actions": { "delete": {} }
}
}
}
}

Index Templates and Data Streams
Rather than managing individual indices, use data streams — an abstraction over a sequence of time-series indices (called backing indices) with automatic rollover. An index template applies settings, mappings, and the ILM policy to all new backing indices automatically.
# 1. Create an index template
PUT _index_template/logs-template
{
"index_patterns": ["logs-*"],
"data_stream": {},
"template": {
"settings": {
"index.lifecycle.name": "logs-policy",
"number_of_shards": 2,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"@timestamp": { "type": "date" },
"service": { "type": "keyword" },
"level": { "type": "keyword" },
"message": { "type": "text" }
}
}
}
}
# 2. Create the data stream (backing index auto-created)
PUT _data_stream/logs-app

Tip
Data streams enforce that every document has a @timestamp field and that all writes go to the current write index. Rollover happens automatically when the hot policy condition is met. This eliminates the need to manage index aliases manually.
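For example, appending a document to the data stream above (field values illustrative):

```
POST /logs-app/_doc
{
  "@timestamp": "2026-04-16T12:00:00Z",
  "service": "checkout",
  "level": "INFO",
  "message": "order placed"
}
```

A document without a @timestamp field is rejected, and the write lands in the current backing index regardless of how many rollovers have occurred.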
Logstash — Data Pipeline Engine
Logstash is a stateful ETL pipeline with a plugin-based architecture. A pipeline has three stages: input → filter → output. Multiple pipelines can run concurrently, each isolated in their own thread pool, enabling you to separate high-volume log ingestion from lower-volume metric pipelines.
A Production Pipeline
# /etc/logstash/conf.d/app-logs.conf
input {
beats {
port => 5044
ssl => true
ssl_certificate_authorities => ["/etc/logstash/certs/ca.crt"]
}
kafka {
bootstrap_servers => "kafka-01:9092,kafka-02:9092"
topics => ["app.logs.production"]
codec => "json"
consumer_threads => 4
}
}
filter {
# Parse structured fields from message
if [message] =~ /^{/ {
json { source => "message" }
} else {
grok {
match => {
"message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:service}\] %{GREEDYDATA:body}"
}
}
}
# Normalize timestamp
date {
match => ["timestamp", "ISO8601"]
target => "@timestamp"
timezone => "UTC"
}
# Enrich with GeoIP if IP present
if [client_ip] {
geoip { source => "client_ip" target => "geoip" }
}
# Drop health-check noise
if [path] =~ "/health" or [level] == "DEBUG" {
drop {}
}
mutate {
remove_field => ["timestamp", "host", "agent"]
}
}
output {
elasticsearch {
hosts => ["https://es-01:9200", "https://es-02:9200"]
data_stream => true
data_stream_type => "logs"
data_stream_dataset => "app"
data_stream_namespace => "production"
ssl_certificate_authorities => ["/etc/logstash/certs/ca.crt"]
api_key => "${ES_API_KEY}"
  }
}

Persistent Queues and Dead Letter Queues
By default, Logstash uses in-memory queues — events are lost if Logstash crashes between input and output. Enable Persistent Queues (PQ) to buffer events to disk, surviving restarts without data loss:
# logstash.yml
queue.type: persisted
queue.max_bytes: 4gb
queue.checkpoint.writes: 1024
# Dead Letter Queue — capture failed events for inspection
dead_letter_queue.enable: true
dead_letter_queue.max_bytes: 1gb

Events that fail processing (e.g., invalid JSON, mapping conflicts) land in the Dead Letter Queue. You can replay them with the dead_letter_queue input plugin after fixing the pipeline.
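A replay pipeline might look like the following sketch, assuming the default DLQ path and a pipeline id of "main" (adjust both to your deployment):

```
# /etc/logstash/conf.d/dlq-replay.conf
input {
  dead_letter_queue {
    path => "/var/lib/logstash/dead_letter_queue"
    pipeline_id => "main"
    commit_offsets => true   # remember position so events aren't replayed twice
  }
}
output {
  # Re-send to Elasticsearch once the mapping or pipeline fix is deployed
  elasticsearch {
    hosts => ["https://es-01:9200"]
    api_key => "${ES_API_KEY}"
  }
}
```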
Beats — Lightweight Data Shippers
Beats are single-purpose agents written in Go, designed for minimal footprint (typically under 50 MB RAM). They ship data directly to Elasticsearch or via Logstash for additional processing. The most commonly deployed:
Filebeat — Log Collection
Filebeat reads log files, tracks its position in each file (so it survives restarts), and ships events. It includes dozens of pre-built modules for common log formats — NGINX, MySQL, Kubernetes, AWS, etc. — that auto-configure parsing and Kibana dashboards.
# filebeat.yml
filebeat.inputs:
- type: filestream
id: app-logs
paths:
- /var/log/app/*.log
- /var/log/app/**/*.log
parsers:
- multiline:
type: pattern
pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
negate: true
match: after
fields:
service: my-app
env: production
fields_under_root: true
# Use modules for common services
filebeat.modules:
- module: nginx
access:
enabled: true
var.paths: ["/var/log/nginx/access.log"]
error:
enabled: true
output.logstash:
hosts: ["logstash-01:5044"]
ssl.certificate_authorities: ["/etc/filebeat/ca.crt"]
processors:
- add_host_metadata: ~
- add_kubernetes_metadata: ~

Metricbeat — System and Service Metrics
Metricbeat collects system metrics (CPU, memory, disk, network) and service-specific metrics from MySQL, Redis, Elasticsearch itself, Kubernetes, Docker, and many others. It's the standard way to feed the Elasticsearch monitoring cluster.
# metricbeat.yml — monitor Elasticsearch itself
metricbeat.modules:
- module: elasticsearch
xpack.enabled: true
period: 10s
hosts: ["https://es-01:9200", "https://es-02:9200"]
ssl.certificate_authorities: ["/etc/metricbeat/ca.crt"]
api_key: "${ES_MONITORING_API_KEY}"
- module: system
period: 30s
metricsets:
- cpu
- memory
- filesystem
- network
- process
processes: ['.*']
output.elasticsearch:
hosts: ["https://monitoring-cluster:9200"]

Kibana — Visualization and Operations
Kibana is much more than a dashboard tool. For day-to-day operations, the most important features are Discover, Dashboards, Alerting, and the Dev Tools Console.
Discover — Ad-Hoc Exploration
Discover is your primary interface for log investigation. It provides a time-series histogram, a document table, and a KQL (Kibana Query Language) search bar. KQL is a simplified syntax on top of the Query DSL:
# KQL examples in Kibana Discover
# Find all errors from the checkout service in the last hour
level: ERROR and service: checkout
# Find 5xx responses with slow response times
http.response.status_code >= 500 and duration_ms > 5000
# Wildcard search in message (unquoted wildcard values can't contain spaces)
message: *refused*
# Phrase match
message: "null pointer exception"
# Range on numeric field
duration_ms >= 1000 and duration_ms <= 5000

Alerting Rules
Kibana Alerting (the successor to Elasticsearch's commercial Watcher feature, with core rule types now available in the free Basic tier) lets you define threshold-based rules that run on a schedule and trigger actions (Slack, PagerDuty, email, webhook). An Elasticsearch query rule is the most flexible type:
# Via Kibana UI → Stack Management → Rules
# Rule: alert when error rate spikes
Rule type: Elasticsearch query
Index: logs-app
Query:
{
"bool": {
"filter": [
{ "term": { "level": "ERROR" } },
{ "range": { "@timestamp": { "gte": "now-5m" } } }
]
}
}
Threshold: count > 50 in the last 5 minutes
Action: Slack webhook → #alerts-prod

Security — TLS, Authentication, and RBAC
Since Elasticsearch 8.0, security is enabled by default: a fresh install auto-configures TLS, and a node with security enabled will refuse to start without transport TLS configured. The key security layers are:
Transport and HTTP TLS
# elasticsearch.yml — minimal TLS config
xpack.security.enabled: true
xpack.security.transport.ssl:
enabled: true
verification_mode: certificate
keystore.path: /etc/elasticsearch/certs/elastic-certificates.p12
xpack.security.http.ssl:
enabled: true
keystore.path: /etc/elasticsearch/certs/http.p12

Generate certificates with the bundled elasticsearch-certutil tool. For production, use a proper CA (Let's Encrypt for HTTP, internal CA for transport).
RBAC — Roles and Users
Elasticsearch uses role-based access control. A role defines which indices a user can access and what operations are permitted. Define a read-only role for Kibana dashboard viewers:
PUT _security/role/logs-viewer
{
"cluster": ["monitor"],
"indices": [
{
"names": ["logs-*"],
"privileges": ["read", "view_index_metadata"],
"query": {
"term": { "env": "production" }
}
}
],
"applications": [
{
"application": "kibana-.kibana",
"privileges": ["feature_discover.read", "feature_dashboard.read"],
"resources": ["space:default"]
}
]
}
PUT _security/user/alice
{
"password": "...",
"roles": ["logs-viewer"],
"full_name": "Alice Smith"
}

Tip
Use API keys (POST _security/api_key) for service accounts instead of username/password. API keys are scoped, revocable, and don't expose credentials in config files. Store them in your secrets manager and reference via environment variables in Logstash/Beats configs.
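A sketch of creating a scoped API key for a log shipper (key name, role name, and privileges are illustrative; adjust to your indices):

```
POST /_security/api_key
{
  "name": "filebeat-prod",
  "expiration": "90d",
  "role_descriptors": {
    "logs-writer": {
      "cluster": ["monitor"],
      "indices": [
        {
          "names": ["logs-*"],
          "privileges": ["create_doc", "auto_configure"]
        }
      ]
    }
  }
}
```

The response contains the key id and secret; combine them as id:api_key, base64-encode, and reference the result from an environment variable in the shipper's config.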
Ingest Pipelines — Pre-Indexing Transformation
Ingest pipelines run on Elasticsearch's ingest nodes before a document is written to an index. They are lighter-weight than Logstash but support most common transformations. Use them when you want to enrich data without a separate Logstash deployment.
PUT _ingest/pipeline/parse-app-logs
{
"description": "Parse and enrich application logs",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log.level} \\[%{DATA:service.name}\\] %{GREEDYDATA:log.original}"
]
}
},
{
"date": {
"field": "timestamp",
"formats": ["ISO8601"],
"target_field": "@timestamp"
}
},
{
"geoip": {
"field": "client.ip",
"target_field": "client.geo",
"ignore_missing": true
}
},
{
"set": {
"field": "data_stream.dataset",
"value": "app"
}
},
{
"remove": {
"field": ["timestamp", "message"],
"ignore_missing": true
}
}
],
"on_failure": [
{
"set": {
"field": "_index",
"value": "failed-{{ _index }}"
}
}
]
}

Production Best Practices
Cluster Sizing Rules of Thumb
- JVM heap: no more than 50% of RAM, capped at 31 GB (beyond this, compressed ordinary object pointers are disabled and performance degrades sharply).
- Data nodes: size for 1:30 heap-to-disk ratio for hot data (e.g., 16 GB heap → 480 GB usable storage per node).
- Master nodes: always 3 dedicated masters — never collocate master + data on large clusters to avoid GC pauses affecting cluster stability.
- Shard count: keep total shard count per node below 20 shards per GB of heap (e.g., 16 GB heap → max 320 shards).
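The rules of thumb above are easy to sanity-check in code. A small sketch, pure arithmetic, using only the numbers from the bullets above:

```python
# Back-of-envelope hot-node sizing from the rules of thumb above.
def size_hot_node(ram_gb: int) -> dict:
    heap_gb = min(ram_gb // 2, 31)   # heap = 50% of RAM, capped at 31 GB
    return {
        "heap_gb": heap_gb,
        "disk_gb": heap_gb * 30,     # 1:30 heap-to-disk ratio for hot data
        "max_shards": heap_gb * 20,  # keep below ~20 shards per GB of heap
    }

print(size_hot_node(32))
# → {'heap_gb': 16, 'disk_gb': 480, 'max_shards': 320}
print(size_hot_node(128)["heap_gb"])
# → 31 (compressed-oops cap applies even with 128 GB of RAM)
```

Note the second case: beyond 62 GB of RAM, extra memory no longer grows the heap; it still helps, because Lucene relies heavily on the OS filesystem cache.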
Operating System Configuration
# /etc/sysctl.d/elasticsearch.conf
vm.max_map_count = 262144 # Required — Lucene memory-mapped files
vm.swappiness = 1 # Minimize swap; set to 0 or disable entirely
net.core.somaxconn = 65535
# /etc/security/limits.conf for elasticsearch user
elasticsearch soft nofile 65536
elasticsearch hard nofile 65536
elasticsearch soft nproc 4096
elasticsearch hard nproc 4096
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited

Snapshot and Restore
Register an S3 (or GCS/Azure) snapshot repository and automate daily snapshots with Snapshot Lifecycle Management (SLM). This is your last line of defense against data loss and the cheapest way to implement cold storage.
# Register S3 repository
PUT _snapshot/s3-repo
{
"type": "s3",
"settings": {
"bucket": "my-es-snapshots",
"region": "eu-central-1",
"server_side_encryption": true
}
}
# SLM policy — daily snapshot, retain 30 days
PUT _slm/policy/daily-snapshots
{
"schedule": "0 30 1 * * ?",
"name": "<daily-snap-{now/d}>",
"repository": "s3-repo",
"config": {
"include_global_state": false,
"indices": ["logs-*", "metrics-*"]
},
"retention": {
"expire_after": "30d",
"min_count": 7,
"max_count": 30
}
}

Monitoring the Cluster
Run a dedicated monitoring cluster — never ship metrics to the same cluster you're monitoring. If that cluster goes down, you lose visibility exactly when you need it most. Use Metricbeat with the elasticsearch module pointing to your production cluster, outputting to the monitoring cluster.
Key metrics to alert on:
- Cluster status: alert on anything other than green. Yellow = unassigned replicas. Red = unassigned primaries (data unavailable).
- JVM heap used: alert above 75% sustained. GC pressure above 85% will cause performance cliffs.
- Search latency p99: track via GET _nodes/stats/indices/search. Alert if query_time_in_millis / query_total trends upward.
- Indexing throughput: indexing.index_total delta. Alert on sudden drops (ingest pipeline down) or spikes (log storms).
- Disk usage: alert at 75% — ILM needs headroom to complete rollovers and force-merges. Elasticsearch will refuse writes at 95% (flood_stage).
Common Anti-Patterns to Avoid
Dynamic Mapping Gone Wrong
By default, Elasticsearch auto-detects field types from the first document. This works for prototyping but causes problems in production: a field mapped as long can't store a string value from a later deployment. The mapping explosion problem occurs when dynamic mapping creates thousands of fields — often from JSON logs with arbitrary key names. Mitigate by:
# Disable dynamic mapping for unknown fields, but allow known ones
PUT /logs-app
{
"mappings": {
"dynamic": "strict",
"properties": {
"@timestamp": { "type": "date" },
"service": { "type": "keyword" },
"level": { "type": "keyword" },
"message": { "type": "text" },
"duration_ms": { "type": "integer" },
"labels": {
"type": "object",
"dynamic": true
}
}
}
}

Deep Pagination
Using from + size for deep pagination is expensive — Elasticsearch must fetch and discard from + size documents from every shard. For paginating through large result sets, use search_after with a sort tiebreaker:
# First page
GET logs-app/_search
{
"size": 100,
"sort": [{ "@timestamp": "desc" }, { "_id": "asc" }]
}
# Subsequent pages — pass last sort values from previous response
GET logs-app/_search
{
"size": 100,
"sort": [{ "@timestamp": "desc" }, { "_id": "asc" }],
"search_after": ["2026-04-16T12:34:56.000Z", "doc-id-xyz"]
}

Unbounded Wildcard Queries
Leading wildcards (*foo) require scanning every term in the index and are among the most expensive queries possible. Use edge N-gram tokenizers at index time to support prefix-style autocomplete without runtime wildcard cost. For full-text search, use match or match_phrase instead.
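A sketch of the edge n-gram approach (index and field names assumed). The key detail is using a different analyzer at search time, so queries match the pre-generated prefixes rather than being n-grammed themselves:

```
PUT /products
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "autocomplete_edge": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "autocomplete_edge",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
```

At index time, "elasticsearch" is stored as the prefixes el, ela, elas, and so on; a match query for "elas" then hits via a cheap exact term lookup instead of a wildcard scan.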
Running Elastic Stack in production or planning a migration?
We design, deploy, and optimize Elastic Stack environments at enterprise scale — from cluster architecture and ILM policies to ingest pipelines, security hardening, and cost-efficient tiered storage. Let’s talk.
Send a Message