Elasticsearch · Performance · Observability

Elasticsearch Read Optimization — Tuning for Faster Search

A comprehensive guide to optimizing Elasticsearch for faster search performance — covering filesystem cache, document modeling, query design, and index-level tuning.

2026-04-06

Search performance in Elasticsearch depends on a combination of factors, including how expensive individual queries are, how many searches run in parallel, the number of indices and shards involved, and the overall sharding strategy and shard size. While hardware and system-level settings play an important role, the structure of your documents and the design of your queries often have the biggest impact.

Note

These variables influence how the system should be tuned. For example, optimizing for a small number of complex queries differs significantly from optimizing for many lightweight, concurrent searches. Make sure to also consider your cluster's shard count, index layout, and overall data distribution.

Give Memory to the Filesystem Cache

Elasticsearch heavily relies on the filesystem cache to make search fast. In general, you should make sure that at least half the available memory goes to the filesystem cache so that Elasticsearch can keep hot regions of the index in physical memory.

By default, Elasticsearch automatically sets its JVM heap size to follow this best practice. However, in self-managed or Elastic Cloud on Kubernetes deployments, you have the flexibility to allocate even more memory to the filesystem cache, which can lead to performance improvements depending on your workload.

Note

On Linux, the filesystem cache uses any memory not actively used by applications. To allocate memory to the cache, ensure that enough system memory remains available and is not consumed by Elasticsearch or other processes.

Avoid Page Cache Thrashing on Linux

Search can cause a lot of randomized read I/O. When the underlying block device has a high readahead value, there may be a lot of unnecessary read I/O done, especially when files are accessed using memory mapping.

Most Linux distributions use a sensible readahead value of 128KiB for a single plain device. However, when using software RAID, LVM, or dm-crypt, the resulting block device may end up with a very large readahead value (in the range of several MiB). This usually results in severe page cache thrashing that adversely affects search performance.

You can check the current value using:

lsblk -o NAME,RA,MOUNTPOINT,TYPE,SIZE

Warning

blockdev expects values in 512-byte sectors, whereas lsblk reports values in KiB. For example, to temporarily set readahead to 128KiB for /dev/nvme0n1:

blockdev --setra 256 /dev/nvme0n1

Use Faster Hardware

If your searches are I/O-bound, consider increasing the size of the filesystem cache or using faster storage. Each search involves a mix of sequential and random reads across multiple files, and there may be many searches running concurrently on each shard, so SSD drives tend to perform better than spinning disks. If your searches are CPU-bound, consider using a larger number of faster CPUs.

Directly-attached (local) storage generally performs better than remote storage because it is simpler to configure well and avoids communications overheads. With careful tuning, it is sometimes possible to achieve acceptable performance using remote storage too — but always benchmark with a realistic workload before committing to a particular storage architecture.

Document Modeling

Documents should be modeled so that search-time operations are as cheap as possible. In particular, join-like constructs should be avoided: nested fields can make queries several times slower, and parent-child relations can make them hundreds of times slower. If the same questions can be answered without joins by denormalizing documents, significant speedups can be expected.
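As a sketch (the blog/comment model here is a hypothetical example), instead of modeling comments as nested sub-documents of a post, each comment can be indexed as a standalone document that carries a denormalized copy of the post fields it is queried with:

PUT comments/_doc/1
{
  "post_title": "Tuning Elasticsearch",
  "post_author": "jane",
  "text": "Helpful overview of readahead tuning",
  "date": "2026-04-01"
}

Queries that previously required a join can now filter on post_author or post_title directly, at the cost of duplicating those fields across comment documents.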

Search as Few Fields as Possible

The more fields a query_string or multi_match query targets, the slower it is. A common technique to improve search speed over multiple fields is to copy their values into a single field at index time using the copy_to directive:

PUT movies
{
  "mappings": {
    "properties": {
      "name_and_plot": {
        "type": "text"
      },
      "name": {
        "type": "text",
        "copy_to": "name_and_plot"
      },
      "plot": {
        "type": "text",
        "copy_to": "name_and_plot"
      }
    }
  }
}
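With this mapping in place, a search that would otherwise be a multi_match over name and plot becomes a cheaper single-field query (the query text is an arbitrary example):

GET movies/_search
{
  "query": {
    "match": {
      "name_and_plot": "space opera"
    }
  }
}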

Pre-index Data

Leverage patterns in your queries to optimize how data is indexed. For instance, if most queries run range aggregations on a fixed list of ranges, you can pre-index those ranges and use a terms aggregation instead:

PUT index
{
  "mappings": {
    "properties": {
      "price_range": {
        "type": "keyword"
      }
    }
  }
}

PUT index/_doc/1
{
  "designation": "spoon",
  "price": 13,
  "price_range": "10-100"
}
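Queries can then bucket on the pre-computed field with a terms aggregation instead of recomputing ranges at search time:

GET index/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "terms": {
        "field": "price_range"
      }
    }
  }
}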

Map Identifiers as Keyword

Not all numeric data should be mapped as a numeric field type. Elasticsearch optimizes numeric fields for range queries, but keyword fields are better for term-level queries. Identifiers such as ISBN or product IDs are rarely used in range queries but are often retrieved using term-level queries. Consider mapping them as keyword for faster retrieval.
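For example, assuming a hypothetical books index, an ISBN can be mapped as keyword and looked up with a term query:

PUT books
{
  "mappings": {
    "properties": {
      "isbn": {
        "type": "keyword"
      }
    }
  }
}

GET books/_search
{
  "query": {
    "term": {
      "isbn": "978-3-16-148410-0"
    }
  }
}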

Query-Level Optimizations

Avoid Scripts

If possible, avoid using script-based sorting, scripts in aggregations, and the script_score query. Scripts bypass many of Elasticsearch's built-in caching and optimization mechanisms.
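A common workaround is to move the computation to index time. For example, instead of sorting with a script that lowercases a title on every search, index a normalized keyword sub-field and sort on that (a sketch; the index, field, and normalizer names are illustrative):

PUT movies_sortable
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "sort": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}

Sorting on title.sort then uses ordinary doc values instead of executing a script per document.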

Search Rounded Dates

Queries on date fields that use now are typically not cacheable since the range changes constantly. Switching to rounded dates makes better use of the query cache:

GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "my_date": {
            "gte": "now-1h/m",
            "lte": "now/m"
          }
        }
      }
    }
  }
}

The longer the rounding interval, the more the query cache can help — but too aggressive rounding might hurt user experience.

Force-Merge Read-Only Indices

Indices that are read-only may benefit from being merged down to a single segment. This is typical with time-based indices: only the current time frame receives new documents while older indices are read-only. Single-segment shards can use simpler and more efficient data structures.
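Once an index is confirmed read-only, the merge can be triggered explicitly with the force merge API (the index name is illustrative):

POST my-index-000001/_forcemerge?max_num_segments=1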

Warning

Do not force-merge indices to which you are still writing, or will write again in the future. Rely on the automatic background merge process instead. Continuing to write to a force-merged index can severely degrade performance.

Cache & Warm-up Strategies

Warm Up Global Ordinals

Global ordinals optimize aggregation performance and are calculated lazily by default. For fields heavily used in bucketing aggregations, you can tell Elasticsearch to construct and cache them before requests arrive:

PUT index
{
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}

Warm Up the Filesystem Cache

After a restart, the filesystem cache is empty. You can explicitly tell the OS which files to load eagerly using the index.store.preload setting. Use it with caution: loading too many files will hurt performance if the cache can't hold all the data.
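The setting takes a list of file extensions; for example, norms and doc values files can be preloaded at index creation time (a sketch; which extensions are worth preloading depends on your workload):

PUT my-index
{
  "settings": {
    "index.store.preload": ["nvd", "dvd"]
  }
}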

Use Preference for Cache Utilization

Elasticsearch maintains caches at the node level. With round-robin routing (the default), consecutive identical requests hit different shard copies, preventing cache reuse. Using a preference value that identifies the current user or session routes requests consistently and improves cache hit rates.
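For example, tying the preference to a session identifier (the value here is an arbitrary string) routes repeat searches from the same user to the same shard copies:

GET my-index/_search?preference=session_xyz
{
  "query": {
    "match": {
      "message": "timeout"
    }
  }
}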

Index-Level Tuning

Index Sorting for Faster Conjunctions

Index sorting can make conjunctions (AND queries) faster at the cost of slightly slower indexing. When documents are sorted within a segment, Elasticsearch can skip entire blocks of non-matching documents during query evaluation.
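Index sorting is configured when the index is created; for example, sorting segments by a low-cardinality field (the index and field names are illustrative):

PUT my-sorted-index
{
  "settings": {
    "index.sort.field": "status",
    "index.sort.order": "asc"
  },
  "mappings": {
    "properties": {
      "status": {
        "type": "keyword"
      }
    }
  }
}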

Use index_phrases and index_prefixes

The text field supports index_phrases (indexes 2-shingles for faster phrase queries) and index_prefixes (indexes term prefixes for faster prefix queries). If your use case involves many phrase or prefix queries, these options can provide significant speedups.
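Both options are enabled in the mapping; for example (the field name and prefix lengths are illustrative):

PUT articles
{
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "index_phrases": true,
        "index_prefixes": {
          "min_chars": 2,
          "max_chars": 5
        }
      }
    }
  }
}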

Use constant_keyword for Filtering

If a filter matches most documents in an index, consider splitting data into dedicated indices and using constant_keyword to let Elasticsearch transparently skip the filter:

PUT bicycles
{
  "mappings": {
    "properties": {
      "cycle_type": {
        "type": "constant_keyword",
        "value": "bicycle"
      },
      "name": {
        "type": "text"
      }
    }
  }
}

On this index, Elasticsearch will automatically ignore any filter on cycle_type: bicycle, making the query cheaper without changing client-side logic.

Replicas & Throughput

Replicas improve resiliency and can help with read throughput, but not always. A setup with fewer shards per node in total usually performs better, because each shard gets a greater share of the filesystem cache. Adding replicas is therefore a trade-off between throughput and availability.

The formula: if you have num_nodes nodes, num_primaries primary shards, and want to cope with max_failures node failures, the optimal replica count is:

max(max_failures, ceil(num_nodes / num_primaries) - 1)
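For example, with 10 nodes, 5 primary shards, and a requirement to survive 1 node failure, the optimal count works out to max(1, ceil(10 / 5) - 1) = max(1, 1) = 1 replica per primary.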

Monitoring & Profiling

Use the Search Profiler in Kibana to navigate and analyze the Profile API results. It gives insight into how each component of your queries and aggregations impacts processing time, helping you identify bottlenecks and tune accordingly.

Keep an eye on open search contexts by polling the node stats API:

GET _nodes/stats/indices/search

High open_contexts values can indicate a backlog or overly long scroll timeouts. Clear scrolls as soon as they are no longer needed to release resources.
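All open scroll contexts can be released at once with the clear scroll API:

DELETE _search/scroll/_all

Individual scrolls can also be cleared by passing their scroll ID to the same endpoint.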

Need help tuning your Elasticsearch cluster?

We design and optimize Elastic Stack deployments at enterprise scale. Let’s talk about your performance challenges.
