Elasticsearch best practices + unknown twists

Notes from running Elasticsearch in production — sharding, segments, refreshes, and a profiling trick that saved us.

“Performance isn’t a problem — until it suddenly is.”

This started as an internal doc for my AI team after a few weeks of debugging a slow Elasticsearch cluster. It’s a checklist of things to verify, plus a few twists I didn’t see coming.

Check the number of shards in a specific index

from elasticsearch import Elasticsearch

es_client = Elasticsearch(...)

settings = es_client.indices.get(index=es_index.name)

for index, config in settings.items():
    num_shards = config['settings']['index']['number_of_shards']
    num_replicas = config['settings']['index']['number_of_replicas']
    print(f"Index: {index}")
    print(f"  Primary shards: {num_shards}")
    print(f"  Replicas per shard: {num_replicas}")

Or, for every index in the cluster:

shards = es_client.cat.shards(format="json")

for shard in shards:
    print(f"Index: {shard['index']} | Shard: {shard['shard']} | State: {shard['state']} | Node: {shard['node']}")

Notes:

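Two benefits to having replicas: they keep the data available when a node fails, and they can serve search requests, which increases read throughput.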

Each index has its own shards. So if there are n indexes and each has k shards, the total is n × k. That leads to the first rule of thumb:

👉 On a single node, keep the total to 50–100 shards.

So if each index has 1 shard, that’s 50–100 indexes per node max. You can estimate the cap from the shards you reserve per index.
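As a rough illustration of that estimate, the sketch below divides a per-node shard budget by the number of shards each index places on the node. The function name and its parameters are hypothetical helpers for this post, not part of any Elasticsearch API.

```python
def max_indexes_per_node(shard_budget: int, shards_per_index: int) -> int:
    """Estimate how many indexes fit on one node under a shard budget.

    shard_budget: total shards you allow on the node (e.g. 50-100).
    shards_per_index: shards of each index that land on this node.
    """
    if shards_per_index <= 0:
        raise ValueError("shards_per_index must be at least 1")
    return shard_budget // shards_per_index


# One shard per index uses the whole budget for indexes.
print(max_indexes_per_node(100, 1))
# Five shards per index shrinks the cap accordingly.
print(max_indexes_per_node(100, 5))
```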

from collections import Counter

shard_counts = Counter([shard["node"] for shard in shards])
print("Shard distribution per node:")
for node, count in shard_counts.items():
    print(f"  {node}: {count} shards")

Second rule of thumb — shard sizing:

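| Shard size | Status     | Notes                                 |
|------------|------------|---------------------------------------|
| <10 GB     | Too small  | May cause segment bloat, inefficiency |
| 10–50 GB   | Ideal      | Most general-purpose use cases        |
| 50–100 GB  | Acceptable | Only if node RAM/heap is large        |
| >100 GB    | Risky      | Prone to GC issues, long recoveries   |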

Let’s calculate the average shard size:

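# Unit suffixes used by the cat API, converted to MB
UNITS_MB = {"tb": 1024 * 1024, "gb": 1024, "mb": 1, "kb": 1 / 1024}

total_size_mb = 0
shard_count = 0

for shard in shards:
    # "store" is None for unassigned shards, so fall back to "0mb"
    store_size = (shard.get("store") or "0mb").lower()
    unit = store_size[-2:]
    if unit in UNITS_MB:
        size = float(store_size[:-2]) * UNITS_MB[unit]
    elif store_size.endswith("b"):
        # Plain bytes, e.g. "230b"
        size = float(store_size[:-1]) / (1024 * 1024)
    else:
        size = 0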
    total_size_mb += size
    shard_count += 1

avg_size = total_size_mb / shard_count if shard_count else 0
print(f"\nAverage shard size: {avg_size:.2f} MB")
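To tie the average back to the sizing table, a small helper can label a shard size with the table's status. This is only a sketch: the function name is hypothetical, and how the boundary values (exactly 10, 50, or 100 GB) are assigned is my assumption, since the table does not say.

```python
def classify_shard_size(size_gb: float) -> str:
    """Map a shard size in GB to the status used in the sizing table."""
    if size_gb < 10:
        return "too small"
    if size_gb <= 50:
        return "ideal"
    if size_gb <= 100:
        return "acceptable"
    return "risky"


# The average above is in MB, so convert to GB before classifying.
for size_mb in (512, 20 * 1024, 80 * 1024, 150 * 1024):
    size_gb = size_mb / 1024
    print(f"{size_gb:.1f} GB -> {classify_shard_size(size_gb)}")
```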

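Why too-small shards hurt: every shard carries a fixed overhead (memory, file handles, cluster-state entries), so many tiny shards cost more than a few well-sized ones holding the same data.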

Check the number of segments in your index

Hierarchy: a Node contains an Index, which contains Shards, which contain Segments

Think of it like this:

🧩 Segments are added over time as documents are indexed and refreshed.

(Holy sh*t 🥲)

Let’s check how many segments we have in some random index:

segments = es_client.indices.segments(index=es_index.name)

for shard_id, shard_info in segments['indices'][es_index.name]['shards'].items():
    for replica in shard_info:
        segment_count = replica['num_search_segments']
        node = replica['routing']['node']
        print(f"Shard {shard_id} on node {node}: {segment_count} segments")

If there are too many segments, we have to merge them — heads up, this takes time:

response = es_client.indices.forcemerge(
    index=es_index.name,
    max_num_segments=2,
    only_expunge_deletes=False,
    flush=True,
    wait_for_completion=True,
    request_timeout=600,
)

# wait_for_completion=True means the call returns only once the merge is done
print("Force merge finished:", response)

This leads to the next lesson I learned.

Refresh is expensive

A refresh is the process by which Elasticsearch makes newly indexed documents searchable. By default, it happens every second (or based on index.refresh_interval).

When you’re indexing a large volume of documents, auto-refreshing every second creates many small segments and spends CPU and disk I/O that the indexing itself could use.

Best practice during bulk indexing:

  1. Temporarily disable refresh
  2. Perform your bulk indexing
  3. Manually refresh once at the end
  4. Restore the refresh interval (optional)

In code:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

INDEX_NAME = "my-index"

# 1. Disable automatic refresh
es.indices.put_settings(
    index=INDEX_NAME,
    body={"index": {"refresh_interval": "-1"}},
)

# 2. Bulk indexing
documents = [
    {"_index": INDEX_NAME, "_id": i, "_source": {"title": f"Doc {i}", "value": i}}
    for i in range(10000)
]

helpers.bulk(es, documents)

# 3. Manually trigger a refresh
es.indices.refresh(index=INDEX_NAME)

# 4. Restore the default refresh interval (optional)
es.indices.put_settings(
    index=INDEX_NAME,
    body={"index": {"refresh_interval": "1s"}},
)

print("Bulk indexing completed with optimized refresh settings.")
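One weakness of the script above is that an exception during bulk indexing leaves the index with refresh disabled. A possible way around that, sketched here under the assumption that the client exposes `indices.put_settings` and `indices.refresh` as used above, is a context manager with `try`/`finally`. The name `refresh_paused` and its `restore_interval` parameter are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def refresh_paused(client, index, restore_interval="1s"):
    """Disable refresh on `index`, then refresh once and restore the interval.

    `client` is assumed to behave like the Elasticsearch Python client
    used above (indices.put_settings and indices.refresh).
    """
    client.indices.put_settings(
        index=index, body={"index": {"refresh_interval": "-1"}}
    )
    try:
        yield
    finally:
        # Runs even if the bulk indexing raises, so the index is never
        # left with refresh disabled.
        client.indices.refresh(index=index)
        client.indices.put_settings(
            index=index, body={"index": {"refresh_interval": restore_interval}}
        )


# Usage, with the names from the example above:
#
#     with refresh_paused(es, INDEX_NAME):
#         helpers.bulk(es, documents)
```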

Profile your query (most important)

Profiling a request is an important feature in pretty much any database. The point is to dissect a query into phases and see how long each phase takes — the ultimate indicator of performance, and the easiest way to identify the actual bottleneck.

In Elasticsearch, a normal query has two parts: the query first enters the search (query) phase, which produces a score and a pointer for each hit, then the fetch phase loads the actual document data.

Let’s try:

# this is how we did it in our code originally
# we disable source (source=False) so the query only returns score + id
# only data from memory, so it should be fast right?

profile = es_client.search(
    index="search_185_dev_v1",
    profile=True,
    size=10000,
    source=False,
    query={
        "match": {
            "block.text.text": "AI"
        }
    },
)

print("Profile took:", profile["took"], "ms")
for shard in profile["profile"]["shards"]:
    print("Shard query time:", shard["searches"][0]["query"][0]["time_in_nanos"] / 1e6, "ms")
    print("Shard fetch time:", shard["fetch"]["time_in_nanos"] / 1e6, "ms")

Wrong. It’s very slow.

To see more details, you can print the whole profile data:

import json

print(json.dumps(dict(profile)['profile'], indent=2))

Profile result example:
{
  "shards": [
    {
      "id": "[61N79He-S1u1hbGEXAHJAw][external_referential_rncp_dev_v1][0]",
      "node_id": "61N79He-S1u1hbGEXAHJAw",
      "shard_id": 0,
      "index": "external_referential_rncp_dev_v1",
      "cluster": "(local)",
      "searches": [
        {
          "query": [
            {
              "type": "ConstantScoreQuery",
              "description": "ConstantScore(*:*)",
              "time_in_nanos": 2606684,
              "breakdown": { "next_doc": 2566124, "next_doc_count": 10001 }
            }
          ],
          "rewrite_time": 8000,
          "collector": [
            {
              "name": "QueryPhaseCollector",
              "reason": "search_query_phase",
              "time_in_nanos": 7681516
            }
          ]
        }
      ],
      "aggregations": [],
      "fetch": {
        "type": "fetch",
        "time_in_nanos": 2911239639,
        "breakdown": {
          "load_stored_fields": 2891235809,
          "load_stored_fields_count": 10000
        }
      }
    }
  ]
}
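Reading raw nanoseconds is error-prone, so here is a small sketch that condenses a profile into per-shard numbers. The helper name `summarize_profile` is hypothetical, and the sample data is a trimmed copy of the example above. On that example, the query takes about 2.6 ms while the fetch takes about 2.9 s, almost all of it in `load_stored_fields`.

```python
def summarize_profile(profile: dict) -> list[dict]:
    """Condense the "profile" section of a search response, one row per shard."""
    rows = []
    for shard in profile["shards"]:
        # Sum the top-level query nodes of every search on this shard.
        query_ns = sum(
            q["time_in_nanos"]
            for search in shard["searches"]
            for q in search["query"]
        )
        fetch = shard.get("fetch", {})
        fetch_ns = fetch.get("time_in_nanos", 0)
        stored_ns = fetch.get("breakdown", {}).get("load_stored_fields", 0)
        rows.append({
            "index": shard["index"],
            "query_ms": query_ns / 1e6,
            "fetch_ms": fetch_ns / 1e6,
            "stored_fields_share": stored_ns / fetch_ns if fetch_ns else 0.0,
        })
    return rows


# Trimmed copy of the example profile above.
example = {
    "shards": [{
        "index": "external_referential_rncp_dev_v1",
        "searches": [{"query": [{"time_in_nanos": 2606684}]}],
        "fetch": {
            "time_in_nanos": 2911239639,
            "breakdown": {"load_stored_fields": 2891235809},
        },
    }]
}

for row in summarize_profile(example):
    print(
        f"{row['index']}: query {row['query_ms']:.1f} ms, "
        f"fetch {row['fetch_ms']:.1f} ms, "
        f"{row['stored_fields_share']:.1%} of fetch in load_stored_fields"
    )
```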

Some terminology:

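| Parameter       | Pulls from                          | Best use case                           |
|-----------------|-------------------------------------|-----------------------------------------|
| _source         | Original JSON                       | Return full or partial documents        |
| stored_fields   | Stored fields (if enabled)          | Retrieve specific fields quickly        |
| fields          | _source, formatted per the mapping  | Show fields in search results, flexible |
| docvalue_fields | Doc values                          | Retrieve formatted numbers/dates        |

More details on each parameter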

_source

  • The original JSON document as it was indexed
  • A search or get request returns it (entirely or partially) by default
  • Stored by default, very flexible
"_source": ["title", "author"]

Returns only those fields from the original document.

Use when: you want the actual document content.

stored_fields

  • Refers to fields that have store: true in the mapping
  • By default, fields are not separately stored (because _source already keeps the full doc)
  • Only useful if you’ve explicitly enabled store: true for a field
"stored_fields": ["title"]

If store: true wasn’t set for title, this returns nothing.

Use when: you need fast access to specific fields and don’t want to load the full _source.

fields

  • Retrieves field values from _source by default and formats them according to the index mapping; it can also return runtime fields
  • Unlike stored_fields, this works even if store: true isn’t set
  • Can return multi-valued fields, nested data, and fields processed for sorting or aggregations
"fields": ["publish_date", "category"]

Use when: you want runtime fields, or values formatted according to the mapping for presentation.

docvalue_fields

  • Specific to doc values — a columnar data structure optimized for sorting, aggregations, and scripting
  • Commonly used for dates and numbers when you want them formatted
"docvalue_fields": [
  { "field": "publish_date", "format": "yyyy-MM-dd" }
]

Use when: you want formatted field values, or data optimized for sorting/aggregations.

More: Elasticsearch docs on retrieving selected fields.

Note:

The right way to only return score + id from memory:

  1. Enable fielddata on _id so it can be read through docvalue_fields (without this, the next step won’t work):

    es.cluster.put_settings(body={
        "persistent": {
            "indices.id_field_data.enabled": True
        }
    })
    
    print("✅ _id fielddata enabled.")

    This setting is deprecated, though. The recommended way is to copy the ID into a regular field of the document (like we did) and map it as a type with doc values, such as keyword.

  2. Minor changes to the query, but very significant:

    profile = es_client.search(
        index="search_185_dev_v1",
        profile=True,
        size=1000,
        source=False,
        query={
            "match": {
                "block.text.text": "AI"
            }
        },
        docvalue_fields=['_id'],
        stored_fields="_none_",  # not None (default), not [], must be
                                 # _none_ to disable this entirely (wth)
    )
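For the non-deprecated route mentioned in step 1, a minimal sketch could look like the following. The field name `doc_id` and both helper functions are hypothetical. The only idea is that a keyword field has doc values by default, so it can be returned through `docvalue_fields` without touching stored fields.

```python
# Hypothetical field that duplicates the document ID.
# Keyword fields have doc values enabled by default.
ID_FIELD = "doc_id"

mappings = {
    "properties": {
        ID_FIELD: {"type": "keyword"},
    }
}


def with_id_field(doc_id: str, source: dict) -> dict:
    """Build a bulk action whose _source also carries the ID."""
    return {"_id": doc_id, "_source": {**source, ID_FIELD: doc_id}}


def id_only_search_kwargs(query: dict, size: int = 1000) -> dict:
    """Keyword arguments for es_client.search that skip stored fields."""
    return {
        "query": query,
        "size": size,
        "source": False,
        "docvalue_fields": [ID_FIELD],
        "stored_fields": "_none_",
    }


action = with_id_field("42", {"title": "Doc 42"})
kwargs = id_only_search_kwargs({"match": {"block.text.text": "AI"}})
print(action)
print(kwargs)
```

The keyword arguments would be passed as `es_client.search(index=..., **kwargs)`, and each hit then carries the ID under `hit["fields"]["doc_id"]`.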

Check the current status of all nodes

thread_pool_stats = es_client.nodes.stats(metric="thread_pool")

for node_id, stats in thread_pool_stats["nodes"].items():
    thread_pools = stats["thread_pool"]
    search_pool = thread_pools.get("search", {})
    print(f"Node: {stats['name']}")
    print(f"  Active: {search_pool.get('active')}")
    print(f"  Queue: {search_pool.get('queue')}")
    print(f"  Rejected: {search_pool.get('rejected')}")

A growing rejected count is a bad sign: it means the search thread pool and its queue are full, so the node is turning requests away.
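The raw rejected counter is cumulative since the node started, so it helps to put it in proportion. The sketch below computes the share of rejected search tasks per node from a nodes-stats response. The function name is hypothetical and the sample data is made up for illustration. It relies on the `completed` counter that the thread pool stats report alongside `rejected`.

```python
def search_rejection_ratio(nodes_stats: dict) -> dict[str, float]:
    """Map node name -> rejected / (completed + rejected) for the search pool."""
    ratios = {}
    for node in nodes_stats["nodes"].values():
        pool = node.get("thread_pool", {}).get("search", {})
        rejected = pool.get("rejected", 0)
        completed = pool.get("completed", 0)
        total = rejected + completed
        ratios[node["name"]] = rejected / total if total else 0.0
    return ratios


# Made-up sample in the shape returned by nodes.stats(metric="thread_pool").
sample = {
    "nodes": {
        "abc": {
            "name": "node-1",
            "thread_pool": {
                "search": {"active": 2, "queue": 0, "rejected": 5, "completed": 995}
            },
        }
    }
}

for name, ratio in search_rejection_ratio(sample).items():
    print(f"{name}: {ratio:.2%} of search tasks rejected")
```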

Check heap, CPU, disk

# Only the metrics read below are requested
node_stats = es_client.nodes.stats(metric=["jvm", "fs", "os"])

for node_id, node in node_stats["nodes"].items():
    name = node["name"]
    heap = node["jvm"]["mem"]
    cpu = node["os"]["cpu"]
    fs = node["fs"]["total"]

    print(f"\nNode: {name}")
    print(f"  Heap used: {heap['heap_used_percent']}%")
    print(f"  CPU load avg (1m): {cpu['load_average']['1m']}")
    print(f"  Disk free: {fs['free_in_bytes'] / fs['total_in_bytes']:.1%}")

Ideally on each node:

Take-home

If you remember nothing else from this post, remember these:

Performance work in Elasticsearch is mostly about not creating problems for yourself: right-sized shards, controlled refresh, and profiled queries. Get those three right and most of the rest takes care of itself.