Question 1

Is Elasticsearch still relevant in 2026 with vector databases like Pinecone and Weaviate around?

Accepted Answer

Very much yes. Elasticsearch's vector search (dense_vector with HNSW, int8/int4 scalar quantization, RRF for hybrid search, and ELSER sparse embeddings) closed most of the gap with dedicated vector DBs by the 8.14-8.15 releases. The big advantage of Elasticsearch in 2026 is that you don't have to operate two systems: your lexical search, filtering, aggregations, AND your vector RAG live in one cluster. Many India teams that started with Pinecone in 2023 have since consolidated on Elasticsearch or OpenSearch for cost and operational simplicity.
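RRF is what makes hybrid search practical: it fuses the BM25 and kNN result lists by rank, so the two incompatible score scales never have to be compared. Here is a minimal Python sketch of the formula (each doc scores the sum of 1 / (k + rank) over every list it appears in). It assumes k = 60, which matches Elasticsearch's default rank_constant, and the doc IDs are made up:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each doc scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. k=60 mirrors Elasticsearch's default
    rank_constant.
    """
    scores = {}
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["d3", "d1", "d7"]  # hypothetical lexical (BM25) results
knn_hits = ["d1", "d9", "d3"]   # hypothetical vector (kNN) results
print(reciprocal_rank_fusion([bm25_hits, knn_hits]))  # ['d1', 'd3', 'd9', 'd7']
```

d1 wins because it ranks high in both lists, even though it is not first in the BM25 list. This rewards agreement between retrievers rather than any single raw score.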

Question 2

How much does an Elasticsearch developer earn in India?

Accepted Answer

₹8-26 LPA in 2026. Entry-level backend roles that touch Elasticsearch (search APIs, log analytics) sit at ₹8-14 LPA. Senior search engineers who own ranking, relevance, or ES infra at scale at Flipkart, Swiggy, PhonePe, Razorpay, Postman, Cure.fit, or Zomato are in the ₹20-26 LPA range. Specialized 'search relevance engineer' and 'SRE for search' titles at the top end can go higher.

Question 3

Should I learn Elasticsearch or OpenSearch first?

Accepted Answer

Learn Elasticsearch. OpenSearch forked from Elasticsearch 7.10, so the core APIs (document CRUD, Query DSL, mappings, aggregations) are largely the same, and Elasticsearch's docs and ecosystem are richer. Once you know one, switching to the other for a job is roughly a one-week ramp. If your target employer is AWS-heavy (a lot of India SaaS), the practical day-to-day will often be OpenSearch on Amazon OpenSearch Service.

Question 4

What Elasticsearch version should I target for interview prep in 2026?

Accepted Answer

Elasticsearch 8.x is the answer. 8.0 (Feb 2022) shipped security-on-by-default and the first approximate kNN (HNSW) search on dense_vector fields. 8.11 (Nov 2023) introduced ES|QL as a technical preview. 8.12-8.15 brought int8 and then int4 scalar quantization for vectors, plus retrievers with RRF (RRF was first previewed in 8.8), and 8.18 added LOOKUP JOIN to ES|QL. Interviewers expect familiarity with all of these. Elasticsearch 9.0 went GA in April 2025, but most production clusters in India are still on 8.x.

Question 5

Do I need to know Lucene to use Elasticsearch well?

Accepted Answer

Not at the API level: Elasticsearch hides Lucene behind its REST API. But the mental model of Lucene (immutable segments, inverted index, term postings, FSTs for the term dictionary, HNSW graphs) is essential for reasoning about performance, refresh_interval, force-merge, why updates are expensive, and why deep pagination kills you. Senior interview rounds will probe this. You don't need to write Lucene code, just understand why Elasticsearch behaves the way it does.
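The deep-pagination point is plain arithmetic. With from/size paging, every shard must return its top (from + size) hits, and the coordinating node merges all of them just to keep `size`. The sketch below assumes the default index.max_result_window of 10,000. The function name is hypothetical:

```python
def from_size_cost(from_, size, num_shards, max_result_window=10_000):
    """Rough cost of one from/size page in a query-then-fetch search.

    Each shard returns its top (from_ + size) hits; the coordinating node
    sorts all of them and discards everything except the last `size`.
    """
    window = from_ + size
    if window > max_result_window:
        # Elasticsearch rejects such requests by default (index.max_result_window).
        raise ValueError(f"from + size = {window} exceeds {max_result_window}")
    per_shard = window
    merged_on_coordinator = window * num_shards
    return per_shard, merged_on_coordinator


print(from_size_cost(0, 10, 5))      # (10, 50)
print(from_size_cost(9_990, 10, 5))  # (10000, 50000)
```

Page 1 and page 1,000 return the same 10 hits, but the last page makes five shards ship 50,000 candidates. For deep scrolling, search_after with a point-in-time avoids this growth.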

Question 6

What is Elasticsearch and what problems does it solve?

Accepted Answer

Elasticsearch is a distributed search and analytics engine built on top of Apache Lucene, first released by Shay Banon in 2010. It solves three problems that traditional databases handle poorly: (1) full-text search across millions or billions of documents with relevance scoring (BM25), (2) horizontally scalable indexing and querying by sharding data across a cluster, and (3) near-real-time analytics on semi-structured data via aggregations. You write and read JSON documents over a REST API, and the engine handles inverted-index construction, tokenization, scoring, and replication for you. In 2026, the three dominant use cases in India are e-commerce product search (Flipkart, Myntra, Swiggy), log/observability analytics (the ELK stack at PhonePe, Razorpay), and vector storage for RAG (since dense_vector + kNN search matured in 8.x).

Question 7

Explain the inverted index. Why is it the core data structure in Elasticsearch?

Accepted Answer

An inverted index is a map from every distinct term in your corpus to the list of documents (and positions) that contain it. Compare it with a database's normal (forward) layout, 'doc → fields'; the inverted index flips this to 'term → docs'. So for the document set {1: 'the cat sat', 2: 'the dog sat'}, the inverted index is: 'the' → [1, 2], 'cat' → [1], 'sat' → [1, 2], 'dog' → [2]. Finding docs that contain both 'cat' and 'sat' becomes an intersection of two short postings lists (an OR query takes the union instead), and the cost depends on the lengths of those lists rather than on the number of documents in the corpus. Lucene stores this on disk in immutable segments and merges them in the background. This structure is why Elasticsearch can search billions of documents in milliseconds, but also why updates are expensive: you cannot edit a term in place, so you have to mark the doc deleted and reindex it.
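A toy Python version of the structure above, using the same two documents, shows the 'term → docs' flip and an AND search as postings-list intersection. It uses a naive whitespace split as a stand-in analyzer. Real Lucene postings are compressed and use skip structures, which this sketch ignores:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted postings list of doc IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace "analyzer"
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """AND query: intersect the postings lists of every query term."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {1: "the cat sat", 2: "the dog sat"}
idx = build_inverted_index(docs)
print(idx["sat"])                  # [1, 2]
print(search_all(idx, "cat sat"))  # [1]
```

The search never looks at document text; it only touches the postings for 'cat' and 'sat'.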

Question 8

What is the analyzer pipeline? Walk through char filters, tokenizer, and token filters.

Accepted Answer

Every text field is processed by an analyzer at index time AND, for full-text queries, at query time. The pipeline has three stages, applied in order: (1) Character filters operate on the raw string before tokenization. Examples: html_strip removes HTML tags, mapping replaces characters, pattern_replace runs a regex. (2) The tokenizer splits the string into tokens; there is exactly one per analyzer. Common choices: standard (Unicode word boundaries), whitespace, keyword (no split), ngram, edge_ngram (for autocomplete). (3) Token filters transform the token stream. They run in order, and you can chain many. Common: lowercase, stop (remove stop words), stemmer/porter_stem, synonym, asciifolding (é → e). The same pipeline runs on the query string when you use a match query (unless you configure a separate search_analyzer). That's how 'Running' in the doc matches 'run' in the query when the analyzer includes a stemmer: both become 'run'. Picking the right analyzer is the single biggest determinant of search quality.
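To make the three stages concrete, here is a small Python simulation. It mimics html_strip, a rough standard-style tokenizer, lowercase, and stop filters, applied in that order. The stemmer is a deliberately crude, made-up suffix stripper, not the Porter algorithm, and the stop-word list is an illustrative assumption:

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "of", "in"}  # illustrative list


def html_strip(text):
    """Char filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokenizer: split on runs of non-word characters (a rough 'standard')."""
    return [t for t in re.split(r"\W+", text) if t]


def crude_stem(token):
    """Toy suffix stripper for illustration only, NOT the Porter algorithm."""
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        # "runn" -> "run": collapse a doubled final consonant
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
            stem = stem[:-1]
        return stem
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text):
    text = html_strip(text)                              # 1. char filters
    tokens = tokenize(text)                              # 2. exactly one tokenizer
    tokens = [t.lower() for t in tokens]                 # 3a. lowercase filter
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3b. stop filter
    return [crude_stem(t) for t in tokens]               # 3c. stemmer filter


print(analyze("<p>Running the shoes</p>"))  # ['run', 'shoe']
print(analyze("run"))                       # ['run']
```

Because the same function runs on the document and on the query, 'Running' and 'run' end up as the same term. In a real index you would declare this as a custom analyzer in the index settings (char_filter, tokenizer, and filter arrays) rather than in application code.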

Question 9

What is the difference between `text` and `keyword` field types?

Accepted Answer

These are the two string types in Elasticsearch, and confusing them is the #1 beginner mistake. `text` runs through an analyzer: the string is tokenized, lowercased, possibly stemmed, and stored as multiple terms in the inverted index. You use it for full-text search via the match query. You cannot sort or aggregate on `text` fields unless you enable fielddata (which is memory-expensive), and exact-value filtering with term queries misbehaves because only the analyzed tokens are indexed. `keyword` is stored verbatim as a single term, with no analysis. You use it for filtering, sorting, aggregations, and term-level queries (term, terms, prefix). The standard pattern is to map a string as both via a multi-field: indexed as `text` for search and as `name.keyword` for aggregations. This is what Elasticsearch does automatically when dynamic mapping creates a string field (the keyword sub-field gets ignore_above: 256).
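The multi-field pattern above looks like this as request bodies, written as Python dicts. The index name `products` and the fields `name` and `sku` are hypothetical; ignore_above: 256 copies what dynamic mapping generates for strings:

```python
import json

# Explicit mapping for a hypothetical "products" index: "name" is analyzed
# text for search, with a "name.keyword" sub-field for sort/aggs/filters.
mapping = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {"type": "keyword", "ignore_above": 256}
                },
            },
            "sku": {"type": "keyword"},  # exact-match only, never analyzed
        }
    }
}

# Full-text search goes to "name"; the exact-value aggregation goes to "name.keyword".
search_body = {
    "query": {"match": {"name": "running shoes"}},
    "aggs": {"top_names": {"terms": {"field": "name.keyword", "size": 10}}},
}

print(json.dumps(mapping, indent=2))  # body for PUT /products
```

Pointing the terms aggregation at `name` instead of `name.keyword` would fail unless fielddata is enabled, which is the mistake this pattern avoids.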

Question 10

How do you index, get, update, and delete a document?

Accepted Answer

Elasticsearch exposes a REST API for document CRUD. Use PUT /index/_doc/{id}, or POST /index/_doc for an auto-generated ID, to index. Use GET /index/_doc/{id} to fetch. Use POST /index/_update/{id} for a partial update: you supply only the fields that change (wrapped in a "doc" object), and Elasticsearch fetches the stored _source, merges, and reindexes. Use DELETE /index/_doc/{id} to remove. Important: there is no true in-place update. Under the hood, an update is a delete + insert because Lucene segments are immutable. This is why high-update workloads are expensive in Elasticsearch: each update writes a new version of the doc and marks the old one as deleted, and the space is reclaimed during segment merges.
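The four calls above can be summarized as (method, path, body) tuples. `crud_requests` is a hypothetical helper that only builds the requests, so you can send them with any HTTP client. The index name and document are made up:

```python
import json


def crud_requests(index, doc_id, doc, changes):
    """Return (method, path, body) tuples for the four CRUD calls.

    Hypothetical helper: it only builds requests; send them with any
    HTTP client against your own cluster.
    """
    return [
        ("PUT", f"/{index}/_doc/{doc_id}", json.dumps(doc)),        # index (create or replace)
        ("GET", f"/{index}/_doc/{doc_id}", None),                   # fetch
        ("POST", f"/{index}/_update/{doc_id}",
         json.dumps({"doc": changes})),                             # partial update
        ("DELETE", f"/{index}/_doc/{doc_id}", None),                # delete
    ]


for method, path, body in crud_requests(
    "products", "42", {"name": "Trail Shoe", "price": 2999}, {"price": 2499}
):
    print(method, path, body or "")
```

Even though the update body carries only `price`, Elasticsearch still writes a full new copy of document 42 internally.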

Question 11

What is the bulk API and why is it important for ingestion performance?

Accepted Answer

The _bulk API lets you send many index/create/update/delete operations in one HTTP request. The body is NDJSON: each operation is an action line, followed (for index, create, and update) by a line holding the document or partial update. A delete needs only the action line, and the body must end with a newline. Bulk indexing is typically 10-100× faster than one-by-one indexing because you eliminate per-request HTTP overhead and let Elasticsearch process many documents per request. A common starting point is 5-15 MB or 1000-5000 docs per batch: too small and overhead dominates, too large and you get timeouts and circuit-breaker trips, so benchmark with your own documents. For initial loads of large datasets at Flipkart-scale, also temporarily set refresh_interval to -1 and drop replicas to 0, then restore both after ingestion completes. This can cut import time by 5×.
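Here is a sketch of building _bulk payloads in Python: an action line plus a source line per document, a trailing newline, and fixed-size batching. The index name, documents, and the 1000-doc batch size are illustrative assumptions. Each payload would be POSTed to /_bulk with Content-Type application/x-ndjson:

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON _bulk payload: an action line, then a source line, per doc."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


def batches(docs, batch_size=1000):
    """Yield fixed-size slices; tune batch_size (or payload bytes) empirically."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


docs = [(str(i), {"title": f"product {i}"}) for i in range(2500)]
payloads = [bulk_body("products", chunk) for chunk in batches(docs)]
print(len(payloads))  # 3 requests instead of 2500
```

Batching by document count is the simplest choice. If your documents vary a lot in size, batching by accumulated payload bytes tracks the 5-15 MB guideline more closely.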

Question 12

What is the difference between a match query and a term query?

Accepted Answer

match runs the query string through the same analyzer as the field. With a stemming analyzer such as english, 'Running Shoes' becomes ['run', 'shoe'] and matches docs containing those terms in any order (by default any one of them suffices; set operator to and to require all). Use match for full-text search on `text` fields. term does NOT analyze the input; it looks up the literal value in the inverted index. Use term for exact matches on `keyword` fields, numbers, booleans, dates, and IPs. The classic bug: running term on a text field with a multi-word string ('term: "Running Shoes"') always returns 0 hits, because the indexed terms are 'run' and 'shoe' separately. Rule of thumb: text fields → match family (match, multi_match, match_phrase); keyword/numeric/date/boolean → term family (term, terms, range, prefix, wildcard).
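The classic bug can be reproduced with a toy index in a few lines of Python. `toy_analyze` is a hypothetical stand-in for a stemming analyzer that knows only two words; the point is that term skips analysis and match does not:

```python
def toy_analyze(text):
    """Hypothetical stand-in for a stemming analyzer (lowercase + tiny stem table)."""
    stems = {"running": "run", "shoes": "shoe"}
    return [stems.get(t, t) for t in text.lower().split()]


# A "text" field as indexed: only the analyzed terms are stored.
indexed_terms = set(toy_analyze("Running Shoes for Trails"))


def term_query(value):
    """term: look up the literal value, with no analysis."""
    return value in indexed_terms


def match_query(text, operator="or"):
    """match: analyze the input first, then combine the per-term lookups."""
    hits = [t in indexed_terms for t in toy_analyze(text)]
    return all(hits) if operator == "and" else any(hits)


print(term_query("Running Shoes"))   # False: no single term equals that string
print(match_query("Running Shoes"))  # True: 'run' and 'shoe' are both indexed
```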

Question 13

What is a bool query and how do you combine must, should, filter, and must_not?

Accepted Answer

bool is the workhorse compound query in Elasticsearch; almost every non-trivial search is wrapped in one. It has four clauses: (1) must: the clause must match, AND it contributes to the score. (2) should: optional clauses that add to the score when they match. If the bool has no must or filter clause, at least one should clause must match (tunable via minimum_should_match), so it acts like an OR with relevance boosting. (3) filter: the clause must match, but it runs in 'filter context', so it does NOT contribute to the score and is eligible for caching. Filters are dramatically faster on repeated queries; always prefer them when you don't need scoring. (4) must_not: the clause must not match; it also runs in filter context. The typical pattern: text relevance in must, exact category/price/availability filters in filter, optional boosts in should.
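Here is the typical pattern above as a Python function that builds the Query DSL body. The field names (`title`, `category`, `price`, `brand`, `in_stock`) and the function itself are hypothetical:

```python
import json


def product_search(text, category, max_price, boost_brand=None, exclude_out_of_stock=True):
    """Build a bool query: scored text in must, exact constraints in filter."""
    bool_q = {
        "must": [{"match": {"title": text}}],           # scored relevance
        "filter": [                                     # unscored, cacheable
            {"term": {"category": category}},
            {"range": {"price": {"lte": max_price}}},
        ],
    }
    if boost_brand:
        # must/filter are present, so this should clause only boosts the score.
        bool_q["should"] = [{"term": {"brand": {"value": boost_brand, "boost": 2.0}}}]
    if exclude_out_of_stock:
        bool_q["must_not"] = [{"term": {"in_stock": False}}]
    return {"query": {"bool": bool_q}}


print(json.dumps(product_search("running shoes", "footwear", 5000, boost_brand="acme"), indent=2))
```

Only the match clause affects ranking. Moving the category and price constraints out of must and into filter keeps scores driven purely by text relevance and lets Elasticsearch cache them.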

Question 14

What is sharding and replication in Elasticsearch?

Accepted Answer

A shard is a single Lucene index holding a slice of an Elasticsearch index. When you create an Elasticsearch index with number_of_shards=3, your data is partitioned across 3 primary shards. Each primary is also replicated number_of_replicas times for redundancy (typical: 1 replica per primary). So an index with 3 primaries and 1 replica = 6 total shards spread across the cluster. Sharding gives you horizontal scalability: searches fan out across shards and results are merged. Replication gives you fault tolerance and read throughput (replicas can serve searches). Big gotcha: because document routing depends on the primary shard count, you cannot change number_of_shards on an existing index in place. You must reindex into a new index or use the _split/_shrink APIs, which also produce a new index. So pick the shard count carefully at index-creation time: aim for shards in the 10-50 GB range, and keep the shard count per data node well below the default cluster.max_shards_per_node limit of 1,000.
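The routing constraint is easy to see in code. Elasticsearch hashes each document's _routing value (by default its _id) and maps the hash onto a primary shard, so changing the primary count sends most documents to different shards. This simplified sketch uses zlib.crc32 as a stand-in for Elasticsearch's murmur3 hash and omits the routing_num_shards detail:

```python
import zlib


def route(routing_value, num_primary_shards):
    """Pick a primary shard for a document (simplified).

    Elasticsearch uses a murmur3 hash of _routing (default: _id); crc32 is a
    stand-in here so the sketch needs only the standard library.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def total_shards(primaries, replicas):
    """Primaries plus every replica copy."""
    return primaries * (1 + replicas)


ids = [f"doc-{i}" for i in range(1000)]
before = {i: route(i, 3) for i in ids}  # index created with 3 primaries
after = {i: route(i, 4) for i in ids}   # same IDs if there were 4 primaries
moved = sum(before[i] != after[i] for i in ids)
print(total_shards(3, 1))  # 6
print(f"{moved} of 1000 docs would map to a different shard")
```

Because a lookup by ID goes straight to the shard the hash points at, documents left on their old shards would become unreachable. That is why a shard-count change always means building a new index.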

Question 15

What is Kibana and what is it used for?

Accepted Answer

Kibana is the official Elastic UI. It sits on top of Elasticsearch and provides four big things: (1) Discover, an interactive log/document explorer with KQL filtering and the bread and butter for observability teams. (2) Visualize / Lens, a drag-and-drop chart builder backed by Elasticsearch aggregations. (3) Dashboards, which pin charts together for SREs and business teams. (4) Dev Tools, a console for running raw _search and _cluster API calls. In 2026, most India teams use Kibana primarily for log analytics (the ELK stack, with Filebeat shipping app logs) and for APM dashboards. It is also home to the ES|QL editor introduced in 8.11. Kibana itself stores its config in the `.kibana` system index inside the same Elasticsearch cluster.

Elasticsearch Interview Questions

What is Elasticsearch and what problems does it solve?

Explain the inverted index. Why is it the core data structure in Elasticsearch?

What is the analyzer pipeline? Walk through char filters, tokenizer, and token filters.

What is the difference between `text` and `keyword` field types?

How do you index, get, update, and delete a document?

What is the bulk API and why is it important for ingestion performance?

What is the difference between a match query and a term query?

What is a bool query and how do you combine must, should, filter, and must_not?

What is sharding and replication in Elasticsearch?

What is Kibana and what is it used for?

What is Logstash and how does it fit into the ELK stack?

How does Elasticsearch's relevance scoring (BM25) work at a high level?

Explain multi_match and its `type` parameter (best_fields, most_fields, cross_fields, phrase).

What is function_score and when would you use it?

What is the difference between `nested` and `object` field types? Why does it matter?

What is dynamic mapping and how can it explode in production?

Explain refresh_interval and the index/refresh/flush lifecycle.

What are aggregations? Explain bucket, metric, and pipeline aggregations.

What is the role of the master node, data node, ingest node, and ML node?

How does Elasticsearch search work across shards? What is the query-then-fetch model?

What is the difference between query context and filter context?

What are ingest pipelines and how do they compare to Logstash?

How do you do autocomplete / search-as-you-type in Elasticsearch?

What is Index Lifecycle Management (ILM) and when do you need it?

How does security work in Elasticsearch 8.x by default?

What is ES|QL and how does it differ from the Query DSL?

How do you implement vector search / kNN for RAG in Elasticsearch?

How would you architect Elasticsearch for product search at Flipkart-scale (100M+ products, 1B+ queries/day)?

Elasticsearch vs OpenSearch in 2026, what's the difference and how do you choose?

How do you debug a slow search query in Elasticsearch?
