Question 1

Is Neo4j free to use in production?

Accepted Answer

Neo4j Community Edition is free and open-source (GPLv3), runs on a single node, and is enough for many production use cases. Enterprise Edition adds clustering, role-based access control, and online backups, and is what you need for high availability. It is commercial (subscription) and is also available as the managed Neo4j Aura service. Most Indian startups start on Community, then move to Aura or self-managed Enterprise as scale demands.

Question 2

How much does a Neo4j developer earn in India?

Accepted Answer

₹8-25 LPA in 2026 for mid-to-senior backend / data engineers who pair Neo4j with strong SQL or distributed-systems experience. Companies hiring: Razorpay, Cred, Flipkart, Myntra, Swiggy, Tata 1mg, UBS, eBay India, and most fraud/recommendation teams at fintechs. Knowing Cypher AND GDS AND a vector index workflow (RAG) puts you at the upper end of the band.

Question 3

Is Cypher hard to learn coming from SQL?

Accepted Answer

Easier than most people expect. The mental shift is from joins to pattern matching: once you can read `(a:User)-[:FRIEND]->(b:User)` as 'user a is a friend of user b', the rest follows. A SQL developer who builds something real (a friend-of-friend query, a recommendation engine) can be productive in a week and fluent in a month. The publication of GQL as an ISO standard in 2024, which draws heavily on Cypher, makes the investment more durable.

Question 4

When should I NOT use Neo4j?

Accepted Answer

Skip Neo4j if your queries are mostly tabular aggregations (use Postgres / ClickHouse), if you only need single-key lookups at huge QPS (Redis / DynamoDB), or if your data isn't actually connected (no graph shape = no graph win). A common anti-pattern is forcing a graph model on simple OLTP data just because graphs are interesting; you'll lose on operational simplicity. The right call is usually Postgres for OLTP plus Neo4j alongside for the connected slice (fraud, recommendations, identity).

Question 5

Does Neo4j support ACID transactions like SQL databases?

Accepted Answer

Yes. Neo4j is fully ACID, including for changes that span many nodes and relationships in a single transaction. Durability comes from a write-ahead transaction log on disk, the default isolation level is read-committed (enforced with locks), and constraints are validated before the transaction commits. This is what separates Neo4j from graph systems whose guarantees depend on the storage backend, like older Titan/JanusGraph deployments. Financial-grade fraud detection at Razorpay or UBS requires that a fraud flag and a transaction record commit together or not at all.
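As a rough illustration of the "commit together or not at all" point, a single Cypher statement runs inside one transaction, so every write in it is applied atomically. The labels, property names, and parameters below are hypothetical, not taken from any real schema.

```cypher
// Hypothetical schema: (:Account)-[:MADE]->(:Transaction).
// All writes in this statement commit together; if any step fails, none persist.
MATCH (a:Account {account_id: $account_id})
CREATE (t:Transaction {txn_id: $txn_id, amount: $amount, created_at: datetime()})
CREATE (a)-[:MADE]->(t)
SET t.fraud_flag = $is_suspicious,
    a.last_flagged_at = CASE WHEN $is_suspicious THEN datetime() ELSE a.last_flagged_at END
RETURN t.txn_id AS txn_id, t.fraud_flag AS flagged;
```

When the work spans several statements, the same guarantee comes from running them in one explicit transaction through a driver.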

Question 6

What's the role of Neo4j in LLM and RAG architectures in 2026?

Accepted Answer

Neo4j has become a serious contender for knowledge-graph-backed RAG (GraphRAG) since vector indexes landed in 5.11. The pattern: store document chunks and their embeddings as nodes, connect them to extracted entities (people, products, concepts), then at query time do vector search plus graph expansion in one query. Anthropic, Microsoft, and many India AI startups (Sarvam, Krutrim partners) have published blog posts on GraphRAG over Neo4j; it tends to produce better answers than pure vector search on connected domains like medical records, legal contracts, and product catalogues.
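A minimal sketch of the "vector search plus graph expansion" pattern described above. The `Chunk` / `Entity` labels, the `MENTIONS` relationship, the index name, and the 1536 dimension are all assumptions for illustration; the dimension has to match whatever embedding model you use. The `CREATE VECTOR INDEX` form is available in recent 5.x releases, and the two statements are meant to be run separately.

```cypher
// Assumed schema: (:Chunk {text, embedding})-[:MENTIONS]->(:Entity {name}).
CREATE VECTOR INDEX chunk_embedding IF NOT EXISTS
FOR (c:Chunk) ON c.embedding
OPTIONS { indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}};

// Step 1: nearest-neighbour search over chunk embeddings.
// Step 2: expand through shared entities to pull in related chunks.
CALL db.index.vector.queryNodes('chunk_embedding', 5, $question_embedding)
YIELD node AS chunk, score
OPTIONAL MATCH (chunk)-[:MENTIONS]->(e:Entity)<-[:MENTIONS]-(related:Chunk)
WHERE related <> chunk
RETURN chunk.text AS text,
       score,
       collect(DISTINCT e.name) AS entities,
       collect(DISTINCT related.text)[..3] AS related_context
ORDER BY score DESC;
```

The returned rows can then be assembled into the prompt context, with the entity names giving the LLM explicit hooks between chunks.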

Question 7

What is Neo4j and how does the property graph model differ from a relational database?

Accepted Answer

Neo4j is a native graph database, meaning storage, query engine, and indexes are all designed around graph traversal rather than table joins. The data model is the property graph, which has four primitives:

(1) Nodes: entities like a User or Order, each with zero or more labels and a property map.
(2) Relationships: typed, directed connections between two nodes, like `(:User)-[:PLACED]->(:Order)`. Relationships are first-class: they have their own ID, type, and property map.
(3) Labels: node category tags like `:User` or `:Product`, used by indexes and the query planner.
(4) Properties: key-value pairs on nodes and relationships, stored in a separate file from the topology so traversal can read connectivity without paging in property bytes.

Versus relational: in Postgres, finding 'friends of friends of friends' requires three self-JOINs on a user_friends table, each one going back to the index and growing the intermediate result set. In Neo4j, the engine follows pointers between adjacent records in constant time per hop, a property called 'index-free adjacency'. That's why graph tends to beat SQL once you cross 3-4 hops or have deeply connected data: a Razorpay-style fraud-ring query touching 6 hops can take minutes in a relational DB and milliseconds in Neo4j.

Concretely, Neo4j's classic record store keeps nodes and relationships as fixed-size records on disk (in the standard format, a node record is 15 bytes and a relationship record is 34 bytes), each with pointers to neighbouring records. Once you've located a starting node (via an index), every subsequent hop is a constant-time pointer dereference, not another index lookup.

Practical implication: query latency depends on how much of the graph a query touches, not on how big the graph is. Adding a million more users to your social graph doesn't slow down a 3-hop friend lookup for a given user, because the work scales with the number of relationships traversed (roughly the average degree raised to the number of hops), not total graph size. This property is a big part of why companies like NASA (lessons-learned graph across 50 years of missions), eBay (product knowledge graph), and UBS (institutional knowledge + fraud) adopted Neo4j.
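For comparison with the three-JOIN version, here is a sketch of the 'friends of friends of friends' lookup in Cypher. The `User` label, `FRIEND` relationship type, and `user_id` property are assumed names, not a fixed schema.

```cypher
// Hypothetical schema: (:User {user_id})-[:FRIEND]->(:User).
// *3 means exactly three hops; the undirected pattern follows FRIEND in either direction.
MATCH (me:User {user_id: $user_id})-[:FRIEND*3]-(fofof:User)
WHERE fofof <> me
RETURN DISTINCT fofof.user_id AS user_id
LIMIT 100;
```

The starting node is found through an index on `user_id` (if one exists); the three hops after that are pointer traversals rather than further index lookups.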

Question 8

What is Cypher and how do you write your first MATCH query?

Accepted Answer

Cypher is Neo4j's declarative query language, designed to look like ASCII art for graphs. A node is `(variable:Label {prop: value})`, a relationship is `-[variable:TYPE {prop: value}]->`, and a path strings them together. MATCH finds patterns; RETURN projects results. The mental model is to draw what you want, then ask the engine to find it. Cypher is the main basis for the GQL standard (ISO/IEC 39075:2024), and its open variant, openCypher, is implemented by other stores such as Memgraph and Amazon Neptune, so what you learn here carries over to those systems, to Neo4j's own AuraDB, and to GQL-compliant stores as they appear.
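A first MATCH query might look like the following sketch. The labels, relationship type, and property names are illustrative assumptions.

```cypher
// Hypothetical data: users who placed orders.
// (u:User {...}) is a node pattern, -[p:PLACED]-> is a relationship pattern.
MATCH (u:User {email: $email})-[p:PLACED]->(o:Order)
RETURN u.name AS customer, o.order_id AS order_id, p.placed_at AS placed_at
ORDER BY placed_at DESC
LIMIT 10;
```

Read it left to right as a drawing: a user with this email, an outgoing PLACED relationship, and the order at the other end.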

Question 9

What is the difference between CREATE and MERGE?

Accepted Answer

CREATE always inserts a new node or relationship. It never checks for duplicates, so running it twice gives you two identical nodes. MERGE is 'match or create': it tries to MATCH the whole pattern first, and if nothing is found, it CREATEs the whole pattern. MERGE is the right primitive for idempotent imports (ETL jobs that may rerun), upserts, and any time you want exactly one of something.

The gotcha: MERGE has to look before it writes, and it takes locks so that concurrent writers do not both create the same data. It only guarantees uniqueness under concurrency when a uniqueness constraint backs the merged property. That makes MERGE significantly slower than CREATE under high write concurrency. For bulk loads where you know the data is fresh, prefer CREATE and let a UNIQUE constraint catch dupes, or pre-deduplicate the input.

Another subtlety: because MERGE matches or creates the pattern as a whole, `MERGE (a)-[:KNOWS]->(b)` should only be run with both endpoints already bound earlier in the query. If they are not, and the full pattern is not found, MERGE creates new endpoint nodes along with the relationship, even when suitable nodes already exist. In the same way, every property in a MERGE pattern is part of the match, so `MERGE (u:User {email: $email, name: $name})` is different from `MERGE (u:User {email: $email}) SET u.name = $name`: the first finds only users matching BOTH email and name (and creates a new node if either differs), while the second finds the unique user by email and updates the name. The second form is almost always what you want in practice.
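The upsert form recommended above can be sketched like this. It assumes a UNIQUE constraint on `:User(email)`; the labels, properties, and parameters are hypothetical.

```cypher
// MERGE on the natural key only, then set the remaining properties.
MERGE (u:User {email: $email})
  ON CREATE SET u.name = $name, u.created_at = datetime()
  ON MATCH SET u.name = $name, u.updated_at = datetime()
WITH u
// Bind the other endpoint first, then MERGE only the relationship.
MATCH (friend:User {email: $friend_email})
MERGE (u)-[:KNOWS]->(friend)
RETURN u.email AS user, friend.email AS friend;
```

Running this twice with the same parameters leaves one user node and one KNOWS relationship, which is the idempotency you want from a rerunnable import.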

Question 10

What is WHERE used for in Cypher and how is it different from filters inside the MATCH pattern?

Accepted Answer

WHERE filters the bindings produced by MATCH (or OPTIONAL MATCH and WITH). Inline filters like `(:User {email: $email})` are syntactic sugar for exact-equality checks on one or more properties. Use WHERE when you need anything more: range comparisons, OR, NOT, regex (`=~`), list IN, property existence (`IS NOT NULL`), or label predicates (`n:User OR n:Admin`). The planner treats both forms the same way: a WHERE predicate on an indexed property still triggers an index seek, so you don't pay extra by using WHERE.
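A short side-by-side sketch of the two styles, using made-up property names and values:

```cypher
// Inline filter: exact equality only.
MATCH (u:User {email: $email})
RETURN u.name AS name;

// WHERE: ranges, boolean logic, list and string predicates, existence, labels.
MATCH (u:User)
WHERE u.age >= 18
  AND (u.city IN ['Bengaluru', 'Mumbai'] OR u.email ENDS WITH '@example.com')
  AND u.phone IS NOT NULL
  AND NOT u:Admin
RETURN u.name AS name, u.city AS city;
```

The first query could equally be written as `MATCH (u:User) WHERE u.email = $email`; the plan should be the same either way.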

Question 11

What is RETURN and how do you use aggregations?

Accepted Answer

RETURN projects the final result; it's analogous to SELECT in SQL. Cypher's aggregation model is implicit: if you mix grouping keys (raw values) with aggregating functions (count, sum, avg, min, max, collect) in the same RETURN, everything that isn't aggregated becomes a GROUP BY key automatically. No explicit GROUP BY clause is needed. `collect()` is especially powerful: it gathers all values from a group into a list, which is how you build nested JSON-shaped results without a second query. `count(*)` counts rows; `count(DISTINCT x)` counts unique values.
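The implicit grouping can be seen in a sketch like the one below, where `city` is the only non-aggregated expression and therefore acts as the GROUP BY key. The schema and property names are assumptions.

```cypher
MATCH (u:User)-[:PLACED]->(o:Order)
RETURN u.city AS city,                          // grouping key (not aggregated)
       count(*) AS orders,                      // rows per city
       count(DISTINCT u) AS customers,          // unique users per city
       sum(o.total) AS revenue,
       collect(o.order_id)[..5] AS sample_order_ids
ORDER BY revenue DESC;
```

Adding a second raw expression to the RETURN, say `u.tier`, would silently change the grouping to (city, tier), which is the usual source of surprising counts.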

Question 12

What is WITH in Cypher and why is it important?

Accepted Answer

WITH is Cypher's pipeline operator: it passes results from one part of a query to the next, like a pipe in shell. It does three jobs: (1) Re-scope variables, so only the names listed in WITH stay visible after it. (2) Aggregate then continue, so you can aggregate, then MATCH again using the aggregated values. (3) Order/limit in the middle of a query, e.g. find the top 100 users, then expand their orders. Without WITH, you could not aggregate, sort, or limit part-way through a query, and long queries would collapse into one hard-to-read block that is also harder for the planner. The mental model: WITH = checkpoint + re-shape.
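The "top 100 users, then expand their orders" case could be sketched as follows, with a hypothetical `placed_at` datetime property on orders:

```cypher
// Stage 1: aggregate, sort, and keep only the top 100 users.
MATCH (u:User)-[:PLACED]->(o:Order)
WITH u, count(o) AS order_count
ORDER BY order_count DESC
LIMIT 100
// Stage 2: only u and order_count are in scope here; o is gone.
MATCH (u)-[:PLACED]->(recent:Order)
WHERE recent.placed_at >= datetime() - duration('P30D')
RETURN u.user_id AS user_id, order_count, collect(recent.order_id) AS recent_orders;
```

Because the LIMIT sits before the second MATCH, the expansion runs for 100 users rather than for every user in the graph.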

Question 13

What is a label and how is it different from a property?

Accepted Answer

A label is a category tag attached to a node, and a node can have several (e.g., `(:User:Customer:Indian)`), which is great for facet-style filtering. Labels are first-class: the engine maintains a label lookup index that lets you cheaply enumerate `(:User)` or count nodes with a given label. Properties are just key-value data on the node. Use labels for categories that participate in indexes and constraints (User, Order, Product), and use properties for everything else. A common rookie mistake is encoding a category as a property (`{type: 'user'}`): the engine can't use a property the way it uses a label for planning, and you lose the ability to attach a label-specific index.
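A small sketch of the difference, including one way to migrate away from the `{type: 'user'}` anti-pattern. The label and property names are illustrative, and the statements are meant to be run one at a time.

```cypher
// Label-based: served by the label lookup index.
MATCH (u:User) RETURN count(u) AS users;

// Property-based category with no label: the engine has to scan every node.
MATCH (n {type: 'user'}) RETURN count(n) AS users;

// One-off migration: promote the property to a label, then drop the property.
MATCH (n {type: 'user'})
SET n:User
REMOVE n.type;

// Multiple labels combine naturally for facet-style filters.
MATCH (c:User:Customer) RETURN count(c) AS customers;
```

On a large graph the migration itself should be batched, since it touches every matching node in one transaction as written.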

Question 14

How do you create an index in Neo4j and why does it matter?

Accepted Answer

Without an index, Neo4j has to do a label scan plus a property comparison for every node with that label. That is fine for a few thousand nodes and fatal at a million. Create indexes for any property you filter on. The main types in 5.x: (1) RANGE (the default since 5.0, replaces BTREE), for equality and range predicates. (2) TEXT, optimized for string CONTAINS and ENDS WITH searches. (3) POINT, for geospatial queries. (4) FULLTEXT, for natural-language search powered by Lucene. (5) VECTOR, for cosine/euclidean similarity on embeddings, added in 5.11 for RAG. (6) LOOKUP, the two automatic token lookup indexes (one over node labels, one over relationship types); never drop these. Always verify with `SHOW INDEXES` and check usage with `PROFILE`.
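The index types above map to statements along these lines. Index names, labels, and properties are placeholders, and each statement is run on its own.

```cypher
// RANGE is the default type when none is named.
CREATE INDEX user_email IF NOT EXISTS FOR (u:User) ON (u.email);

CREATE TEXT INDEX user_name_text IF NOT EXISTS FOR (u:User) ON (u.name);

CREATE POINT INDEX store_location IF NOT EXISTS FOR (s:Store) ON (s.location);

CREATE FULLTEXT INDEX product_search IF NOT EXISTS
FOR (p:Product) ON EACH [p.name, p.description];

// Check that the indexes exist and are ONLINE.
SHOW INDEXES;

// Check that a query actually uses one (look for an index seek in the plan).
PROFILE MATCH (u:User {email: $email}) RETURN u;
```

`IF NOT EXISTS` makes the statements safe to keep in a rerunnable migration script.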

Question 15

What is a constraint in Neo4j?

Accepted Answer

A constraint enforces a schema rule at write time. The four types in 5.x: (1) UNIQUE, only one node with a given label can have a given property value, like a UNIQUE column in SQL. (2) NODE KEY, UNIQUE plus the property must exist; this is the closest equivalent to a SQL primary key. (3) NODE/RELATIONSHIP PROPERTY EXISTENCE, the property cannot be null. (4) NODE/RELATIONSHIP PROPERTY TYPE, the property must be a specific type (introduced in 5.9). Types (2) to (4) require Enterprise Edition. UNIQUE constraints automatically create a backing index, so you don't need a separate CREATE INDEX for the same property. In production you ALWAYS want a UNIQUE constraint on the natural key (email, external_id) before doing any MERGE; otherwise two concurrent MERGEs can both miss the match and you end up with duplicate nodes.
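In 5.x syntax the four constraint types look roughly like this. Constraint names, labels, and properties are placeholders; the last three statements need Enterprise Edition.

```cypher
// (1) UNIQUE: also creates the backing index used by MERGE.
CREATE CONSTRAINT user_email_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.email IS UNIQUE;

// (2) NODE KEY: unique and must exist.
CREATE CONSTRAINT order_key IF NOT EXISTS
FOR (o:Order) REQUIRE o.order_id IS NODE KEY;

// (3) Property existence.
CREATE CONSTRAINT user_name_exists IF NOT EXISTS
FOR (u:User) REQUIRE u.name IS NOT NULL;

// (4) Property type.
CREATE CONSTRAINT user_age_integer IF NOT EXISTS
FOR (u:User) REQUIRE u.age IS :: INTEGER;

SHOW CONSTRAINTS;
```

Creating a constraint fails if existing data already violates it, so clean up duplicates or nulls first.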

Question 16

How do you delete nodes and relationships safely?

Accepted Answer

Neo4j refuses to delete a node that still has relationships, because that would leave dangling pointers. So you either delete the relationships first, or use DETACH DELETE, which removes the node and all attached relationships in one go. For large deletes, never run a single `MATCH (n) DETACH DELETE n` over millions of nodes: it builds a giant transaction, blows out heap, and may roll back at the end. Instead, batch via `CALL { ... } IN TRANSACTIONS OF 10000 ROWS` (5.x) so each batch commits independently.
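Both cases can be sketched as below. `StaleSession` is a made-up label standing in for whatever you are purging.

```cypher
// Small, targeted delete: one node plus its relationships.
MATCH (u:User {user_id: $user_id})
DETACH DELETE u;

// Large delete: each batch of 10000 rows commits in its own transaction.
MATCH (s:StaleSession)
CALL {
  WITH s
  DETACH DELETE s
} IN TRANSACTIONS OF 10000 ROWS;
```

The batched form has to run in an implicit (auto-commit) transaction; in Neo4j Browser that means prefixing the query with `:auto`. If a later batch fails, earlier batches stay committed, so the statement should be safe to rerun.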

Neo4j Interview Questions

What is Neo4j and how does the property graph model differ from a relational database?

What is Cypher and how do you write your first MATCH query?

What is the difference between CREATE and MERGE?

What is WHERE used for in Cypher and how is it different from filters inside the MATCH pattern?

What is RETURN and how do you use aggregations?

What is WITH in Cypher and why is it important?

What is a label and how is it different from a property?

How do you create an index in Neo4j and why does it matter?

What is a constraint in Neo4j?

How do you delete nodes and relationships safely?

How do you import data from a CSV file into Neo4j?

Where is Neo4j used in industry?

How do you explain a Cypher query plan with EXPLAIN and PROFILE?

What is a cartesian product and how do you avoid it?

What is the difference between label scan and index scan?

What is the MERGE locking gotcha and how do you avoid it?

How do you write a variable-length path query and what are the dangers?

What is APOC and which procedures should every Neo4j developer know?

What is GDS (Graph Data Science) and when do you use it?

How do you model a recommendation engine in Neo4j?

How do you model fraud detection patterns in Neo4j?

How do you handle large transaction batching for bulk imports?

How do you query Neo4j from Python or Node.js applications?

When does Neo4j beat a relational database, and when does it lose?

How does Neo4j compare to RDF / SPARQL stores like GraphDB?

How do you architect Neo4j for high availability and what is the role of clustering in 5.x?

How do you tune query performance on a graph with billions of nodes?

How would you architect a real-time fraud detection system on Neo4j for a fintech like Razorpay?

How do vector indexes in Neo4j 5.x change RAG / LLM architectures?

How do you handle schema migration and zero-downtime deployments on a production Neo4j cluster?
