Neo4j is a Java-based, ACID-compliant graph database built on the property graph model. It was first released in 2007 by Neo Technology (now Neo4j, Inc.), the company co-founded by Emil Eifrem. Unlike traditional relational databases that rely on normalized tables and complex joins, Neo4j models data as a network of interconnected entities. This approach aligns more naturally with human intuition: for instance, visualizing how a user interacts with content on a platform, how a transaction links multiple parties, or how a supply chain connects its various participants.
Data is represented using nodes (entities), relationships (directed connections between entities), and properties (key-value pairs attached to both nodes and relationships). At the storage layer, Neo4j is a native graph database: each node stores direct references to its relationships (an approach Neo4j calls index-free adjacency), so moving from a node to its neighbors follows stored pointers instead of performing index lookups or joins.
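To make the three building blocks concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not Neo4j's API or on-disk format. The `Node`, `Relationship`, `connect`, and `neighbors` names and the `since` property are hypothetical. Each node keeps direct references to its relationships, loosely mirroring index-free adjacency, so following an edge is a pointer hop rather than a lookup.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)
class Node:
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)
    # Direct references to attached relationships (a loose analogy to index-free adjacency)
    outgoing: list["Relationship"] = field(default_factory=list)
    incoming: list["Relationship"] = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    rel_type: str
    start: Node
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


def connect(start: Node, rel_type: str, end: Node, **props: Any) -> Relationship:
    """Create a directed, typed relationship and register it on both endpoints."""
    rel = Relationship(rel_type, start, end, dict(props))
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel


def neighbors(node: Node, rel_type: str) -> list[Node]:
    """Follow outgoing relationships of one type by walking the stored references."""
    return [r.end for r in node.outgoing if r.rel_type == rel_type]


alice = Node({"User"}, {"username": "alice_dev"})
bob = Node({"User"}, {"username": "bob_coder"})
connect(alice, "FOLLOWS", bob, since="2024-01-15")  # hypothetical relationship property
print([n.properties["username"] for n in neighbors(alice, "FOLLOWS")])  # ['bob_coder']
```

Labels live on nodes, a single type lives on each relationship, and both can carry properties, which is the same split the Cypher examples below rely on.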
Querying with Cypher
Neo4j uses Cypher, a declarative query language similar in spirit to SQL but designed for graph patterns. Nodes are written in parentheses (), and relationships in square brackets [] placed between dashes, with an arrow (-> or <-) giving the direction, as in (a:User)-[:FOLLOWS]->(b:User).
The quickest way to experiment is via Neo4j Aura, a fully managed cloud service, though self-hosting with Docker is also an option.
To define a node, use the CREATE clause. The following example creates a User node with specific attributes:
CREATE (u:User { username: 'alice_dev', signupDate: date('2023-10-01') })
RETURN u
To establish a connection, such as a user following another, you define the relationship within the query:
MATCH (a:User { username: 'alice_dev' }), (b:User { username: 'bob_coder' })
CREATE (a)-[:FOLLOWS]->(b)
This structure eliminates the need for foreign keys or join tables. You can enforce data integrity using constraints, such as ensuring unique usernames:
CREATE CONSTRAINT FOR (u:User) REQUIRE u.username IS UNIQUE
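Conceptually, the constraint makes any write that would introduce a duplicate username fail. The toy Python class below (a hypothetical `UniqueUsernameStore`, not part of any Neo4j client) mimics that behavior with a dictionary keyed by username. Neo4j itself enforces uniqueness constraints inside the database using a backing index, and a violation aborts the transaction.

```python
class UniqueUsernameStore:
    """Toy store enforcing unique User.username values (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # Lookup keyed by username, standing in for the constraint's backing index
        self._by_username: dict[str, dict] = {}

    def create_user(self, username: str, **props) -> dict:
        if username in self._by_username:
            # In Neo4j, the write fails with a constraint violation and the transaction rolls back
            raise ValueError(f"User with username {username!r} already exists")
        node = {"username": username, **props}
        self._by_username[username] = node
        return node


store = UniqueUsernameStore()
store.create_user("alice_dev", signupDate="2023-10-01")
try:
    store.create_user("alice_dev")
except ValueError as err:
    print(err)  # User with username 'alice_dev' already exists
```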
Modeling a Social Feed
Expanding the model to include posts, you can link Post nodes to User nodes with a PUBLISHED relationship. To retrieve a feed of posts published within the last day by users the viewer follows:
MATCH (viewer:User)-[:FOLLOWS]->(author:User)-[:PUBLISHED]->(p:Post)
WHERE viewer.username = 'alice_dev' AND p.createdAt > datetime() - duration({hours: 24})
RETURN author.username, p.content
ORDER BY p.createdAt DESC
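To spell out what the pattern matches, the following Python sketch evaluates the same feed logic over plain in-memory data: follow FOLLOWS edges from the viewer, collect posts published by those authors within the last 24 hours, and sort newest first. The data layout (dicts of edge lists), the sample posts, and the explicit `now` parameter are hypothetical; they only mirror the Cypher query.

```python
from datetime import datetime, timedelta

# viewer -> authors they follow (the FOLLOWS edges)
follows = {"alice_dev": ["bob_coder", "carol_ops"]}
# author -> list of (created_at, content) for their PUBLISHED posts
published = {
    "bob_coder": [(datetime(2024, 5, 2, 9, 0), "Graph modeling tips"),
                  (datetime(2024, 4, 28, 9, 0), "Old post")],
    "carol_ops": [(datetime(2024, 5, 2, 11, 30), "Deploying Neo4j with Docker")],
}


def feed(viewer: str, now: datetime, window: timedelta = timedelta(hours=24)):
    """Posts by authors the viewer follows, newer than now - window, newest first."""
    cutoff = now - window
    rows = [
        (created_at, author, content)
        for author in follows.get(viewer, [])
        for created_at, content in published.get(author, [])
        if created_at > cutoff  # mirrors p.createdAt > datetime() - duration({hours: 24})
    ]
    rows.sort(key=lambda r: r[0], reverse=True)  # mirrors ORDER BY p.createdAt DESC
    return [(author, content) for _, author, content in rows]


print(feed("alice_dev", now=datetime(2024, 5, 2, 12, 0)))
```

Passing `now` explicitly keeps the sketch deterministic; the Cypher version reads the current time with `datetime()`.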
Cypher also supports richer pattern matching. The following query finds users who liked posts by authors the viewer follows, excludes anyone the viewer has muted, and counts how many such posts each of them liked:
MATCH (viewer:User)-[:FOLLOWS]->(author:User)-[:PUBLISHED]->(p:Post)<-[:LIKED]-(liker:User)
WHERE viewer.username = 'alice_dev' AND NOT (viewer)-[:MUTED]->(liker)
RETURN liker.username, COUNT(p) AS likeCount
ORDER BY likeCount DESC
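The negative pattern and the implicit grouping are the subtle parts of this query. In the Python sketch below, the `NOT (viewer)-[:MUTED]->(liker)` predicate becomes a membership test, and `COUNT(p)` grouped by `liker` becomes a `Counter`. All data structures and the `likers_of_followed_posts` name are hypothetical.

```python
from collections import Counter

follows = {"alice_dev": {"bob_coder"}}          # viewer -> followed authors
published = {"bob_coder": ["p1", "p2"]}         # author -> post ids
liked_by = {"p1": {"dave", "erin"}, "p2": {"dave", "frank"}}  # post -> likers
muted = {"alice_dev": {"frank"}}                # viewer -> muted users


def likers_of_followed_posts(viewer: str) -> Counter:
    """Count, per liker, the posts by followed authors they liked; skip muted likers."""
    counts: Counter = Counter()
    for author in follows.get(viewer, set()):
        for post in published.get(author, []):
            for liker in liked_by.get(post, set()):
                if liker not in muted.get(viewer, set()):  # NOT (viewer)-[:MUTED]->(liker)
                    counts[liker] += 1  # COUNT(p), grouped by liker
    return counts


print(likers_of_followed_posts("alice_dev"))  # Counter({'dave': 2, 'erin': 1})
```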
Core Concepts
The Property Graph
- Nodes: Entities like :Product, :Location, or :User.
- Relationships: Directed connections between nodes (e.g., :PURCHASED, :LOCATED_IN).
- Properties: Descriptive data stored on nodes or relationships.
Labels and Types
- Labels: Categorize nodes (e.g., :Employee).
- Relationship Types: Define the nature of the connection (e.g., :REPORTS_TO).
Cypher Syntax Examples
Finding Specific Connections:
MATCH (charlie:Person { name: 'Charlie' })-[:FOLLOWS]->(friend:Person)
RETURN friend.name, friend.age
Updating Data:
MATCH (p:Person { name: 'Charlie' })
SET p.role = 'Senior Engineer', p.department = 'R&D'
RETURN p
Aggregation and Sorting:
MATCH (:User)-[:POSTED]->(t:Tweet)
WHERE t.timestamp > timestamp() - 86400000  // last 24 h: timestamp() returns epoch milliseconds, so t.timestamp must use the same unit
MATCH (t)<-[:LIKED]-(liker:User)
RETURN liker.username, COUNT(t) AS totalLikes
ORDER BY totalLikes DESC
LIMIT 10
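The `86400000` constant is 24 hours expressed in milliseconds, the unit Cypher's `timestamp()` returns. The Python sketch below reproduces the time filter, the per-liker count, and `ORDER BY ... DESC LIMIT 10` using `Counter.most_common`. The `(liker, tweet_timestamp_ms)` input format and the `top_likers` name are assumptions for illustration.

```python
import time
from collections import Counter

DAY_MS = 24 * 60 * 60 * 1000  # 86_400_000, the constant used in the query


def top_likers(likes, now_ms: int, limit: int = 10):
    """likes: iterable of (liker, tweet_timestamp_ms) pairs, one per LIKED relationship.

    Counts likes on tweets posted in the last 24 hours and returns the top `limit`
    likers as (liker, total) pairs, highest first.
    """
    cutoff = now_ms - DAY_MS
    counts = Counter(liker for liker, ts in likes if ts > cutoff)
    return counts.most_common(limit)  # ORDER BY totalLikes DESC LIMIT 10


now = int(time.time() * 1000)  # same unit as Cypher's timestamp()
sample = [("dave", now - 1_000), ("erin", now - 2_000),
          ("dave", now - 3_000), ("erin", now - 2 * DAY_MS)]
print(top_likers(sample, now))  # [('dave', 2), ('erin', 1)]
```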
Performance and Optimization
Indexing
To speed up lookups on specific properties, create an index:
CREATE INDEX user_email_idx FOR (u:User) ON (u.email)
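As a rough mental model, an index replaces a scan over every User node with a direct lookup. In an EXPLAIN plan, a lookup such as MATCH (u:User {email: $email}) typically appears as a NodeIndexSeek once the index exists, instead of a NodeByLabelScan followed by a Filter. The Python sketch below uses a dict as a stand-in. Neo4j's range indexes are ordered structures that also serve range queries, which a dict does not, and the `users` data and function names are synthetic.

```python
from typing import Optional

# Synthetic User nodes, each a dict of properties
users = [{"username": f"user{i}", "email": f"user{i}@example.com"} for i in range(10_000)]


def find_by_scan(email: str) -> Optional[dict]:
    """No index: inspect every User node until one matches (linear time)."""
    return next((u for u in users if u["email"] == email), None)


# Built once and kept in sync on writes: email -> node (the role CREATE INDEX plays)
email_index = {u["email"]: u for u in users}


def find_by_index(email: str) -> Optional[dict]:
    """With an index: a single lookup instead of a scan."""
    return email_index.get(email)


print(find_by_index("user42@example.com"))
```

The trade-off is the same as in the database: every write must also update the index, in exchange for faster reads on that property.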
Query Analysis
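Prefix a query with EXPLAIN to see its execution plan without running it, or with PROFILE to run it and report runtime statistics such as rows and database hits for each operator.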
Advanced Capabilities
Full-Text Search
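Neo4j's full-text indexes are powered by Apache Lucene and suit text-heavy searches. Since Neo4j 4.3 they are created with Cypher syntax (the older db.index.fulltext.createNodeIndex procedure was removed in Neo4j 5):
CREATE FULLTEXT INDEX postSearch FOR (p:Post) ON EACH [p.title, p.body]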
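Lucene-style full-text search rests on an inverted index that maps each term to the documents containing it. The Python sketch below builds a tiny one over the title and body fields. The tokenizer is a crude regex (real Lucene analyzers also handle stemming, stop words, and more), the AND semantics and sample posts are assumptions, and unlike Lucene it does not rank hits by relevance score.

```python
import re
from collections import defaultdict

posts = {
    1: {"title": "Intro to Cypher", "body": "Pattern matching on graphs"},
    2: {"title": "Graph algorithms", "body": "PageRank and community detection"},
}


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())


# term -> ids of posts whose title or body contains that term
inverted: dict[str, set[int]] = defaultdict(set)
for pid, post in posts.items():
    for field_name in ("title", "body"):
        for term in tokenize(post[field_name]):
            inverted[term].add(pid)


def search(query: str) -> set[int]:
    """Return ids of posts containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(inverted.get(terms[0], set()))
    for term in terms[1:]:
        result &= inverted.get(term, set())
    return result


print(search("graph"))  # {2}
```

Note that "graph" does not match "graphs" here; handling such variants is exactly what Lucene's analyzers are for.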
Graph Data Science
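Leverage algorithms like PageRank via the Neo4j Graph Data Science (GDS) library. GDS algorithms run on a named in-memory graph projection, so in GDS 2.x you first project the graph as a separate statement, then stream the scores:
CALL gds.graph.project('socialGraph', 'User', 'FOLLOWS');
CALL gds.pageRank.stream('socialGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).username AS username, score
ORDER BY score DESC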
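For intuition about what `gds.pageRank.stream` computes, here is a plain power-iteration PageRank in Python. It uses a damping factor of 0.85 and a fixed 20 iterations, common textbook choices, and spreads the rank of nodes with no outgoing links evenly over all nodes. This normalized variant makes scores sum to 1; GDS does not scale its scores this way and stops on a convergence tolerance, so compare rankings rather than raw values. The `pagerank` function and the adjacency-dict input are hypothetical.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 20) -> dict[str, float]:
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbors]}.

    Every out-neighbor must also appear as a key in `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Teleport term: every node receives (1 - d) / n
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                # Split v's rank evenly across its outgoing links
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank


follows = {"alice": ["bob"], "bob": ["carol"], "carol": ["bob"], "dave": ["bob"]}
scores = pagerank(follows)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Here "bob" ends up on top because three users follow him, mirroring how the Cypher query surfaces the most-followed users first.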
Use Cases
- Recommendation Engines: Suggesting products based on user behavior and item similarity.
- Fraud Detection: Identifying complex rings of collusion in transaction networks.
- Knowledge Graphs: Powering AI systems with structured, queryable knowledge.
- Social Networks: Managing friends, followers, and activity streams.
Integration
- Spring Data Neo4j: Simplifies integration for Java applications.
- Neo4j GraphQL Library: Generates a GraphQL API from type definitions, so the graph can be queried with GraphQL.