Netflix Engineers Detail Graph Abstraction Layer Handling 650 TB of Graph Data at 10 Million Operations Per Second

Overview

Netflix engineers have published a detailed account of Graph Abstraction, an internal platform that manages approximately 650 TB of graph data across multiple regions while sustaining close to 10 million operations per second. The system delivers single-digit millisecond latency for single-hop traversals and under 50 milliseconds at the 90th percentile for two-hop queries, according to a three-part series on the Netflix Tech Blog and subsequent coverage by InfoQ.

The platform powers several production workloads including social graphs for Netflix Gaming, service topology graphs used during incident analysis, and a broader real-time distributed graph that captures dynamic relationships across the company’s infrastructure.

Why Netflix Built Its Own

Rather than adopting an off-the-shelf graph database such as Neo4j or AWS Neptune, Netflix engineers opted to build Graph Abstraction as a layer on top of existing internal infrastructure. The team cited scalability limitations, workload complexity, and operational overhead as reasons for rejecting traditional graph databases. The resulting system leverages KVDAL, Netflix’s Key-Value Data Abstraction Layer built on Apache Cassandra, which provides what the engineers described as a two-level map architecture suited to graph data patterns.

The design intentionally trades query flexibility for predictable performance. Graph Abstraction restricts traversal depth and typically requires a defined starting node, a constraint that enables consistent low latency at massive scale. This positions the system squarely in the OLTP domain, optimizing for millions of operations per second with millisecond response times rather than the open-ended analytical queries that characterize graph analytics workloads.

Architecture and Storage

The storage layer separates edge connections from edge properties using a two-tier strategy. A links index maintains adjacency lists mapping source nodes to target nodes, while a separate properties index stores edge attributes independently. This decoupling prevents the wide-row problem common in distributed databases when a single node accumulates many relationships, and it allows property updates without touching the adjacency structure.

Each node type and edge type receives its own namespace backed by Cassandra, enabling independent scaling and operational isolation. The entire graph spans approximately 27 namespaces distributed across around 12 Cassandra clusters running on roughly 2,400 EC2 instances. At its current scale, the system stores 8 billion nodes and more than 150 billion edges.

For caching, the platform integrates with EVCache, Netflix’s distributed in-memory caching layer, using a combination of write-aside caching for edge links and read-aside caching for properties. Consistency-critical graphs use invalidation-on-write, while frequently modified datasets rely on TTL-driven cache expiration.

Query Model

Graph Abstraction exposes a custom gRPC traversal API inspired by Apache TinkerPop’s Gremlin query language. The API supports chained traversals with configurable depth limits, property selection and filtering, sorting by last-write timestamp, and result pagination. Schemas are loaded into memory at startup with strict enforcement for validation and optimized traversal planning.

The system uses a strongly typed property graph model where nodes and edges carry explicit type information and properties. Edges can be defined as unidirectional or bidirectional, with direction-agnostic identifiers enabling bidirectional traversals in a single database call.

Consistency and Global Availability

Graph Abstraction implements what the team calls strict eventual consistency, using asynchronous replication across regions. Conflict resolution follows a last-write-wins strategy with timestamp-based idempotency tokens. A Kafka-based entropy repair mechanism handles failures in multi-namespace writes, ensuring that partially completed graph mutations are eventually reconciled across all replicas.

Node deletions are handled asynchronously to avoid synchronous cascading deletes that could spike latency. The system relies on last-write-wins semantics to maintain correctness during concurrent updates to the same graph region.

Broader Context

The Graph Abstraction disclosure arrives as graph technologies increasingly intersect with AI infrastructure. Techniques such as GraphRAG, which combines knowledge graphs with retrieval-augmented generation, have driven renewed interest in how organizations model and query connected data at scale. Netflix’s approach demonstrates that for high-throughput, latency-sensitive workloads, purpose-built graph layers on existing distributed storage can outperform general-purpose graph databases, though the trade-off is a narrower query model that may not suit every use case.