SK Hynix and SanDisk Launch Global Standardization of High Bandwidth Flash, a NAND-Based Alternative to HBM for AI Inference

Overview

SK hynix and SanDisk held a formal standardization kick-off event at SanDisk’s headquarters in Milpitas, California on February 25, marking the beginning of a global effort to establish High Bandwidth Flash (HBF) as a new memory category for the AI inference era. The initiative will operate under the Open Compute Project (OCP), with Samsung Electronics also signing a memorandum of understanding to participate in the consortium, according to Tom’s Hardware.

HBF is a NAND-based memory technology designed to sit between High Bandwidth Memory (HBM) and traditional solid-state drives in the storage hierarchy. By stacking NAND flash dies using advanced packaging techniques borrowed from HBM manufacturing, HBF aims to deliver comparable bandwidth to HBM while providing 8 to 16 times the capacity at a similar cost point, as Tom’s Hardware reported.

Why HBM Alone Falls Short

The push for HBF reflects a growing recognition that HBM, despite its dominance in AI accelerators, faces fundamental capacity constraints that will intensify as AI workloads evolve. Current HBM stacks use DRAM, which offers extremely fast access times but is limited in how many layers can be stacked before thermal and yield challenges become prohibitive.

As the AI industry shifts from training-centric to inference-heavy deployments, the memory requirements are changing. Large language model inference relies heavily on key-value (KV) caches that grow linearly with context length and batch size. At scale, these caches can consume hundreds of gigabytes of memory, far exceeding what HBM stacks alone can economically provide.

SK hynix addressed this challenge directly with its H3 architecture concept, presented in an IEEE paper in February 2026. The H3 design places both HBM and HBF on a single interposer alongside the GPU, creating a unified memory system. In simulations using an Nvidia Blackwell B200 GPU, the hybrid approach showed a 2.69-fold improvement in performance per watt compared to HBM-only configurations. At a KV cache depth of 10 million tokens, the possible batch size increased by a factor of 18.8, according to Tom’s Hardware.

How the Technology Works

HBF applies the vertical stacking and high-speed interconnect principles of HBM to NAND flash rather than DRAM. The result is a memory module that trades some of HBM’s access latency for dramatically greater storage density. In the H3 architecture, model weights and shared pre-computed KV caches are stored in HBF, while actively generated KV caches and other latency-sensitive data remain in HBM.

The approach is not without trade-offs. NAND flash has limited write endurance, approximately 100,000 cycles, making HBF unsuitable for write-intensive workloads. Its access latency is also higher than HBM’s, though SK hynix’s design incorporates latency-hiding buffers in the HBM base die to compensate. These characteristics make HBF a natural fit for read-heavy inference workloads where large volumes of static data must be accessible at high bandwidth.

Industry Alignment and Timeline

The standardization effort has drawn participation from the three largest memory manufacturers. Samsung signed a memorandum of understanding alongside SK hynix and SanDisk, signaling broad industry commitment to the technology. The collaboration builds on SanDisk’s experience in NAND design and system-on-chip development, combined with SK hynix’s expertise in advanced memory packaging.

SanDisk is targeting first HBF samples in the second half of 2026, with inference devices incorporating HBF expected to become available in early 2027. The technology is being designed for deployment across large data centers, small enterprises, and edge computing applications, Bloomberg reported.

Market Context

The timing of the HBF push coincides with an unprecedented shortage of HBM. SK hynix has reported that its entire 2026 production capacity is already spoken for, with no meaningful supply relief expected before 2027. The HBM market is projected to grow at a compound annual growth rate of 33 percent between 2025 and 2030.

HBF does not aim to replace HBM but to complement it by addressing a capacity gap that HBM alone cannot fill at reasonable cost. As inference workloads scale to longer context windows and larger batch sizes, the economic case for a high-capacity memory tier between HBM and SSDs strengthens. Whether the OCP standardization process can move fast enough to meet the 2027 device timeline will be a key factor in determining how quickly HBF reaches production deployments.