Details of Intel's HBM-killer memory tech emerge, revealing nine layers, up to 9GB of DRAM capacity, and almost as much bandwidth as HBM4 that powers Nvidia's Vera Rubin AI platform

May 14, 2026 | Twila Rosenbaum

New details have emerged about Intel's ambitious memory technology aimed at displacing High Bandwidth Memory (HBM) in AI and high-performance computing workloads. The memory stack, which Intel refers to as an 'HBM-killer,' is designed to offer similar or superior bandwidth while using a different architectural approach that could reduce cost and complexity. According to leaked documents and internal presentations, the technology consists of nine vertically stacked DRAM layers, providing up to 9GB of total capacity per stack.

The bandwidth is reported to be almost on par with HBM4, the upcoming generation of HBM that will power Nvidia's Vera Rubin AI platform. HBM4 is expected to deliver per-pin data rates of 8 Gbps or more across a 2,048-bit interface, for aggregate bandwidth of roughly 2 TB/s per stack. Intel's solution aims to come close to that performance, placing it in direct competition with the memory technology that has become the standard for AI accelerators.

Architecture and Layers

The nine-layer design is a significant departure from typical HBM stacks, which use eight or twelve DRAM dies in HBM2E and HBM3. By optimizing the die-to-die interconnects and using advanced through-silicon vias (TSVs), Intel has reportedly managed to reduce the overall thickness of the stack while increasing the number of layers. Each layer is a custom DRAM chip fabricated using Intel's own process technology, allowing tighter integration with its processors and accelerators.

Capacity is capped at 9GB per stack, well below the 24GB available in HBM3 stacks or the 36GB offered by HBM3E. However, Intel's approach emphasizes bandwidth and power efficiency over raw capacity. The target applications are AI inference and training, where latency and memory bandwidth are the primary bottlenecks. For workloads that require larger memory pools, Intel plans to support multiple stacks connected via its EMIB (Embedded Multi-die Interconnect Bridge) technology.

Bandwidth Comparisons to HBM4

Intel claims that its memory can achieve data transfer rates of up to 8 Gbps per pin, close to the 9.2 Gbps expected for HBM4. With a 1024-bit interface per stack, this yields aggregate bandwidth of roughly 1 TB/s per stack. While HBM4 targets 2 TB/s per stack, Intel's technology may compensate by offering lower latency and better system-level efficiency. The exact numbers are still under wraps, but early benchmarks suggest the gap is small enough to make Intel's solution a viable alternative in many AI workloads.
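For a sense of how those per-stack figures are derived, here is a minimal back-of-the-envelope calculation using the per-pin rate and interface width reported above; treat the numbers as estimates rather than confirmed specifications:

```python
# Back-of-the-envelope check of the per-stack bandwidth figures quoted above.
# The Intel numbers are as reported in this article, not confirmed specs.

def aggregate_bandwidth_tbps(pin_rate_gbps: float, interface_bits: int) -> float:
    """Aggregate bandwidth of one stack in TB/s: per-pin rate times bus width."""
    return pin_rate_gbps * interface_bits / 8 / 1000  # Gbit/s -> GB/s -> TB/s

intel_stack = aggregate_bandwidth_tbps(pin_rate_gbps=8.0, interface_bits=1024)
print(f"Reported Intel stack: ~{intel_stack:.1f} TB/s (HBM4 targets ~2 TB/s per stack)")
```

On these figures, Intel would need to roughly double either the pin rate or the interface width to reach HBM4's per-stack target, which is why the comparison rests partly on latency and efficiency rather than raw throughput.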

The Vera Rubin AI platform from Nvidia, named after the astronomer, will rely on HBM4 to achieve the extreme performance required for training large language models and other generative AI tasks. By matching or approaching that bandwidth, Intel's memory could be used in competing AI accelerators, such as its upcoming Falcon Shores GPU or other custom ASICs.

Background and Context

High Bandwidth Memory has been the gold standard for GPUs and accelerators since it was co-developed by AMD and SK Hynix and standardized by JEDEC in 2013. HBM stacks multiple DRAM dies on a silicon interposer, providing wide buses and high bandwidth while saving space and power compared to GDDR. However, HBM is expensive due to the complex 2.5D packaging and the use of an interposer. Intel's technology attempts to bypass the interposer by integrating the memory directly onto the processor package using its Foveros 3D stacking technology, which could reduce cost and improve power efficiency.

Intel has a long history of memory innovation, from the first commercially successful DRAM, the 1103, in 1970 to the development of Optane (3D XPoint) in partnership with Micron. While Optane failed to gain traction in the mainstream, it demonstrated Intel's willingness to challenge established memory hierarchies. The new HBM-killer represents a return to that disruptive spirit, targeting the high-margin AI accelerator market that is currently dominated by Nvidia and its HBM supply chain.

Intel's memory technology is still in the final stages of validation. Early engineering samples have been tested in Intel's labs and are showing promising results. The company expects to begin production in late 2025 or early 2026, with initial deployments in its upcoming server processors and discrete GPUs. If successful, the technology would free Intel's accelerators from dependence on Samsung and SK Hynix for HBM supply, giving the company a competitive advantage in the AI hardware space.

Implications for the AI Ecosystem

The emergence of Intel's HBM-killer could have far-reaching consequences. AI model developers currently face high memory costs, which contribute to the skyrocketing prices of AI hardware. A cheaper alternative with comparable performance would lower barriers to entry for smaller companies and research institutions. It could also spur innovation in memory hierarchy, leading to more efficient systems that combine HBM-like bandwidth with large pools of slower memory (e.g., CXL-attached DRAM).

Nvidia's dominance in AI is not just due to its compute units but also to its memory subsystem. If Intel can offer a memory solution that is as good as or better than HBM, it could level the playing field. Additionally, Intel's technology is likely to be open to other chip makers through its foundry services, potentially creating a new ecosystem of third-party accelerators that use Intel memory.

However, challenges remain. HBM4 is already being developed by memory giants Samsung and SK Hynix, with aggressive roadmaps for higher bandwidth and capacity. Intel will need to demonstrate that its solution can scale to future generations and maintain compatibility with existing software stacks. The nine-layer design may also pose yield challenges, as stacking more dies increases the risk of defects. Intel's experience in multi-die packaging with EMIB and Foveros gives it an edge, but volume production of such stacks is unproven.

Another factor is the software ecosystem. HBM has mature support in frameworks like CUDA and ROCm. Intel must ensure its memory works seamlessly with its own oneAPI and other AI software tools. The company plans to provide a memory abstraction layer that makes the new memory appear as standard HBM to applications, simplifying adoption.

Technical Deep Dive: How It Works

Intel's memory stack uses a novel interface called MDIO (Multi-Die I/O) that operates at a very low voltage swing (0.4V) to save power. Each layer communicates through micro-bumps and TSVs, with data flowing through the stack in a daisy-chain topology. The controller is integrated into the base die, which also handles error correction and refresh. The entire stack is built using Intel's 4nm-class (Intel 4) process for the base die and a mature 1z-nm DRAM process for the memory layers.
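Taken together, the reported parameters sketch out roughly the configuration below. The per-layer capacity is an inference from the nine-layer, 9GB figures rather than a published value, and none of these numbers are confirmed:

```python
# Minimal sketch of the reported stack configuration. All figures come from the
# article's reporting; the per-layer capacity is an inference (9 layers -> 9GB),
# not a published specification.
from dataclasses import dataclass

@dataclass
class StackConfig:
    dram_layers: int           # DRAM dies stacked above the base die
    layer_capacity_gb: float   # assumed capacity per DRAM die
    io_voltage_swing_v: float  # reported MDIO signalling swing
    base_die_process: str
    dram_process: str

    @property
    def total_capacity_gb(self) -> float:
        return self.dram_layers * self.layer_capacity_gb

intel_stack = StackConfig(
    dram_layers=9,
    layer_capacity_gb=1.0,     # 9 x 1GB matches the 9GB per-stack figure
    io_voltage_swing_v=0.4,
    base_die_process="Intel 4-class node (as reported)",
    dram_process="1z-nm DRAM node (as reported)",
)
print(intel_stack.total_capacity_gb)  # 9.0
```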

The bandwidth-per-watt figure is reported to be 25% better than HBM3, thanks to the lower voltage and shorter signal paths. Intel claims that for AI inference tasks, the memory can deliver up to 3x the performance per watt compared to GDDR6X. This makes it ideal for power-constrained data centers and edge AI devices.
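To put the efficiency claim in concrete terms, here is a small illustration of what a 25% bandwidth-per-watt advantage means at the stack level. The HBM3 baseline below is a placeholder, not a measured figure:

```python
# Illustration only: what a "25% better bandwidth-per-watt than HBM3" claim
# implies at the stack level. The HBM3 baseline is a hypothetical value.

hbm3_gb_per_s_per_watt = 100.0                        # placeholder baseline
intel_gb_per_s_per_watt = hbm3_gb_per_s_per_watt * 1.25

stack_bandwidth_gb_per_s = 1024                       # ~1 TB/s, as reported above
print(f"HBM3-class power at full bandwidth:  ~{stack_bandwidth_gb_per_s / hbm3_gb_per_s_per_watt:.1f} W")
print(f"Intel-stack power at full bandwidth: ~{stack_bandwidth_gb_per_s / intel_gb_per_s_per_watt:.1f} W")
# A 25% efficiency gain translates to ~20% lower power at the same bandwidth (1/1.25 = 0.8).
```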

Intel has also developed a new protocol for memory access that reduces latency by interleaving data across layers. The protocol is similar to HBM's pseudo-channel architecture but optimized for the nine-layer stack. Early simulations show that for random access patterns common in graph neural networks, latency is 15% lower than HBM3.
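Intel's exact mapping has not been published, but the basic idea of spreading consecutive blocks across the nine layers, in the spirit of HBM's pseudo-channels, can be sketched as follows (the interleave granularity here is an assumption):

```python
# Illustrative interleaving of stack addresses across the nine DRAM layers.
# The real mapping is not public; this just shows consecutive blocks being
# spread over layers so random accesses can be serviced by different layers
# in parallel, similar in spirit to HBM's pseudo-channel scheme.

NUM_LAYERS = 9
INTERLEAVE_BYTES = 256  # assumed interleave granularity

def map_address(addr: int) -> tuple[int, int]:
    """Map a stack-local byte address to (layer index, byte offset within that layer)."""
    block = addr // INTERLEAVE_BYTES
    layer = block % NUM_LAYERS
    offset = (block // NUM_LAYERS) * INTERLEAVE_BYTES + (addr % INTERLEAVE_BYTES)
    return layer, offset

# Consecutive 256-byte blocks land on layers 0, 1, 2, ..., 8, 0, 1, ...
for addr in range(0, 256 * 10, 256):
    print(addr, "->", map_address(addr))
```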

The capacity of 9GB per stack is intentional. Large AI models such as GPT-3, with 175 billion parameters, require far more memory than that, but Intel believes that most inference workloads can fit within 9GB per accelerator with proper model pruning and quantization. For training, multiple stacks can be clustered to reach 72GB or more, using Intel's advanced packaging to connect stacks via a shared interposer or silicon bridge.
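As a rough illustration of that sizing argument, the sketch below estimates how many 9GB stacks a model's weights alone would occupy at common precisions. It counts weights only (activations and KV caches add more in practice), and the model sizes are just examples:

```python
# Rough sizing of how many 9GB stacks a model's weights would occupy at
# common precisions. Weights only; activations and KV caches add more.
import math

STACK_GB = 9

def stacks_needed(params_billion: float, bytes_per_param: float) -> int:
    footprint_gb = params_billion * bytes_per_param   # 1e9 params x bytes ~= GB
    return math.ceil(footprint_gb / STACK_GB)

for name, params in [("7B model", 7), ("70B model", 70), ("GPT-3 (175B)", 175)]:
    for precision, bytes_pp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
        print(f"{name:>13} @ {precision}: {stacks_needed(params, bytes_pp)} stack(s)")
# A heavily quantized small model fits in a single 9GB stack, while GPT-3-scale
# weights need clusters well into the 72GB-plus range mentioned above.
```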

Intel has also patented a scheme for in-memory computing that could allow limited processing within the memory stack, further reducing data movement. This is a long-term feature that may appear in later generations.

Competitive Landscape

The HBM market is currently dominated by Samsung and SK Hynix, with Micron also shipping HBM3E. HBM4 is being developed by those vendors in close collaboration with Nvidia and is expected to appear in 2026. Intel's technology, if successful, could capture a significant share of the accelerator memory market, which is projected to reach $30 billion by 2028. However, Intel faces stiff competition from its own customers: both AMD and Nvidia are heavily invested in HBM, and they may be reluctant to switch to a proprietary Intel solution. Intel must position its memory as an open standard that can be licensed to other DRAM manufacturers if it is to gain traction.

Another potential competitor is CXL (Compute Express Link) memory, which offers flexible capacity expansion but at lower bandwidth and higher latency than stacked DRAM. Intel's approach is more directly comparable to HBM, focusing on ultra-low latency and high bandwidth for AI. The company is confident that its packaging expertise gives it a unique advantage. Intel's recent success with Ponte Vecchio and Sapphire Rapids has demonstrated its capability in complex multi-die designs.

Despite the promise, Intel's historical struggles in the GPU and AI accelerator market are well-known. The new memory technology may not be enough to overcome Nvidia's software moat if Intel's own accelerators fail to gain adoption. However, Intel could sell the memory to other AI chip startups, such as Cerebras or Groq, which are looking for alternatives to HBM.

In summary, Intel's HBM-killer memory represents a bold attempt to disrupt the AI memory landscape. With nine layers, 9GB capacity, and bandwidth nearly matching HBM4, it has the potential to become a cornerstone of next-generation AI hardware. The company is targeting a launch in 2026, aligning with the expected arrival of HBM4 in Nvidia's Vera Rubin platform. The success of this technology will depend on Intel's ability to produce it at scale, attract ecosystem partners, and prove its performance advantage in real-world AI workloads.


Source: TechRadar News

