NVIDIA BlueField-4 Powers New Class of AI-Native Storage Infrastructure for the Next Frontier of AI

News Summary:
- NVIDIA BlueField-4 powers NVIDIA Inference Context Memory Storage Platform, a new kind of AI-native storage infrastructure designed for gigascale inference, to accelerate and scale agentic AI.
- The new storage processor platform is built for long-context-processing agentic AI systems with lightning-fast long- and short-term memory.
- Inference Context Memory Storage Platform extends AI agents’ long-term memory and enables high-bandwidth sharing of context across clusters of rack-scale AI systems, boosting tokens per second and power efficiency by up to 5x.
- Enabled by NVIDIA Spectrum-X Ethernet, extended context memory for multi-turn AI agents improves responsiveness, increases throughput per GPU and supports efficient scaling of agentic inference.
CES—NVIDIA today announced that the NVIDIA BlueField®-4 data processor, part of the full-stack NVIDIA BlueField platform, powers NVIDIA Inference Context Memory Storage Platform, a new class of AI-native storage infrastructure for the next frontier of AI.
As AI models scale to trillions of parameters and multistep reasoning, they generate vast amounts of context data, represented by a key-value (KV) cache, that is critical for accuracy, user experience and continuity.
A KV cache cannot be stored on GPUs long term, as this would create a bottleneck for real-time inference in multi-agent systems. AI-native applications require a new kind of scalable infrastructure to store and share this data.
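To give a sense of the scale involved, the size of a KV cache can be estimated from a model's shape. The sketch below uses the common transformer estimate (one key tensor and one value tensor per layer). The function name and the model dimensions are hypothetical, chosen only for illustration, and are not taken from this announcement or from any NVIDIA product.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Rough KV cache size for one sequence.

    The factor of 2 accounts for storing both a key and a value
    tensor in every layer. bytes_per_value=2 assumes 16-bit values.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape (assumption, for illustration only).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=1)
per_session = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=128_000)

print(f"per token:   {per_token:,} bytes")              # 327,680 bytes
print(f"per session: {per_session / 2**30:.1f} GiB")    # 39.1 GiB
```

Under these assumed dimensions, a single 128,000-token session occupies roughly 39 GiB, so a modest number of concurrent long-context sessions can exceed the memory of a GPU that also has to hold model weights.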
NVIDIA Inference Context Memory Storage Platform provides the infrastructure for context memory by extending GPU memory capacity, enabling high-speed sharing across nodes, boosting tokens per second by up to 5x and delivering up to 5x greater power efficiency compared with traditional storage.
“AI is revolutionizing the entire computing stack — and now, storage,” said Jensen Huang, founder and CEO of NVIDIA. “AI is no longer about one-shot chatbots but intelligent collaborators that understand the physical world, reason over long horizons, stay grounded in facts, use tools to do real work, and retain both short- and long-term memory. With BlueField-4, NVIDIA and our software and hardware partners are reinventing the storage stack for the next frontier of AI.”
NVIDIA Inference Context Memory Storage Platform boosts KV cache capacity and accelerates the sharing of context across clusters of rack-scale AI systems, while persistent context for multi-turn AI agents improves responsiveness, increases AI factory throughput and supports efficient scaling of long-context, multi-agent inference.
Key capabilities of the NVIDIA BlueField-4-powered platform include:
- NVIDIA Rubin cluster-level KV cache capacity, delivering the scale and efficiency required for long-context, multi-turn agentic inference.
- Up to 5x greater power efficiency than traditional storage.
- Smart, accelerated sharing of KV cache across AI nodes, enabled by the NVIDIA DOCA™ framework and tightly integrated with the NVIDIA NIXL library and NVIDIA Dynamo software to maximize tokens per second, reduce time to first token and improve multi-turn responsiveness.
- Hardware-accelerated KV cache placement, managed by NVIDIA BlueField-4, that eliminates metadata overhead, reduces data movement and ensures secure, isolated access from the GPU nodes.
- Efficient data sharing and retrieval, enabled by NVIDIA Spectrum-X™ Ethernet, which serves as the high-performance network fabric for RDMA-based access to AI-native KV cache.
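The general idea of keeping context outside GPU memory and bringing it back for a later turn can be sketched as a two-tier cache. This is a toy illustration only: the class `TieredKVCache`, its methods and the least-recently-used eviction policy are assumptions made for this example, and it does not represent the BlueField-4, DOCA, NIXL or Dynamo interfaces.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache (hypothetical, for illustration).

    `fast` stands in for limited GPU memory; `shared` stands in for
    a larger context memory tier reachable over the network.
    """

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()   # session_id -> KV blocks, in LRU order
        self.shared = {}            # session_id -> KV blocks

    def put(self, session_id, kv_blocks):
        self.fast[session_id] = kv_blocks
        self.fast.move_to_end(session_id)
        # Offload least-recently-used sessions instead of discarding them.
        while len(self.fast) > self.fast_capacity:
            old_id, old_blocks = self.fast.popitem(last=False)
            self.shared[old_id] = old_blocks

    def get(self, session_id):
        """Return (kv_blocks, tier), where tier is 'fast', 'shared' or 'miss'."""
        if session_id in self.fast:
            self.fast.move_to_end(session_id)
            return self.fast[session_id], "fast"
        if session_id in self.shared:
            # Promote back to the fast tier; no recomputation is needed.
            blocks = self.shared.pop(session_id)
            self.put(session_id, blocks)
            return blocks, "shared"
        return None, "miss"  # context would have to be recomputed


cache = TieredKVCache(fast_capacity=2)
for sid in ("agent-a", "agent-b", "agent-c"):
    cache.put(sid, [f"kv-for-{sid}"])

print(cache.get("agent-a"))  # served from the shared tier, then promoted
print(cache.get("agent-a"))  # now served from the fast tier
```

In this sketch, a session that returns after being evicted is reloaded from the larger tier rather than rebuilt from scratch, which is the behavior that persistent context for multi-turn agents relies on.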
Storage innovators including AIC, Cloudian, DDN, Dell Technologies, HPE, Hitachi Vantara, IBM, Nutanix, Pure Storage, Supermicro, VAST Data and WEKA are among the first building next-generation AI storage platforms with BlueField-4, which will be available in the second half of 2026.
Learn more by watching NVIDIA Live at CES.



