You may have noticed that while Nvidia keeps its gaming GPU roadmap under wraps and avoids talking about it ahead of launches, it does the opposite with AI server GPUs. Those are often unveiled up to a year before release, and Grace CPUs were announced two years early. Now these two approaches may have converged. Despite Blackwell having debuted only this year, Nvidia has already announced the first GPU of the follow-up architecture, Rubin.
This first product of the Rubin generation, however, is not a pure AI GPU like the Hopper-generation H100 or the Blackwell-generation B200, which we're used to seeing paper-launched a year early like this. The Rubin generation will eventually bring their successor (likely designated R200) in the form of a chiplet-based, very expensive GPU with 288 GB of HBM4 memory (on an 8192-bit bus). But what Nvidia has now revealed is something else: a GPU called Rubin CPX.
Rubin CPX: Next-gen gaming high-end in disguise?
While the large Rubin GPU made up of multiple chiplets (like today's B200) will be used for AI training, Rubin CPX is aimed at inference, the tasks where a trained neural network (an AI model) is applied to perform work. Unlike the R200, this is a monolithic GPU made from a single die rather than chiplets, and it will use GDDR7 memory. Both factors make it potentially much cheaper to manufacture: no special substrate, silicon interposer, or advanced packaging is needed. The GPU die and GDDR7 chips will simply sit next to each other on the PCB, just like on a standard graphics card.
In other words, Rubin CPX looks exactly like a conventional high-end GPU for graphics cards. And it’s very likely that it is in fact designed for both roles—gaming graphics cards and AI inference accelerators. The fact that Nvidia can use the same chip for both (and a simple one to produce at that) should make Rubin CPX a lot cheaper—or more likely, given today’s market bubble, further increase Nvidia’s profit margins from these accelerators.
Other than being a monolithic GPU paired with GDDR7, not much else is known yet. According to Nvidia, it can reach 30 PFLOPS of AI compute on its tensor cores at 4-bit precision (NVFP4); the figure is likely quoted with sparsity enabled (otherwise it would be half that). That's 50% more performance than the dual-chiplet Blackwell B200. To get there, the Rubin architecture may have doubled the number of tensor cores, doubled the throughput of each tensor core, or perhaps doubled the number of SMs, from which tensor cores and their throughput are derived (that would make it an extremely "wide" GPU). In the first, more likely case, general-purpose and graphics performance would grow less than tensor core performance.
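The arithmetic behind these figures can be laid out in a few lines. Note that the B200 number below is not stated directly by Nvidia here; it is implied by the article's "50% more" comparison, and the halving-without-sparsity rule is the usual convention for Nvidia's quoted tensor throughput:

```python
# NVFP4 tensor-core throughput in PFLOPS, per the article.
RUBIN_CPX_SPARSE = 30.0                   # Nvidia's headline number, likely with sparsity
RUBIN_CPX_DENSE = RUBIN_CPX_SPARSE / 2    # sparsity typically doubles the quoted rate

# B200's rate is implied by Rubin CPX being "50% more" than B200.
B200_SPARSE = RUBIN_CPX_SPARSE / 1.5

print(RUBIN_CPX_DENSE)  # 15.0 PFLOPS without sparsity
print(B200_SPARSE)      # 20.0 PFLOPS implied for B200
```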

The GPU used in Rubin CPX is said to include four NVENC multimedia encoders and four NVDEC decoders, an increase over Blackwell gaming GPUs, which top out at three each (and not all are enabled in gaming models).
128 GB memory paired with a gaming GPU?
In its AI acceleration variant, Rubin CPX will carry 128 GB of memory, which is important for handling inference on large AI models. That may sound impossible for a gaming GPU, but in fact, it is fully achievable with ordinary GDDR7 technology and a 512-bit memory bus. For comparison, today’s GeForce RTX 5090 has a 32 GB memory configuration on its 512-bit bus, using 16 chips of 16 Gb (2 GB) each.
In so-called clamshell mode, the memory on a 512-bit GDDR7 bus can be doubled, usually by mounting chips on both sides of the PCB. Even today's GB202 in the GeForce RTX 5090 could use this route to reach 64 GB. Memory makers should also be able to produce 32 Gb (4 GB) GDDR7 chips (DDR5 DRAM chips of this capacity are already in production), which, combined with clamshell mode, would let a GPU with a 512-bit bus hit exactly 128 GB. Alternatively, a 1024-bit bus with 16 Gb GDDR7 chips could be used (with a theoretical maximum of 192 GB using existing 24 Gb chips in clamshell mode), but that's the less likely option.
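The capacity options above all follow from one simple relationship: each GDDR7 chip has a 32-bit interface, so the bus width fixes the chip count, and clamshell mode doubles it. A small sketch of that math (the 1024-bit configuration is hypothetical, as the article notes):

```python
def gddr7_capacity_gb(bus_width_bits, chip_gbit, clamshell=False):
    """Total VRAM from bus width and per-chip density.

    Each GDDR7 chip has a 32-bit interface; clamshell mode puts two
    chips on each 32-bit channel (typically one per PCB side)."""
    chips = bus_width_bits // 32 * (2 if clamshell else 1)
    return chips * chip_gbit // 8  # 8 Gbit = 1 GB

# RTX 5090 today: 512-bit bus, sixteen 16 Gb chips
print(gddr7_capacity_gb(512, 16))                   # 32 GB
# The same bus in clamshell mode
print(gddr7_capacity_gb(512, 16, clamshell=True))   # 64 GB
# Future 32 Gb chips plus clamshell: Rubin CPX's 128 GB
print(gddr7_capacity_gb(512, 32, clamshell=True))   # 128 GB
# Hypothetical 1024-bit bus, 24 Gb chips, clamshell
print(gddr7_capacity_gb(1024, 24, clamshell=True))  # 192 GB
```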
All of this supports the theory that the very same Rubin CPX chip Nvidia presented as an AI inference processor is also destined for gaming graphics—potentially to become the GeForce RTX 6090. The gaming version would almost certainly have less memory, perhaps 32 GB or 48 GB (64 GB with 16 × 32 Gb chips could be used in theory, though Nvidia probably won’t do that).

Nvidia also showed renders simulating a die shot, which some commentators believe match the layout of a GPU architecture with graphics units like ROPs. Analysis of the image suggests the GPU could have 16 GPC blocks. With 6 TPCs per GPC and the usual 2 SMs per TPC, that would give 192 SMs (same as GB202), or 24,576 shaders (assuming an SM still contains 128 shaders). With 8 TPCs per GPC, it would be 256 SMs, or 32,768 shaders. In the gaming version, some units will almost certainly be disabled to allow harvesting of partially defective chips. It's worth stressing, though, that the image may not correspond to reality at all; it could simply be an artificial concept image with no relation to the actual GPU design.
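As a sanity check, the shader counts above fall out of Nvidia's usual unit hierarchy (GPC → TPC → SM → shaders). The GPC and TPC counts here are the speculative readings of the render, not confirmed specifications:

```python
def shader_count(gpcs, tpcs_per_gpc, sms_per_tpc=2, shaders_per_sm=128):
    """Derive SM and FP32 shader counts from Nvidia's unit hierarchy.

    The 2 SMs per TPC and 128 shaders per SM defaults match recent
    Nvidia generations; Rubin could change either."""
    sms = gpcs * tpcs_per_gpc * sms_per_tpc
    return sms, sms * shaders_per_sm

print(shader_count(16, 6))  # (192, 24576), a GB202-like layout
print(shader_count(16, 8))  # (256, 32768), if each GPC holds 8 TPCs
```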
Nvidia hasn’t said which manufacturing process will be used. It will likely be TSMC’s 3 nm or 2 nm process. Even so, the GPU will probably again be right at the edge of feasibility in die size (600 to 800 mm²). Rubin CPX will be paired in Nvidia’s rack systems with the large Rubin (R200) compute GPU, dividing workloads between them. For us, of course, the gaming version under the GeForce brand will be the most interesting.

Release is still far away
At the start, we mentioned how preliminary this reveal is. Nvidia openly says that Rubin CPX will launch (or at least be officially announced, though not necessarily available) at the end of next year (2026).
This means the early reveal does not signal that Nvidia’s next-gen GPUs will arrive sooner than expected. Everything indicates the usual two-year cadence will hold, with GeForce RTX 6000 series cards based on this chip coming either at the end of 2026 (in the fall, as used to be typical for Nvidia), or early 2027 (as we saw with Blackwell). Of course, that’s assuming no delays negatively affecting the launch window.
Either way, it’s a very interesting reveal.
Sources: Nvidia (1, 2), Videocardz (1, 2), HardwareLuxx
English translation and edit by Jozef Dudáš