Nvidia Blackwell adds IPC instead of cores. Up to 70% faster?

New rumors of Nvidia's next-gen graphics cards with Blackwell architecture

The wait for the next generation of Nvidia graphics cards will probably take a bit longer this time around: according to the official roadmap, Nvidia plans them for 2025 instead of fall 2024. Still, information is starting to emerge about these upcoming GPUs, which will bring the Blackwell architecture to both the compute and gaming segments. And there are even first projections estimating the performance uplift over current GeForce GPUs.

Rumors about Nvidia’s next-gen GPUs have emerged simultaneously from two sources. The first is Kopite7kimi, a leaker on Twitter who has been the most reliable source on Nvidia’s plans for the past few years (as demonstrated by leaking both the RTX 3000 and RTX 4000 generations well in advance). Yesterday, Kopite7kimi came up with two pieces of information on the Blackwell architecture.

Same structure, more performance per unit?

The first is that this generation would not necessarily increase the number of compute units in the individual chips, or at least not in any dramatic way. Instead, the Blackwell architecture will reportedly focus on architectural changes and improvements at the level of a single “core” or compute unit – one could perhaps call it the “IPC” of the GPU units. Hypothetically, this means that the number of shaders in a particular GPU or card SKU would not increase, but the performance per unit would.

More specifically, Kopite7kimi says that Blackwell will not “significantly” increase the number of units such as GPC or TPC blocks. These are the building blocks of Nvidia GPUs: the current performance-leading AD102 chip, for example, physically contains 12 GPC (Graphics Processing Cluster) blocks, each of which holds six TPC (Texture Processing Cluster) subunits, which are further divided into two SM blocks (each containing 128 shaders in the Ampere and Ada Lovelace architectures).

If this leak is legit, these basic specs probably won’t change much for the Blackwell GPUs, and the new top-performing GPU, which should be labeled GB202, may again be made up of 12 GPCs with six TPCs each. However, it is far from impossible that there could be more shaders per SM block – perhaps 192 instead of 128 (more on that in a moment). In that case, the GPU would still carry 1.5 times as many shaders (27,648) despite sticking with 144 SMs.
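The hierarchy and the 1.5× figure above can be verified with a little arithmetic. The helper below is just an illustrative sketch: the AD102 layout is Nvidia’s published spec, while the GB202 figure of 192 shaders per SM is purely the speculative scenario from the leak.

```python
def total_shaders(gpcs: int, tpcs_per_gpc: int, sms_per_tpc: int,
                  shaders_per_sm: int) -> int:
    """Total shader (CUDA core) count implied by the GPC/TPC/SM hierarchy."""
    return gpcs * tpcs_per_gpc * sms_per_tpc * shaders_per_sm

# Fully enabled AD102: 12 GPCs x 6 TPCs x 2 SMs x 128 shaders
ad102 = total_shaders(12, 6, 2, 128)

# Hypothetical full GB202: same 144-SM layout, but 192 shaders per SM
gb202 = total_shaders(12, 6, 2, 192)

print(ad102, gb202, gb202 / ad102)  # 18432 27648 1.5
```

The same function reproduces the smaller Ada chips mentioned below only approximately, since their configurations are not fully symmetric (the AD103 has 80 SMs rather than the 84 a perfectly regular 7×6×2 layout would give).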

AD102 chip schematic, fully enabled configuration (source: Nvidia)

For reference, the other GPUs of the Ada Lovelace generation: the less powerful AD103 with a 256-bit memory bus officially contains seven GPCs with six TPCs each, but only 80 SM blocks in total (the configuration is not 100% symmetric). The 192-bit AD104 chip is composed of six GPCs with five TPCs each, for a total of 60 SM blocks. So it’s possible that the Blackwell generation will copy this too, though it has to be said that a GB204 chip reportedly isn’t going to exist – its specs might instead be adopted by the next chip in the line, the GB206.

This chip configuration policy in terms of GPC and TPC counts wouldn’t be entirely surprising from Nvidia. The Ada generation, for example, is quite reminiscent of the Ampere chips, only each configuration has moved down one level (and the memory bus has been narrowed). For example, the AD103 has the same number of units as the previous-generation GA102, and the AD104 resembles the GA103.

It’s probably also worth emphasizing that Kopite7kimi isn’t saying the unit counts will stay completely unchanged, only that they won’t change much. So it is not at all impossible that there will be variations (and thus advances in the numbers). It’s also entirely possible that the compute unit counts will (roughly) match, but the cache capacities or memory bus widths will change significantly. The most powerful GB202 could reportedly already have a 512-bit memory bus.

Multi-chip(let) GPUs for the first time

The second news from Kopite7kimi concerns the compute branch for servers, the GB100 chip. It will be designed for AI acceleration tasks and may not have graphics and raytracing blocks; it’s basically the successor to the compute-oriented Hopper GPU (Nvidia’s roadmap seems to alternate between phases where gaming and compute GPUs have different architectures and phases where the architecture is shared between them). According to Kopite7kimi, the GB100 of the Blackwell architecture will use chiplets for the first time – it will consist of multiple pieces of silicon. That said, it’s not clear whether any advanced packaging (chiplet interconnect technology) will be used. Theoretically, it could just be a matter of combining two separate standalone GPU tiles on one package, in which case it would likely be more fitting to call it a multi-chip module (MCM). Note that this change of approach doesn’t seem to be planned for gaming Blackwell GPUs yet; it seems those will still be monolithic.

Dual-chip compute GPUs: AMD Instinct MI200 in OAM version

GeForce RTX 5090 performance projection?

And that’s it from Kopite7kimi for now. Besides him, there’s a new leak by another source, namely the Chinese leaker Panzerlied. Some time ago, this person revealed that the Blackwell / RTX 5000 generation will be missing the GB204 GPU, which was later confirmed by Kopite7kimi, so it may be worthwhile to give this source an ear as well, although his statements should probably be taken with a bigger grain of salt.

According to Panzerlied, the GeForce RTX 5090 could see roughly the following increases compared to the GeForce RTX 4090: memory bandwidth is said to increase by 52% (ending up at around 1.54 TB/s?), GPU clock speed by about 15% (which would give a boost clock somewhere around 2.9 GHz?), cache capacity by 78% (128 MB of L2 cache?), and “scale” is said to be 50% higher, by which Panzerlied could possibly mean the number of compute units. If what Kopite7kimi says about GPC and TPC counts staying the same is true, then perhaps a 50% increase in the number of shaders per SM (or maybe three SMs per TPC block?) would account for this +50% scaling.
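The figures in parentheses above can be sanity-checked by applying the rumored multipliers to the RTX 4090’s public specs. A minimal sketch – the baseline numbers are the 4090’s official specifications, the multipliers are Panzerlied’s unverified claims:

```python
# GeForce RTX 4090 baseline (public specs)
rtx4090 = {
    "bandwidth_gb_s": 1008,   # 384-bit GDDR6X at 21 Gbps
    "boost_clock_ghz": 2.52,
    "l2_cache_mb": 72,
}

# Rumored uplifts (speculative, per the leak)
uplift = {
    "bandwidth_gb_s": 1.52,   # +52%
    "boost_clock_ghz": 1.15,  # +15%
    "l2_cache_mb": 1.78,      # +78%
}

rtx5090_projection = {k: round(rtx4090[k] * uplift[k], 2) for k in rtx4090}
print(rtx5090_projection)
# bandwidth ~1532 GB/s (~1.53 TB/s), boost clock ~2.9 GHz, L2 cache ~128 MB
```

The products land close to the leaked round numbers (1.54 TB/s, 2.9 GHz, 128 MB), which at least shows the rumor is internally consistent with the 4090’s specs.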

Overall, the GeForce RTX 5090 is said to have 70% better performance (again, this is probably only a rough ballpark). The question, of course, is where this data comes from. It’s not certain that it’s more than just speculation. Nvidia could theoretically be communicating something like this internally to partner AIB manufacturers who are gearing up to produce these cards, or to gaming PC manufacturers with whom it wants to close supply contracts in advance – of course, under NDA. But such numbers would only be preliminary, with some leeway in them, so the cards might end up even better. And of course there is also the risk that something goes wrong and the product ends up slightly weaker than the internal predictions. However, with GPUs you can generally “fix” a lot by increasing TDP as a plan B, so in the end the performance projections may come out accurate, but the power efficiency goals may not.

But as has already been said, you should take this with a large grain of salt. If the cards aren’t due to go on sale until 2025, they’re still more than a year away and it’s way too early for determining their final performance. Nvidia can even move their own internal performance targets in various ways depending on the market situation. So that 70% figure is probably anything but set in stone.

Sources: Kopite7kimi (1, 2), Chiphell, VideoCardz (1, 2)

English translation and edit by Jozef Dudáš
