New GPUs of the Blackwell / RTX 5000 generation
Nvidia’s new graphics cards, the GeForce RTX 5090 and RTX 5080, won’t be out until the 30th, but the NDA is over and the first reviews of the top-of-the-line RTX 5090, which we also tested, are out. In this article, we take a look at the Blackwell architecture that powers these new GPUs and at its new features and functions: DLSS 4, the compute unit architecture and features of the GPUs, as well as the software side of this new generation.
The RTX 5000/Blackwell generation GPUs use an entirely new architecture compared to the previous RTX 4000 generation with its Ada Lovelace architecture. Virtually all components of the GPU have been changed or updated to a newer version of the IP, with one exception: the GPUs are still manufactured using the same process node as the Ada Lovelace GPUs, TSMC’s 4N technology, which is a version of the N5 process node with custom modifications for Nvidia’s needs. This differs from the compute version of Blackwell (the B200/GB200 accelerators) for servers, where Nvidia used a process node called 4NP, which adds some further tuning on top of 4N.
GB202
The most powerful chip in the Blackwell generation is the GB202 with 92.2 billion transistors and a die area of 750 mm² which contains 192 SM (SM = Streaming Multiprocessor) blocks, adding up to 24,576 shaders. The SMs are distributed in 96 TPC (Texture Processing Cluster) blocks of two SMs each. There are still RT cores (one per SM) and Tensor Cores (four per SM) present in each SM block. Thus, the GB202 has 192 RT cores and 768 Tensor Cores.
At the TPC block level, in addition to the two SMs, there are also eight texture units, so the GPU has 768 of them in total. In real-world configurations, some of them will be disabled; the number of units in a specific graphics card SKU depends on the number of active TPCs.

The TPCs are in turn combined into 12 GPC (Graphics Processing Cluster) blocks, where one GPC contains 8 TPCs (and thus 16 SMs). Each GPC also integrates 16 ROP units (two raster operation partitions of 8 ROPs each). The entire GB202 GPU contains 192 ROPs, but when a GPC block is disabled, the GPU loses that block’s ROPs as well, so for example the RTX 5090 should have only 176 ROPs (as it has 11 active GPCs, with 170 SMs).
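The unit counts above all follow from the GPC/TPC/SM hierarchy, which can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the per-block ratios are the ones given in the text, the figure of 128 shaders per SM is derived from 24,576 shaders across 192 SMs, and the 85 active TPCs for the RTX 5090 are inferred from its 170 SMs. The names `UnitCounts` and `unit_counts` are made up for this example.

```python
from dataclasses import dataclass

# Per-block ratios as described in the article for GB202.
SMS_PER_TPC = 2
SHADERS_PER_SM = 128          # derived: 24,576 shaders / 192 SMs
RT_CORES_PER_SM = 1
TENSOR_CORES_PER_SM = 4
TEXTURE_UNITS_PER_TPC = 8
ROPS_PER_GPC = 16             # two raster partitions of 8 ROPs each


@dataclass(frozen=True)
class UnitCounts:
    sms: int
    shaders: int
    rt_cores: int
    tensor_cores: int
    texture_units: int
    rops: int


def unit_counts(active_gpcs: int, active_tpcs: int) -> UnitCounts:
    """Derive unit totals from the number of active GPC and TPC blocks."""
    sms = active_tpcs * SMS_PER_TPC
    return UnitCounts(
        sms=sms,
        shaders=sms * SHADERS_PER_SM,
        rt_cores=sms * RT_CORES_PER_SM,
        tensor_cores=sms * TENSOR_CORES_PER_SM,
        texture_units=active_tpcs * TEXTURE_UNITS_PER_TPC,
        rops=active_gpcs * ROPS_PER_GPC,
    )


full_gb202 = unit_counts(active_gpcs=12, active_tpcs=96)
# Assumption: 170 SMs at 2 SMs per TPC means 85 active TPCs.
rtx_5090 = unit_counts(active_gpcs=11, active_tpcs=85)

print(full_gb202)
print(rtx_5090)
```

Note that ROPs scale with active GPCs while shaders and texture units scale with active TPCs, which is why a cut-down SKU can lose them in different proportions.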
GDDR7
Blackwell GPUs are the first to use GDDR7 memory. The GB202 chip additionally pairs it with a 512-bit memory bus (Nvidia last used a bus this wide in the GT200 generation, before Fermi). The memory controllers are still 32 bits wide, so there are 16 of them in parallel in the GB202 (and a corresponding number in lower-end GPUs with narrower memory buses). In the GeForce RTX 5090 with the GB202 chip, GDDR7 runs at an effective clock speed of 28.0 GHz (28 Gbps per pin), and it’s likely to be similar in most models. The RTX 5080 is an exception, however, running the memory at an effective clock speed of 30.0 GHz.
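To put these figures in perspective, the peak memory bandwidth follows directly from the bus width and the effective per-pin rate. The short sketch below only does that arithmetic; the function names are made up for this example, and the 256-bit bus of the RTX 5080 is the figure given for the GB203 chip later in this article.

```python
def controller_count(bus_width_bits: int, controller_width_bits: int = 32) -> int:
    """Number of memory controllers needed to make up the full bus width."""
    return bus_width_bits // controller_width_bits


def bandwidth_gb_per_s(bus_width_bits: int, rate_gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: pins times Gbit/s per pin, divided by 8 bits per byte."""
    return bus_width_bits * rate_gbps_per_pin / 8


print(controller_count(512))            # 16 controllers on GB202
print(bandwidth_gb_per_s(512, 28.0))    # RTX 5090: 1792.0 GB/s
print(bandwidth_gb_per_s(256, 30.0))    # RTX 5080: 960.0 GB/s
```

These are theoretical peak values; achieved bandwidth in real workloads will be lower.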

GDDR7 uses PAM3 pulse-amplitude signalling, which transfers 1.5 bits per cycle. At first glance, this may seem like a step backwards compared to the PAM4 signalling (2 bits per cycle) used in GDDR6X. However, the simpler signalling, perhaps along with more fine-tuned technology, seems to give GDDR7 a significantly better signal-to-noise ratio at the same clock speed. So while it transfers 25% less data per cycle, it can be clocked much higher, and the final “effective clock speed” (the effective transfer rate in Gbps per pin) ends up well above that of GDDR6X. Even the power efficiency should be better, according to Nvidia.
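The 1.5 bits per cycle figure comes from packing 3 bits into a pair of three-level symbols. The following sketch checks that this is possible and works out what symbol rate each scheme would need for a given per-pin data rate. It is only an illustration of the arithmetic and does not model the actual GDDR7 encoding table; `symbol_rate_gbaud` is a name made up for this example.

```python
from fractions import Fraction
from itertools import product

PAM3_LEVELS = 3
PAM4_LEVELS = 4

# Two PAM3 symbols give 3**2 = 9 level combinations,
# enough to cover the 2**3 = 8 values of a 3-bit group.
pam3_pairs = list(product(range(PAM3_LEVELS), repeat=2))
assert len(pam3_pairs) >= 2 ** 3

PAM3_BITS_PER_SYMBOL = Fraction(3, 2)   # 3 bits over 2 symbols
PAM4_BITS_PER_SYMBOL = Fraction(2, 1)   # 4 levels carry 2 bits per symbol

# Relative loss of data per symbol when going from PAM4 to PAM3.
deficit = 1 - PAM3_BITS_PER_SYMBOL / PAM4_BITS_PER_SYMBOL


def symbol_rate_gbaud(data_rate_gbps: int, bits_per_symbol: Fraction) -> Fraction:
    """Symbol rate needed on one pin to reach the given data rate."""
    return Fraction(data_rate_gbps) / bits_per_symbol


print(f"PAM3 carries {float(deficit):.0%} less data per symbol than PAM4")
print(f"28 Gbps with PAM3: {float(symbol_rate_gbaud(28, PAM3_BITS_PER_SYMBOL)):.2f} Gbaud")
print(f"28 Gbps with PAM4: {float(symbol_rate_gbaud(28, PAM4_BITS_PER_SYMBOL)):.2f} Gbaud")
```

In other words, PAM3 has to run its symbols a third faster than PAM4 to reach the same data rate, which is the price paid for having only three levels to tell apart instead of four.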
L2 cache
Blackwell GPUs also have a relatively large L2 cache, which can play a role comparable to the Infinity Cache (L3 cache) in AMD GPUs. Blackwell does not have an L3 cache; L2 is the last level in the hierarchy before the memory itself. The L2 cache capacities seem to be unchanged in Blackwell generation GPUs compared to the corresponding Ada Lovelace (RTX 4000) generation chips, except in the case of the GB202. This GPU has 128 MB of L2 cache versus 96 MB in its predecessor, the AD102.
However, it appears that a good portion of this generous L2 cache capacity will be disabled on the GeForce RTX 5090, with only 96 MB of it active in this gaming model. Probably only some future server or workstation SKUs based on the GB202 chip will have the cache fully enabled. A similar thing happened with the RTX 4090.

The smaller GPU in the line: the GB203
The GB203 chip, which will be featured in the GeForce RTX 5080 and 5070 Ti, is just 378 mm² in size and is said to contain 45.6 billion transistors. Interestingly, this is slightly less than in the last-generation AD103 chip (45.9 billion), which was also a hair bigger (378.6 mm²). It seems that in the Blackwell generation, Nvidia has managed to squeeze in some new technology and more performance per unit area while using more or less the same TSMC 4N manufacturing technology and the same transistor density. The caveat is that the performance increase of the GeForce RTX 5080 over the RTX 4080 might come just from the increase in power consumption from 320 to 360 W (and thus higher clock speeds), which remains to be seen. But the Blackwell architecture itself should deliver slightly better performance at a given clock speed, so the fact that it doesn’t need much more on-chip space is notable.
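The claim about unchanged transistor density is easy to check from the quoted figures. The sketch below simply divides transistor count by die area for both chips; the helper name `density_mtr_per_mm2` is made up for this example, and the inputs are the numbers given above.

```python
def density_mtr_per_mm2(transistors_billion: float, die_area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm²."""
    return transistors_billion * 1000 / die_area_mm2


gb203 = density_mtr_per_mm2(45.6, 378.0)
ad103 = density_mtr_per_mm2(45.9, 378.6)
relative_difference = abs(gb203 - ad103) / ad103

print(f"GB203: {gb203:.1f} MTr/mm²")
print(f"AD103: {ad103:.1f} MTr/mm²")
print(f"Difference: {relative_difference:.2%}")
```

Both chips come out at roughly 121 million transistors per mm², with a difference of about half a percent, so any gains in this generation have to come from the architecture and clock speeds rather than from the process.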
This GPU consists of 7 GPC blocks, 42 TPC blocks and 84 SM blocks. Thus, it has a total of 10,752 shaders, 84 RT cores, 336 texture units and 336 Tensor Cores. The GPU contains 64 MB of L2 cache, just like the previous AD103 in the GeForce RTX 4080.
With 7 GPCs, the chip should have 112 ROP units. This GPU has only a 256-bit memory bus, as Nvidia skipped a 384-bit configuration in the Blackwell line-up. As a result, the memory bus width (and the memory capacity as well) of the GeForce RTX 5080 is half that of the RTX 5090. This is only slightly compensated by higher clock speeds, as GDDR7 runs at an effective 30.0 GHz on this model. On the other hand, the number of compute units is even less than half that of the GB202, so the chip is not out of balance.
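The balance argument can be made concrete by comparing ratios. The sketch below is a rough illustration only: it compares a full GB203 (84 SMs, 256-bit bus, 30 Gbps) against a hypothetical full GB202 (192 SMs, 512-bit bus) running its memory at the RTX 5090’s 28 Gbps, which is an assumption, since the RTX 5090 itself has only 170 active SMs.

```python
from fractions import Fraction

# Figures from the article; the "full GB202 at 28 Gbps" pairing is an assumption.
GB202 = {"sms": 192, "bus_bits": 512, "rate_gbps": 28}
GB203 = {"sms": 84, "bus_bits": 256, "rate_gbps": 30}

# Bandwidth scales with bus width times per-pin rate, so the ratio needs no unit conversion.
bandwidth_ratio = Fraction(
    GB203["bus_bits"] * GB203["rate_gbps"],
    GB202["bus_bits"] * GB202["rate_gbps"],
)
sm_ratio = Fraction(GB203["sms"], GB202["sms"])
bandwidth_per_sm_ratio = bandwidth_ratio / sm_ratio

print(f"Bandwidth, GB203 vs GB202:        {float(bandwidth_ratio):.3f}")
print(f"SM count, GB203 vs GB202:         {float(sm_ratio):.3f}")
print(f"Bandwidth per SM, GB203 vs GB202: {float(bandwidth_per_sm_ratio):.3f}")
```

Under these assumptions, the GB203 ends up with somewhat more memory bandwidth per SM than the big chip, not less.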

The GB205 for cheaper cards
The third chip in the series is the GB205; a GB204 die does not exist, and the GB205 design is the replacement for the previous AD104. The die area of this GPU, which according to Nvidia consists of 31.1 billion transistors, is 263 mm², significantly less than that of the AD104 chip (294.5 mm² with 35.8 billion transistors). Nvidia would therefore have a higher margin if the RTX 5070 cards (which will use this GPU) were to replace the RTX 4070 in the market at the same price. Alternatively, this allows the RTX 5070 to be priced lower than the RTX 4070 was.
In this case, however, the smaller die area is due to the fact that the GB205 has a weaker configuration. While the AD104 contains 60 SM blocks, the GB205 chip has only 50 SMs (5 GPCs, 25 TPCs), which in a full configuration means 6,400 shaders, 50 RT cores and 200 Tensor Cores. The RTX 5070 will use a cut-down configuration with only 6,144 shaders, which allows Nvidia to use chips with some manufacturing defects and so improve the effective yield.
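The shader counts tell us how many SMs the RTX 5070 actually has enabled. A quick sketch of that calculation follows, using only the figures above; the variable names are made up for this example, and which SMs or TPCs are disabled on a given card is not something the numbers reveal.

```python
FULL_GB205_SMS = 50
FULL_GB205_SHADERS = 6400
RTX_5070_SHADERS = 6144

# Shaders per SM follows from the full-chip figures.
shaders_per_sm, remainder = divmod(FULL_GB205_SHADERS, FULL_GB205_SMS)
assert remainder == 0

active_sms = RTX_5070_SHADERS // shaders_per_sm
disabled_sms = FULL_GB205_SMS - active_sms
enabled_fraction = active_sms / FULL_GB205_SMS

print(f"Shaders per SM: {shaders_per_sm}")
print(f"RTX 5070: {active_sms} active SMs, {disabled_sms} disabled ({enabled_fraction:.0%} enabled)")
```

Two disabled SMs correspond to a single TPC’s worth of units, so the cut is a mild one that still gives Nvidia some room to salvage imperfect dies.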
Like the AD104, the chip has a 192-bit memory bus, but it can and will use GDDR7 memory just like its higher-end siblings. The L2 cache capacity is 48 MB like that of the AD104, and the GPU also has the same 80 ROPs as the AD104.