{"id":219273,"date":"2025-01-28T21:10:49","date_gmt":"2025-01-28T20:10:49","guid":{"rendered":"https:\/\/www.hwcooling.net\/?p=219273\/"},"modified":"2025-01-28T22:55:34","modified_gmt":"2025-01-28T21:55:34","slug":"blackwell-geforce-rtx-5000-architecture-and-innovations-analysis","status":"publish","type":"post","link":"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/","title":{"rendered":"Blackwell: GeForce RTX 5000 architecture and innovations [Analysis]"},"content":{"rendered":"<p><!--nextpage-->Nvidia&#8217;s new graphics cards \u2013 the GeForce RTX 5090 and RTX 5080 \u2013 won&#8217;t be out until the 30th, but the embargo has lifted and the first reviews of the top-of-the-line RTX 5090, which we also tested, are out. In this article, we take a look at the Blackwell architecture that powers these new GPUs, along with its new features and functions: from DLSS 4, through the compute unit architecture and chip features, to the software side of this new generation.<!--more--><\/p>\n<p>The RTX 5000\/Blackwell generation GPUs use an entirely new architecture compared to the previous RTX 4000 generation based on Ada Lovelace. Virtually all components of the GPU have been changed or updated to newer versions of their IP blocks. There is one exception: the GPUs are still manufactured on the same process node as the Ada Lovelace GPUs, TSMC&#8217;s 4N, which is a version of the N5 process with custom modifications for Nvidia&#8217;s needs. 
This differs from the server compute version of Blackwell (<a href=\"https:\/\/www.hwcooling.net\/nova-generace-ai-gpu-od-nvidie-odhalena-b200-blackwell\/\">B200\/GB200 accelerator<\/a>), for which Nvidia uses a process node called 4NP that adds some further tuning on top of 4N.<\/p>\n<h3 id=\"h41\" class=\"western\">GB202<\/h3>\n<p>The most powerful chip in the Blackwell generation is the GB202, with 92.2 billion transistors and a die area of 750 mm\u00b2. It contains 192 SM (Streaming Multiprocessor) blocks, adding up to 24,576 shaders. The SMs are grouped into 96 TPC (Texture Processing Cluster) blocks of two SMs each. Each SM block still contains RT cores (one per SM) and Tensor Cores (four per SM), so the GB202 has 192 RT cores and 768 Tensor Cores.<\/p>\n<p>At the TPC level, in addition to the two SMs, there are also eight texture units \u2013 768 in total for the whole GPU. In real-world configurations, some TPCs will be disabled, so the number of units in specific graphics card SKUs depends on the number of active TPCs.<\/p>\n<figure id=\"attachment_219208\" aria-describedby=\"caption-attachment-219208\" style=\"width: 2048px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Sch\u00e9ma-GPU-Nvidia-GB202.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-219208\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Sch\u00e9ma-GPU-Nvidia-GB202.png\" alt=\"\" width=\"2048\" height=\"687\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Sch\u00e9ma-GPU-Nvidia-GB202.png 2048w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Sch\u00e9ma-GPU-Nvidia-GB202-300x101.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Sch\u00e9ma-GPU-Nvidia-GB202-768x258.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Sch\u00e9ma-GPU-Nvidia-GB202-1024x344.png 1024w\" 
sizes=\"auto, (max-width: 2048px) 100vw, 2048px\" \/><\/a><figcaption id=\"caption-attachment-219208\" class=\"wp-caption-text\">Schematic of the Nvidia GB202 GPU<\/figcaption><\/figure>\n<p>The TPCs are in turn grouped into 12 GPC (Graphics Processing Cluster) blocks, each containing 8 TPCs (and thus 16 SMs). Each GPC also integrates 16 ROP units (two raster operation partitions of 8 ROPs each). The full GB202 contains 192 ROPs, but when a GPC block is disabled, its ROPs are lost with it: the RTX 5090, for example, should have only 176 ROPs, as it has 11 active GPCs (with 170 SMs).<\/p>\n<h4 id=\"h42\" class=\"western\">GDDR7<\/h4>\n<p>Blackwell GPUs are the first to use GDDR7 memory. The GB202 additionally pairs it with a 512-bit memory bus, a width Nvidia has not used since the GT200 chip of the GeForce GTX 280\/285, which predates Fermi (Fermi topped out at 384 bits). The memory controllers are still 32 bits wide, so the GB202 has 16 of them operating in parallel (lower-end GPUs with narrower memory buses have correspondingly fewer). In the GeForce RTX 5090, GDDR7 runs at an effective clock speed of 28.0 GHz, and most other models are likely to use similar speeds. 
RTX 5080 is an exception, however, running its memory at an effective clock speed of 30.0 GHz.<\/p>\n<figure id=\"attachment_219206\" aria-describedby=\"caption-attachment-219206\" style=\"width: 2048px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/GDDR7-v-GPU-Blackwell.png\"><img loading=\"lazy\" decoding=\"async\" class=\"noborder wp-image-219206 size-full\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/GDDR7-v-GPU-Blackwell.png\" alt=\"\" width=\"2048\" height=\"882\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/GDDR7-v-GPU-Blackwell.png 2048w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/GDDR7-v-GPU-Blackwell-300x129.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/GDDR7-v-GPU-Blackwell-768x331.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/GDDR7-v-GPU-Blackwell-1024x441.png 1024w\" sizes=\"auto, (max-width: 2048px) 100vw, 2048px\" \/><\/a><figcaption id=\"caption-attachment-219206\" class=\"wp-caption-text\">GDDR7 in Blackwell GPUs<\/figcaption><\/figure>\n<p>GDDR7 uses PAM3 pulse-amplitude modulation with three signal levels, which transfers 1.5 bits per cycle (three bits over two symbols). At first glance, this may seem like a step backwards compared to the PAM4 signalling (2 bits per cycle) used in GDDR6X. However, the simpler signalling, perhaps combined with more refined technology, appears to give GDDR7 a significantly better signal-to-noise ratio at a given clock speed. So while it transfers 25% less data per cycle, it can be clocked much higher, and the resulting &#8220;effective clock speed&#8221; (the effective transfer rate in Gbps per pin) ends up considerably higher than that of GDDR6X. 
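<\/p>\n<p>The memory figures above translate into peak bandwidth with simple arithmetic, where the &#8220;effective clock speed&#8221; is the data rate per pin in Gbps. The following Python sketch is a back-of-the-envelope illustration (the function names are our own hypothetical helpers) that uses only the bus widths and data rates quoted in this article; it yields theoretical peaks, not measured throughput, and also shows the per-symbol trade-off between PAM4 and PAM3.<\/p>
```python
# Back-of-the-envelope peak memory bandwidth from the figures in the text.
# Theoretical peaks only; real-world throughput is lower.


def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    # Every pin of the bus transfers data_rate_gbps gigabits per second;
    # dividing by 8 converts gigabits to gigabytes
    return bus_width_bits * data_rate_gbps / 8


print(peak_bandwidth_gbs(512, 28.0))  # RTX 5090: 1792.0 (GB/s)
print(peak_bandwidth_gbs(256, 30.0))  # RTX 5080: 960.0 (GB/s)

# Data carried per symbol: PAM4 has 4 levels (2 bits per symbol), while
# GDDR7 PAM3 encodes 3 bits in 2 three-level symbols (1.5 bits per symbol)
PAM4_BITS_PER_SYMBOL = 2.0
PAM3_BITS_PER_SYMBOL = 3 / 2
print(1 - PAM3_BITS_PER_SYMBOL / PAM4_BITS_PER_SYMBOL)  # 0.25, i.e. 25% less


def symbol_rate_gbaud(data_rate_gbps, bits_per_symbol):
    # Symbols per second each pin must send to reach a given data rate
    return data_rate_gbps / bits_per_symbol


print(round(symbol_rate_gbaud(28.0, PAM3_BITS_PER_SYMBOL), 2))  # 18.67
print(symbol_rate_gbaud(28.0, PAM4_BITS_PER_SYMBOL))            # 14.0
```
<p>The RTX 5090 thus reaches about 1792 GB\/s and the RTX 5080 about 960 GB\/s, so the flagship&#8217;s advantage comes entirely from its twice-as-wide bus, since the RTX 5080 actually runs its memory faster. The last lines also show that PAM3 needs a higher symbol rate than PAM4 for the same data rate, which is why GDDR7 depends on being clocked higher.<\/p>\n<p>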
According to Nvidia, GDDR7 should also offer better power efficiency.<\/p>\n<ul>\n<li><strong>Read more: <a href=\"https:\/\/www.hwcooling.net\/en\/ampere-gpus-use-new-gddr6x-memory-based-on-pam4-en\/\" rel=\"bookmark\">Ampere GPU: new PAM4-based GDDR6X memory &amp; more details<\/a><\/strong><\/li>\n<li><strong>Read more: <a href=\"https:\/\/www.hwcooling.net\/en\/gddr7-memory-for-next-gen-gpus-is-ready-up-to-48ghz-clocks\/\" rel=\"bookmark\">GDDR7 memory for next-gen GPUs is ready, up to 48GHz clocks<\/a><\/strong><\/li>\n<\/ul>\n<h3 id=\"h43\" class=\"western\">L2 cache<\/h3>\n<p>In addition, Blackwell GPUs have a relatively large L2 cache, which can play a role comparable to that of the Infinity Cache (L3 cache) in AMD GPUs. Blackwell has no L3 cache, so L2 is the last level of the hierarchy before the memory itself. L2 cache capacities seem to be unchanged in Blackwell GPUs compared to the corresponding Ada Lovelace (RTX 4000) chips, with the exception of the GB202, which has 128 MB of L2 cache versus 96 MB in its predecessor, the AD102.<\/p>\n<p>However, it appears that a good portion of this generous L2 cache will be disabled on the GeForce RTX 5090, with only 96 MB active in this gaming model. Probably only some future server or workstation SKUs based on the GB202 chip will have the full cache enabled. 
The same happened with the RTX 4090, which has only 72 MB of the AD102&#8217;s 96 MB of L2 cache enabled.<\/p>\n<figure id=\"attachment_218505\" aria-describedby=\"caption-attachment-218505\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/nvidia-geforce-rtx-5090-founders-edition-09.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"noborder wp-image-218505 size-large\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/nvidia-geforce-rtx-5090-founders-edition-09-1024x576.jpg\" alt=\"\" width=\"640\" height=\"360\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/nvidia-geforce-rtx-5090-founders-edition-09-1024x576.jpg 1024w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/nvidia-geforce-rtx-5090-founders-edition-09-300x169.jpg 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/nvidia-geforce-rtx-5090-founders-edition-09-768x432.jpg 768w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><figcaption id=\"caption-attachment-218505\" class=\"wp-caption-text\">GeForce RTX 5090 with the GB202 chip<\/figcaption><\/figure>\n<h3 id=\"h44\" class=\"western\">The smaller GPU in the line: the GB203<\/h3>\n<p>The <strong>GB203<\/strong> chip, which will be used in the GeForce RTX 5080 and 5070 Ti, measures just 378 mm\u00b2 and reportedly contains 45.6 billion transistors. Interestingly, this is slightly fewer than in the previous-generation AD103 chip (45.9 billion), which was also a hair bigger (378.6 mm\u00b2). This suggests that in the Blackwell generation, with essentially the same TSMC 4N manufacturing process and the same transistor density, Nvidia has managed to squeeze in some new technology and more performance per unit area \u2013 unless the RTX 5080&#8217;s performance gain over the RTX 4080 comes purely from the higher power limit (360 W instead of 320 W) and thus higher clock speeds, which remains to be seen. 
That said, the Blackwell architecture itself should deliver slightly better performance at a given clock speed, so it is notable that the new chip achieves this without needing more die area.<\/p>\n<p>This GPU consists of 7 GPC blocks, 42 TPC blocks and 84 SM blocks, giving a total of 10,752 shaders, 84 RT cores, 336 texture units and 336 Tensor Cores. The GPU contains 64 MB of L2 cache, just like the previous AD103 in the GeForce RTX 4080.<\/p>\n<p>Seven GPCs imply 112 ROP units. This GPU has only a 256-bit memory bus, as Nvidia skipped a Blackwell configuration with a 384-bit bus. As a result, the GeForce RTX 5080 has half the memory bus width (and half the memory capacity) of the RTX 5090. This is only slightly compensated by a higher memory clock, as GDDR7 runs at an effective 30.0 GHz on this model. On the other hand, the GB203 has fewer than half the compute units of the GB202, so the design is not out of balance.<\/p>\n<figure id=\"attachment_217336\" aria-describedby=\"caption-attachment-217336\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/geforce-rtx-5070-3840x2160.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-217336 size-large\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/geforce-rtx-5070-3840x2160-1024x576.jpg\" alt=\"\" width=\"640\" height=\"360\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/geforce-rtx-5070-3840x2160-1024x576.jpg 1024w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/geforce-rtx-5070-3840x2160-300x169.jpg 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/geforce-rtx-5070-3840x2160-768x432.jpg 768w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><figcaption id=\"caption-attachment-217336\" class=\"wp-caption-text\">GeForce RTX 5070 Founders Edition<\/figcaption><\/figure>\n<h3 id=\"h45\" class=\"western\">The GB205 for cheaper 
cards<\/h3>\n<p>The third chip in the lineup is the <strong>GB205<\/strong>. There is no GB204 die; the GB205 design replaces the previous AD104. According to Nvidia, this GPU consists of 31.1 billion transistors on a die area of 263 mm\u00b2, significantly smaller than the AD104 chip (294.5 mm\u00b2 with 35.8 billion transistors). Nvidia would therefore enjoy a higher margin if the RTX 5070 cards (which will use this GPU) replaced the RTX 4070 in the market at the same price. Alternatively, this allows the RTX 5070 to be priced lower than the RTX 4070 was.<\/p>\n<p>In this case, however, the smaller die area is due to the GB205 simply having fewer units. While the <a href=\"https:\/\/www.hwcooling.net\/specifikace-gpu-nvidia-ad102-ad103-a-ad104-v-grafikach-ada\/\">AD104 contains 60 SM blocks<\/a>, the GB205 chip has only 50 SMs (5 GPCs, 25 TPCs), which in a full configuration means 6400 shaders, 50 RT cores and 200 Tensor Cores. The RTX 5070, however, uses a cut-down configuration with only 6144 shaders (48 SMs), which allows chips with some manufacturing defects to be used, improving yields.<\/p>\n<p>Like the AD104, the chip has a 192-bit memory bus, but it uses GDDR7 memory just like its higher-end siblings. 
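<\/p>\n<p>Because all three chips follow the same block hierarchy, the unit counts quoted for the GB202, GB203 and GB205 can be derived from their numbers of GPCs and TPCs alone. The Python sketch below applies the per-block ratios described in this article (the Chip class is a hypothetical helper of our own, not an Nvidia tool) and also computes transistor density from the quoted transistor counts and die sizes.<\/p>
```python
# Derive unit counts from the block hierarchy described in the text:
# 2 SMs per TPC; per SM 128 shaders, 1 RT core and 4 Tensor Cores;
# 8 texture units per TPC; 16 ROPs per GPC. Chip is a hypothetical helper.
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    gpcs: int
    tpcs: int
    transistors_bn: float  # billions of transistors
    die_mm2: float         # die area in mm^2

    def units(self):
        sms = self.tpcs * 2
        return {
            'SMs': sms,
            'shaders': sms * 128,
            'RT cores': sms,
            'Tensor Cores': sms * 4,
            'texture units': self.tpcs * 8,
            'ROPs': self.gpcs * 16,
        }

    def density(self):
        # Millions of transistors per mm^2
        return self.transistors_bn * 1000 / self.die_mm2


chips = [
    Chip('GB202', gpcs=12, tpcs=96, transistors_bn=92.2, die_mm2=750.0),
    Chip('GB203', gpcs=7, tpcs=42, transistors_bn=45.6, die_mm2=378.0),
    Chip('GB205', gpcs=5, tpcs=25, transistors_bn=31.1, die_mm2=263.0),
]

for chip in chips:
    print(chip.name, chip.units(), f'{chip.density():.1f} MTr/mm^2')
```
<p>The densities come out at roughly 118 to 123 million transistors per mm\u00b2; for comparison, the AD103 figures quoted above give about 121, consistent with the unchanged 4N process. Cut-down SKUs follow the same ratios, so the RTX 5090&#8217;s 170 SMs correspond to 21,760 shaders.<\/p>\n<p>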
The L2 cache capacity is 48 MB, the same as the AD104&#8217;s, and the GPU also has the same 80 ROPs.<\/p>\n<p><em><strong>The article continues on the next page.<\/strong><\/em><\/p>\n<!--nextpage-->\n<h3 id=\"h47\" class=\"western\">SMs and shaders<\/h3>\n<p>As before, each SM block contains 128 \u201cshaders\u201d, a 256 KB register file, and 128 KB of L1 cache (also usable as shared memory). Nvidia often (inaccurately) refers to these shaders as \u201cCUDA cores\u201d, but in reality the unit within a GPU that corresponds to a single &#8220;core&#8221; is the entire SM block, while the individual so-called shader units are actually &#8220;lanes&#8221; of the SIMD units within that core.<\/p>\n<p>In this new architecture, Nvidia has changed the capabilities of the shader units. 
Previously, half (64) of the units could compute only standard floating-point (FP) operations, the &#8220;bread and butter&#8221; of GPU graphics workloads, while the other half, added in the Turing generation, handled integer (INT) operations. Starting with the Ampere generation, this second set of units was generalized to handle both INT and FP operations.<\/p>\n<p>Now, Nvidia has extended the same capability to the first half of the units as well, so all shaders can process either INT or FP operations (but not both at the same time). Peak performance in pure FP32 operations remains unchanged. A performance increase can occur if the running code consists of more than 50% integer operations (which is atypical), or locally, in sections of code where INT operations dominate.<\/p>\n<figure id=\"attachment_219203\" aria-describedby=\"caption-attachment-219203\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Blok-SM-architektury-Blackwell.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-219203 size-large\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Blok-SM-architektury-Blackwell-1024x642.png\" alt=\"\" width=\"640\" height=\"401\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Blok-SM-architektury-Blackwell-1024x642.png 1024w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Blok-SM-architektury-Blackwell-300x188.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Blok-SM-architektury-Blackwell-768x482.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Blok-SM-architektury-Blackwell.png 1856w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><figcaption id=\"caption-attachment-219203\" class=\"wp-caption-text\">An SM block of the Blackwell 
architecture<\/figcaption><\/figure>\n<p>These 128 shaders are the standard units supporting 32-bit precision (FP32). Each SM block also has separate units for double-precision (FP64) calculations, but only two of them (compared to 128 FP32\/INT32 units). As a result, the GPU processes FP64 operations at only 1\/64 of its FP32 rate \u2013 this capability is essentially included just for compatibility, so that code containing FP64 operations runs correctly.<\/p>\n<h4 id=\"h48\" class=\"western\">Improved Shader Execution Reordering<\/h4>\n<p>In addition, the shaders of the Blackwell architecture feature improved dynamic reordering of shader execution (SER 2.0, or Shader Execution Reordering 2.0) compared to the Ada Lovelace architecture. The logic responsible for dynamically reordering work is said to be up to 2\u00d7 more efficient (though it\u2019s hard to say exactly how this is measured). SER 2.0 is expected to have lower overhead and a better ability to identify opportunities for performance improvements.<\/p>\n<figure id=\"attachment_219212\" aria-describedby=\"caption-attachment-219212\" style=\"width: 1673px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/SER20.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-219212\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/SER20.png\" alt=\"\" width=\"1673\" height=\"525\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/SER20.png 1673w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/SER20-300x94.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/SER20-768x241.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/SER20-1024x321.png 1024w\" sizes=\"auto, (max-width: 1673px) 100vw, 1673px\" \/><\/a><figcaption id=\"caption-attachment-219212\" class=\"wp-caption-text\">Shader Execution Reordering 
2.0<\/figcaption><\/figure>\n<p>SER is not active globally at all times; by default, the GPU does not use it, so it is not fully equivalent to out-of-order execution in CPUs. Nvidia states that game developers can optionally enable SER via an API. SER apparently does not always improve performance, which is why its use is &#8220;opt-in&#8221;: developers can apply it to functions where profiling shows that it helps. So far, adoption of this technology in games does not appear to be widespread (although <a href=\"https:\/\/www.hwcooling.net\/en\/nvidia-geforce-rtx-4000-cards-are-here-models-parameters-prices\/\">it was already introduced in the GeForce RTX 4000 series<\/a>), with Nvidia noting that the feature is currently used in &#8220;several ray-traced games\u201d (suggesting a relatively limited number of titles).<\/p>\n<h4 id=\"h49\" class=\"western\">Neural shaders<\/h4>\n<p>A new feature of the shaders in the Blackwell architecture is support for so-called Neural Shaders. These use the Tensor Cores, but instead of targeting them as a standalone unit, they invoke them from within shader programs running on the SM&#8217;s shader units. This makes it possible to evaluate a pre-trained neural network with a relatively small model directly within a shader program.<\/p>\n<p>This was not possible before because the Tensor Cores were not integrated tightly enough with the shader units. This capability is only now available in the Blackwell GPU architecture (note: this might not be an issue for AMD architectures, as their current form of AI acceleration is integrated into the traditional compute units, which use the same working registers. 
Using both traditional shader instructions and WMMA instructions for AI acceleration within a single shader program is likely possible on RDNA 3 and newer GPUs without any special provisions \u2013 though it\u2019s unclear whether this approach will carry over to the future UDNA architectures).<\/p>\n<h4 id=\"h410\" class=\"western\">Stochastic Texture Filtering<\/h4>\n<p>For Blackwell, Nvidia has proposed a technique called Stochastic Texture Filtering, which partially replaces more complex filtering methods (such as trilinear or anisotropic filtering) and is based on partially randomizing the result. Adding noise can help prevent artifacts such as moir\u00e9 patterns. For this technique, Blackwell has doubled the throughput of unfiltered (nearest-neighbor, or point) sampling in its texture units. In this context, it is interesting that the <a href=\"https:\/\/www.hwcooling.net\/en\/mobile-zen-5-is-here-ryzen-ai-300-strix-point-soc-detailed\/\">AMD RDNA 3.5 architecture<\/a> (and possibly RDNA 4 as well) has also added acceleration of unfiltered sampling during texturing, in theory also targeting this stochastic filtering approach.<\/p>\n<h3 id=\"h411\" class=\"western\">New Tensor Cores: FP6 and FP4 support<\/h3>\n<p>Tensor Cores are now in their fifth generation, and their new architecture introduces support for FP4 precision operations. This allows twice as many operations per clock as with 8-bit precision (INT8 or FP8). At the same time, storing a model with a given number of parameters requires only half the memory of an 8-bit model. However, the downside is the extremely low precision of these values \u2013 or rather, it\u2019s questionable whether we can even talk about precision in this context. FP4 is supposed to allocate only two bits (i.e., four possible values) to the exponent and one bit (i.e., two values) to the mantissa, with the fourth bit being the sign. 
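<\/p>\n<p>How little an FP4 value can express is easiest to see by enumerating all of its bit patterns. The following Python sketch assumes the E2M1 layout as defined in the OCP Microscaling (MX) specification (exponent bias of 1, no infinities or NaNs); Nvidia&#8217;s hardware format may differ in details. It also estimates the weight memory of a hypothetical 60-billion-parameter model at different precisions.<\/p>
```python
# Hypothetical sketch: decode the 16 FP4 bit patterns, assuming the
# OCP MX E2M1 layout (1 sign bit, 2 exponent bits with bias 1,
# 1 mantissa bit, no infinities or NaNs). Not Nvidia documentation.


def decode_e2m1(bits):
    sign = -1.0 if bits // 8 == 1 else 1.0  # top bit of the 4-bit code
    exponent = (bits // 2) % 4               # middle two bits
    mantissa = bits % 2                      # lowest bit
    if exponent == 0:
        # Subnormal: no implicit leading 1, scale 2 ** (1 - bias) = 1
        magnitude = mantissa * 0.5
    else:
        magnitude = (1 + mantissa * 0.5) * 2.0 ** (exponent - 1)
    return sign * magnitude


positive_values = [decode_e2m1(b) for b in range(8)]
print(positive_values)  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def weight_footprint_gb(num_params, bits_per_param):
    # Weights only: ignores per-block scale factors, activations, KV cache
    return num_params * bits_per_param / 8 / 1e9


params = 60e9  # hypothetical 60-billion-parameter model
for name, bits in [('FP16', 16), ('FP8', 8), ('FP6', 6), ('FP4', 4)]:
    print(name, weight_footprint_gb(params, bits), 'GB')
```
<p>With only eight magnitudes available, practical FP4 schemes such as MXFP4 pair these values with a shared scale factor for each block of values. The footprint estimate illustrates the memory argument: the hypothetical model would need about 60 GB for its weights in FP8, which does not fit on a 32 GB card, but only about 30 GB in FP4.<\/p>\n<p>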
However, it seems that a format with a 3-bit exponent and no mantissa is also being proposed for AI applications. The base is still two, so these are still binary floating-point numbers. Such data types might be better understood as something in between a conventional numeric variable and a more expressive version of a true\/false logical value (which is a 1-bit value).<\/p>\n<figure id=\"attachment_219201\" aria-describedby=\"caption-attachment-219201\" style=\"width: 1833px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/FP4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-219201\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/FP4.png\" alt=\"\" width=\"1833\" height=\"1031\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/FP4.png 1833w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/FP4-300x169.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/FP4-768x432.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/FP4-1024x576.png 1024w\" sizes=\"auto, (max-width: 1833px) 100vw, 1833px\" \/><\/a><figcaption id=\"caption-attachment-219201\" class=\"wp-caption-text\">5th-gen Tensor Cores support the FP4 data type<\/figcaption><\/figure>\n<p>Neural networks (AI models) are generally surprisingly resistant to low-precision data \u2013 at least during inference \u2013 compared to the precision required for general-purpose numerical calculations. However, some degradation of inference quality is often observed even with INT8\/FP8 models, and this effect is likely to be even more pronounced with INT4 or FP4. An FP4 model with the same number of parameters would probably deliver worse results than an FP8 model. 
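<\/p>\n<p>To make the precision loss more tangible, the Python sketch below rounds the same block of synthetic values to an FP4 (E2M1) grid and to an FP8 (E4M3) grid and compares the mean absolute error. It is a simplified illustration under stated assumptions: OCP MX-style encodings and a single absmax scale per block, which is not necessarily how Nvidia&#8217;s software quantizes models, and the sine-based test values merely stand in for real weights.<\/p>
```python
# Hypothetical sketch: round one block of values to FP4 (E2M1) and to
# FP8 (E4M3) grids with a single absmax scale, then compare the error.
# Assumes OCP MX-style encodings; not Nvidia's quantization pipeline.
import math


def e2m1_grid():
    # Non-negative E2M1 magnitudes (bias 1, one mantissa bit)
    return [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def e4m3_grid():
    # Non-negative E4M3 magnitudes (bias 7, three mantissa bits);
    # exponent 15 with mantissa 7 encodes NaN, so the maximum is 448
    values = []
    for exponent in range(16):
        for mantissa in range(8):
            if exponent == 15 and mantissa == 7:
                continue
            if exponent == 0:
                values.append(mantissa / 8 * 2.0 ** -6)  # subnormals
            else:
                values.append((1 + mantissa / 8) * 2.0 ** (exponent - 7))
    return values


def quantize_block(block, grid):
    # Map the largest magnitude onto the largest grid value, round every
    # element to the nearest grid point, then undo the scaling
    scale = max(abs(x) for x in block) / max(grid)
    result = []
    for x in block:
        nearest = min(grid, key=lambda g: abs(abs(x) / scale - g))
        result.append(math.copysign(nearest * scale, x))
    return result


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Deterministic synthetic values standing in for a block of weights
weights = [math.sin(0.37 * i) * math.cos(0.11 * i) for i in range(32)]

err_fp4 = mean_abs_error(weights, quantize_block(weights, e2m1_grid()))
err_fp8 = mean_abs_error(weights, quantize_block(weights, e4m3_grid()))
print(f'FP4 mean error: {err_fp4:.4f}, FP8 mean error: {err_fp8:.4f}')
```
<p>On the same data, the FP4 error comes out clearly larger, because E2M1 offers only eight magnitudes and every value smaller than one twenty-fourth of the block maximum rounds to zero.<\/p>\n<p>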
Therefore, it\u2019s possible that the use of these low-precision values will be limited to specific applications or will require various accommodations (for example, using 4-bit precision for only part of the calculations, or increasing the number of parameters in the model). The theoretical doubling of performance may thus not always be achieved in practice.<\/p>\n<p>As mentioned, the ability to use 4-bit values to fit a model with a given number of parameters into a GPU with less memory (e.g., 32 GB instead of 64 GB) could be significant. However, the trade-off of lower quality will always be present, so it\u2019s unclear how practical this compromise will be in real-world scenarios. An alternative to FP4 could be the FP6 data type, which Blackwell also supports for the first time. With FP6, the performance doubling likely won\u2019t apply, and its purpose is presumably just to save memory.<\/p>\n<p>On the software side, the Tensor Cores in Blackwell\u2019s gaming chips support the same feature set for transformer models (which Nvidia calls the second-generation Transformer Engine) as the server version of Blackwell, the GB200.<\/p>\n<h3 id=\"h412\" class=\"western\">RT cores with 2\u00d7 compute capability<\/h3>\n<p>The RT cores (one in each SM block) in Blackwell GPUs are now in their fourth generation. Their main new feature is double the per-clock throughput of ray-triangle intersection tests (determining where rays intersect the triangles in a scene). 
In Ada Lovelace, an RT core could handle 4 ray-triangle intersection tests per cycle, so a Blackwell RT core should be able to handle 8 per cycle.<\/p>\n<figure id=\"attachment_219202\" aria-describedby=\"caption-attachment-219202\" style=\"width: 1833px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RT-j\u00e1dro.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-219202\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RT-j\u00e1dro.png\" alt=\"\" width=\"1833\" height=\"1031\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RT-j\u00e1dro.png 1833w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RT-j\u00e1dro-300x169.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RT-j\u00e1dro-768x432.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RT-j\u00e1dro-1024x576.png 1024w\" sizes=\"auto, (max-width: 1833px) 100vw, 1833px\" \/><\/a><figcaption id=\"caption-attachment-219202\" class=\"wp-caption-text\">A Blackwell-generation RT core offers 2\u00d7 the throughput for testing intersections between light rays and the triangles of a 3D model<\/figcaption><\/figure>\n<p>Nvidia specifies neither the rate of intersection tests between rays and BVH bounding boxes nor any improvement in this area. 
However, an improvement could still have been made there: as far as we know, Ada Lovelace also supported four ray-box tests per cycle, and it would be odd for fewer operations to be supported at the level of BVH boxes (an auxiliary structure used only to speed up the intersection search) than for the triangles themselves.<\/p>\n<p><em><strong>The article continues on the next page.<\/strong><\/em><\/p>\n<!--nextpage-->\n<h3 id=\"h414\" class=\"western\">PCI Express 5.0<\/h3>\n<p>Blackwell introduces support for a newer generation of the PCI Express interface, or more accurately, a less outdated one. PCIe 5.0 was actually already expected in the previous Ada Lovelace generation in 2022, since the interface has been available on LGA 1700 motherboards (not to mention the newer LGA 1851) and Intel Alder Lake processors since 2021. On the AMD side, Ryzen 7000 (and 9000) processors on the AM5 platform have supported it since 2022. 
The specification was finalized in 2019, and the specification for the <a href=\"https:\/\/www.hwcooling.net\/novy-pci-express-gen6-byl-vydan-uz-8x-rychlejsi-nez-pcie-gen3-pam4\/\">follow-up PCI Express 6.0 generation with double the speed<\/a> has been out since early 2022. The <a href=\"https:\/\/www.hwcooling.net\/en\/pci-express-7-0-to-be-ready-this-year-4x-faster-than-gen-5-interface\/\">PCIe 7.0 technology<\/a> is even expected to be finalized this year. It is fair to say that gaming GPUs (across all brands, not just Nvidia) lag significantly behind in this regard. Both the GB203 and GB205 support PCI Express 5.0, just like the high-end GB202.<\/p>\n<p>In any case, PCI Express 5.0 doubles the bandwidth of the interface between the graphics card and the processor (and through it, the rest of the system, including RAM and storage). With PCIe 5.0 \u00d716, this amounts to up to 64 GB\/s in each direction (PCIe is full duplex). The extra bandwidth also helps in scenarios where the GPU is allocated only eight lanes (\u00d78), for example on motherboards that divert some lanes from the graphics card slot to M.2 slots for SSDs: PCIe 5.0 \u00d78 provides the same bandwidth as a full previous-generation PCIe 4.0 \u00d716 interface. The bandwidth between the GPU and the rest of the system may matter slightly more in compute applications than in games, but in any case, faster PCI Express widens one of the potential performance bottlenecks, even if it is probably not hit very often.<\/p>\n<p>TechPowerUp <a href=\"https:\/\/www.techpowerup.com\/review\/nvidia-geforce-rtx-5090-pci-express-scaling\/\">tested the performance scaling across different generations of PCI Express<\/a> on the GeForce RTX 5090 and found that this GPU does benefit slightly from PCIe 5.0 \u00d716. 
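<\/p>\n<p>For reference when reading these results, the nominal per-direction bandwidth of the PCIe configurations involved can be estimated from the per-lane transfer rates. The Python sketch below assumes 8, 16 and 32 GT\/s per lane for PCIe 3.0, 4.0 and 5.0 with 128b\/130b encoding, and ignores packet and protocol overhead, so real-world throughput is somewhat lower; the function name is a hypothetical helper.<\/p>
```python
# Approximate PCIe bandwidth per direction. Assumes 128b/130b line
# encoding (PCIe 3.0 to 5.0) and ignores packet/protocol overhead.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}  # per lane


def pcie_gbs_per_direction(gen, lanes):
    raw_gbit_s = TRANSFER_RATE_GT_S[gen] * lanes  # one bit per transfer
    return raw_gbit_s * 128 / 130 / 8             # payload bits to bytes


for gen, lanes in [(5, 16), (5, 8), (4, 16), (4, 8), (3, 16)]:
    print(f'PCIe {gen}.0 x{lanes}: {pcie_gbs_per_direction(gen, lanes):.1f} GB/s')
```
<p>This gives roughly 63 GB\/s for PCIe 5.0 \u00d716 (the commonly quoted 64 GB\/s is the figure before encoding overhead) and shows why PCIe 5.0 \u00d78 matches PCIe 4.0 \u00d716, just as PCIe 4.0 \u00d78 matches PCIe 3.0 \u00d716.<\/p>\n<p>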
With only PCIe 4.0 \u00d716 (or PCIe 5.0 \u00d78), there is a slight, not very significant performance drop. The average difference is only about 1%, but some games are exceptions with larger drops (for example, No Rest For The Wicked). The older PCI Express 3.0 \u00d716 (or the equivalent PCIe 4.0 \u00d78) has a more noticeable impact, with an average performance loss of about 4%. The drops are usually more severe at 2560 \u00d7 1440 and 3840 \u00d7 2160 than at 1920 \u00d7 1080, although some of the tested games behaved the opposite way, with the largest degradation at 1920 \u00d7 1080. In some cases, this inconsistency may have been caused by driver issues or bugs.<\/p>\n<h3 id=\"h415\" class=\"western\">DisplayPort 2.1b<\/h3>\n<p>Blackwell also delivers another overdue feature that was expected in the previous generation but was missing: it finally introduces support for DisplayPort 2.0\/2.1 for connecting monitors with higher resolutions and refresh rates. Alternatively, the extra bandwidth allows less aggressive lossy compression (Display Stream Compression, DSC), which is commonly used with monitors today. This was a more regrettable omission than PCIe 5.0, because DisplayPort 2.1 was already supported by some GPUs of the previous generation, such as the Radeon RX 7000 and Intel Arc Alchemist cards (<a href=\"https:\/\/www.hwcooling.net\/en\/batttlemage-details-of-intel-xe2-gpu-architecture-analysis\/\">and the newer Battlemage generation<\/a> as well).<\/p>\n<p>GeForce RTX 5000 graphics cards already handle <a href=\"https:\/\/www.hwcooling.net\/en\/displayport-2-1b-introduced-new-geforce-gpus-already-have-it\/\">DisplayPort 2.1b with support for longer active cables<\/a>. 
The fastest UHBR20 mode is also supported, with an effective bandwidth of 77.37 Gb\/s (practically triple the bandwidth of DisplayPort 1.4a, which was the maximum for the Ada Lovelace\/RTX 4000 generation GPUs).<\/p>\n<figure id=\"attachment_219211\" aria-describedby=\"caption-attachment-219211\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Blackwell-Display-and-Video.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-219211 size-large\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Blackwell-Display-and-Video-1024x742.png\" alt=\"\" width=\"640\" height=\"464\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Blackwell-Display-and-Video-1024x742.png 1024w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Blackwell-Display-and-Video-300x217.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Blackwell-Display-and-Video-768x556.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Blackwell-Display-and-Video.png 1089w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><figcaption id=\"caption-attachment-219211\" class=\"wp-caption-text\">Nvidia Blackwell \u2013 image output and video<\/figcaption><\/figure>\n<p>Thanks to DP 2.1, Blackwell GPUs \u2013 like Radeon RX 7000s \u2013 support 8K resolution screens at up to 165 Hz (until now, the maximum for GeForce cards was 60 Hz) or 4K at up to 480 Hz. Nvidia&#8217;s documentation mentions that UHBR20 will require active cables (the DP80LL type introduced in DP 2.1b), while competing cards should be able to run UHBR20 over passive cables up to 1\u20131.2 meters long (Radeon Pro cards with the RDNA 3 architecture support UHBR20, although gaming cards support only UHBR13.5). 
Whether GeForce RTX 5000 cards really cannot do this, or whether it is just unclear wording, remains to be clarified. But at least UHBR13.5 (which still offers double the bandwidth of DP 1.4a) should work with regular, cheap cables.<\/p>\n<p>The Blackwell architecture doesn&#8217;t support the upcoming HDMI 2.2, but that was not expected, as the specification isn&#8217;t even finalized yet.<\/p>\n<h3 id=\"h416\" class=\"western\">Multimedia acceleration: incremental improvement<\/h3>\n<p>On the other hand, the GPUs should have newer-generation multimedia engines (9th-generation NVENC, 6th-generation NVDEC). Depending on the model, there is one video decoder (RTX 5070 Ti, RTX 5070) or two, and up to three encoders (RTX 5090), with fewer encoders on lower models (the RTX 5070 has only one). However, we don&#8217;t know whether these are the actual maximum specs of the chips. On gaming cards, some units can be artificially disabled: in the previous generation, the dies physically had 3+3 units, but only 1\u20132 of each were enabled on GeForce cards, likely for segmentation purposes (full multimedia support was reserved for the more expensive workstation and compute cards).<\/p>\n<p>Although these multimedia engines are of a new generation, they don&#8217;t seem to support any new formats (VVC acceleration is missing; it is currently only supported by <a href=\"https:\/\/www.hwcooling.net\/en\/the-comeback-of-intel-next-gen-lunar-lake-mobile-cpu-introduced\/\">the Intel Lunar Lake\/Core Ultra 200V processors<\/a>). However, AV1 encoding is supposed to deliver better compression quality, and an additional &#8220;Ultra Quality&#8221; compression setting has been added. 
Nvidia claims a 4\u201318% improvement in compression efficiency (in terms of the BD-Rate PSNR and BD-BR VMAF metrics).<\/p>\n<figure id=\"attachment_219205\" aria-describedby=\"caption-attachment-219205\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Enk\u00f3dov\u00e1n\u00ed-AV1-na-GPU-Blackwell.png\"><img loading=\"lazy\" decoding=\"async\" class=\"noborder wp-image-219205 size-large\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Enk\u00f3dov\u00e1n\u00ed-AV1-na-GPU-Blackwell-1024x446.png\" alt=\"\" width=\"640\" height=\"279\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Enk\u00f3dov\u00e1n\u00ed-AV1-na-GPU-Blackwell-1024x446.png 1024w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Enk\u00f3dov\u00e1n\u00ed-AV1-na-GPU-Blackwell-300x131.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Enk\u00f3dov\u00e1n\u00ed-AV1-na-GPU-Blackwell-768x334.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Enk\u00f3dov\u00e1n\u00ed-AV1-na-GPU-Blackwell.png 2048w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><figcaption id=\"caption-attachment-219205\" class=\"wp-caption-text\">AV1 encoding on Blackwell GPUs<\/figcaption><\/figure>\n<p>Nvidia has also added support for 4:2:2 chroma subsampling (in YUV color) for some formats (HEVC and H.264). This format is output by some professional cameras.<\/p>\n<h4 id=\"h417\">Hotspot temperature data will not be accessible<\/h4>\n<p>One thing that might bother overclockers is the removal of hotspot temperature information (so you won&#8217;t see it in HWiNFO, for example). This value did not come from one particular sensor; it was the highest temperature reported by any of the GPU&#8217;s sensors. 
Because the chip contains multiple temperature sensors, this value complemented the average temperature and could reveal localized overheating of the chip.<\/p>\n<p>This information was useful, for example, when a cooler was poorly mounted or thermal paste was applied incorrectly: until now, a large discrepancy between the average temperature (which may still look harmless) and the hotspot temperature could serve as a warning sign. It is not known exactly why Nvidia has made hotspot temperature information unavailable to users on Blackwell GPUs. According to the company, these readings have never been very useful, but the move may raise suspicions that the figures would have been unflattering \u2013 especially, say, for the top-of-the-line RTX 5090 with its 575W TDP.<\/p>\n<p><em><strong>The article continues on the next page.<\/strong><\/em><\/p>\n<p><script async src=\"\/\/pagead2.googlesyndication.com\/pagead\/js\/adsbygoogle.js\"><\/script>\n<!-- responsive -->\n<ins class=\"adsbygoogle\"\n     style=\"display:block;background-color:transparent\"\n     data-ad-client=\"ca-pub-8150419924824893\"\n     data-ad-slot=\"6522017574\"\n     data-ad-format=\"auto\"><\/ins>\n<script>\n(adsbygoogle = window.adsbygoogle || []).push({});\n<\/script><br \/>\n&#10240;\u2800<br \/>\n<!--nextpage--><\/p>\n<h3 id=\"h419\" class=\"western\">Mega Geometry<\/h3>\n<p>The RT cores of the Blackwell generation are also getting some new capabilities, partly or wholly implemented in software, which will likely only become useful once games start using them. For example, there is support for new types of primitives such as Subdivision Surfaces and Linear Swept Spheres.<\/p>\n<p>The so-called Mega Geometry feature announced for Blackwell appears to be mainly software-based, designed to improve performance in ray-traced scenes with large numbers of objects. It allows triangles to be grouped into larger structures (clusters called <strong>CLAS<\/strong>, cluster-level acceleration structures). One practical benefit is that these clusters are easier to swap in the scene. This happens, for instance, when objects move further away from the viewpoint and the game engine replaces them with models containing fewer triangles (lower levels of detail). However, when ray tracing is used, swapping in new models requires building a new bounding volume hierarchy (BVH) for them. 
This process is computationally expensive, and changing the detail level of many objects at once can cause significant FPS drops.<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/clas.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-219204 size-large\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/clas-1024x649.png\" alt=\"\" width=\"640\" height=\"406\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/clas-1024x649.png 1024w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/clas-300x190.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/clas-768x486.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/clas-550x350.png 550w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/clas.png 2048w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p>Mega Geometry processes objects in clusters (CLAS) to make this process simpler and more efficient, so that such operations cost games less performance. In this mode, more of the work is also expected to run entirely on the GPU, without involving the system&#8217;s CPU. Using these techniques should therefore reduce the game&#8217;s driver overhead and potentially alleviate CPU performance bottlenecks. Nvidia notes that this technology could be particularly beneficial for Unreal Engine 5 and its Nanite geometry technology.<\/p>\n<p>In addition to clusters, Mega Geometry also introduces the organization of geometry and objects into partitions (<strong>PTLAS<\/strong>, partitioned top-level acceleration structures). These can be used to place the static objects of a scene into separate partitions. 
Per-frame geometry updates can then skip the partitions (PTLAS) holding static objects, so these are not recalculated every frame like the objects that are in motion.<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/tlas.png\"><img loading=\"lazy\" decoding=\"async\" class=\"noborder aligncenter wp-image-219209 size-full\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/tlas.png\" alt=\"\" width=\"554\" height=\"502\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/tlas.png 554w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/tlas-300x272.png 300w\" sizes=\"auto, (max-width: 554px) 100vw, 554px\" \/><\/a><\/p>\n<h4 id=\"h420\" class=\"western\">Support on older GPUs too<\/h4>\n<p>Mega Geometry is expected to be supported in DirectX 12 via NVAPI, in Vulkan through a vendor extension, and in the OptiX 9.0 API used by rendering software. Support should also extend to older GPUs, starting with the RTX 2000 series, which indicates that it does not directly depend on specific architectural features of Blackwell GPUs (it does not appear to be implemented directly in hardware).<\/p>\n<p>However, according to Nvidia, Blackwell GPUs feature improved BVH compression, which allows these structures to take up less memory. Reportedly, this could save hundreds of megabytes in games with demanding geometry and ray tracing that implement these technologies.<\/p>\n<h3 id=\"h421\" class=\"western\">DLSS 4: A new neural network and more artificial frames<\/h3>\n<p>One of the central &#8220;technologies&#8221; of the GeForce RTX 5000 series is DLSS 4. At its core, it builds upon the frame generation technique introduced with DLSS 3. 
Until now, this method added one interpolated (artificial) intermediate frame between every two frames rendered by the game, meaning that 50% of the frames were real, while the other 50% were only interpolated. We have written about how frame generation works and its advantages and disadvantages here:<\/p>\n<p><strong>Read more: <\/strong><a href=\"https:\/\/www.hwcooling.net\/nvidia-uvadi-dlss-3-s-generovanim-snimku-navic-geforce-rtx-4000-ada-lovelace-jak-to-funguje\/\"><strong>Nvidia introduces DLSS 3 with extra frame generation. How does it work?<\/strong><\/a><\/p>\n<p>The new feature in DLSS 4 is called &#8220;<strong>Multi Frame Generation<\/strong>&#8221;, which means that more than one interpolated frame can now be inserted between each pair of real frames. This could be two interpolated frames (so that two thirds, about 67%, of the output consists of interpolated frames, theoretically tripling the FPS compared to the game&#8217;s actual frame rate) or even three, meaning that 75% of the frames you see are merely interpolated (potentially leading to worse quality), with only 25% being real. Theoretically, the apparent FPS can thus be up to 4\u00d7 higher than what the game and GPU are actually rendering.<\/p>\n<p>The downside is that, while potential errors in the interpolated frames can blend in relatively well at a 50:50 ratio, now it&#8217;s the &#8220;made-up&#8221; frames you&#8217;re seeing most of the time. 
This could reverse the situation, with the quality of the interpolated frames dominating the overall experience.<\/p>\n<figure id=\"attachment_219197\" aria-describedby=\"caption-attachment-219197\" style=\"width: 1920px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Multi-Frame-Generation.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-219197\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Multi-Frame-Generation.png\" alt=\"\" width=\"1920\" height=\"1080\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Multi-Frame-Generation.png 1920w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Multi-Frame-Generation-300x169.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Multi-Frame-Generation-768x432.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Multi-Frame-Generation-1024x576.png 1024w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/a><figcaption id=\"caption-attachment-219197\" class=\"wp-caption-text\">DLSS 4 Multi Frame Generation<\/figcaption><\/figure>\n<p>As a reminder: inserting generated frames slightly increases game latency, because both boundary frames of the sequence (between which interpolation occurs) must be fully rendered and available before generation can begin. This means that the display always lags slightly behind the game&#8217;s state. Only when generated frames are not used can a newly rendered frame be displayed on the monitor immediately.<\/p>\n<p>Nvidia compensates for this with Reflex technology, which can also be enabled without DLSS 3 or DLSS 4 and reduces latency on its own. 
(The latency improvement from Reflex is in no way a benefit of frame generation or DLSS, even though the company&#8217;s marketing often tries to conflate the two.)<\/p>\n<p>Generated frames are also not fully equivalent to real frames, because they are not created by the game engine. The engine does not update AI behavior, object positions, projectiles, or similar elements for these frames. Frame generation merely approximates all movements and changes based on the positions of objects visible in the real frames at the start and end of the sequence, and &#8220;fills in&#8221; the guessed intermediate states in the generated frames.<\/p>\n<figure id=\"attachment_219198\" aria-describedby=\"caption-attachment-219198\" style=\"width: 1920px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-DLSS-4-Multi-Frame-Generation-ve-hr\u00e1ch.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-219198\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-DLSS-4-Multi-Frame-Generation-ve-hr\u00e1ch.png\" alt=\"\" width=\"1920\" height=\"1080\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-DLSS-4-Multi-Frame-Generation-ve-hr\u00e1ch.png 1920w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-DLSS-4-Multi-Frame-Generation-ve-hr\u00e1ch-300x169.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-DLSS-4-Multi-Frame-Generation-ve-hr\u00e1ch-768x432.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-DLSS-4-Multi-Frame-Generation-ve-hr\u00e1ch-1024x576.png 1024w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/a><figcaption id=\"caption-attachment-219198\" class=\"wp-caption-text\">DLSS 4 support in games<\/figcaption><\/figure>\n<h4 id=\"h422\" class=\"western\">Improved AI model<\/h4>\n<p>In addition to more interpolated frames, DLSS 4 introduces a 
second component \u2013 a new, improved AI model. It uses a Transformer-type neural network, whereas previous DLSS versions used a convolutional neural network (CNN). The new model is expected to somewhat improve the quality of the DLSS upscaling component and of the Ray Reconstruction feature (<a href=\"https:\/\/www.hwcooling.net\/en\/nvidia-launches-dlss-3-5-improved-ray-tracing-not-only-for-rtx-4000-ada\/\">introduced in DLSS 3.5<\/a>), and likely of temporal reconstruction as well, since Nvidia mentions improved image stability between frames, with less shimmering, ghosting, motion blur, and flickering.<\/p>\n<p>This part of DLSS 4 will also work on older GPUs, starting with the GeForce RTX 2000 series.<\/p>\n<figure id=\"attachment_219196\" aria-describedby=\"caption-attachment-219196\" style=\"width: 1920px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Transformer.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-219196\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Transformer.png\" alt=\"\" width=\"1920\" height=\"1080\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Transformer.png 1920w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Transformer-300x169.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Transformer-768x432.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/DLSS-4-Transformer-1024x576.png 1024w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/a><figcaption id=\"caption-attachment-219196\" class=\"wp-caption-text\">Demonstration of the benefits of the new Transformer neural network in DLSS 4, screenshot by Nvidia<\/figcaption><\/figure>\n<p>However, multi-frame interpolation is limited to the new RTX 5000 cards. Ironically, this is despite the fact that it doesn\u2019t actually rely on any special hardware units. 
This is surprising because frame interpolation in the previous DLSS 3 relied on dedicated hardware units in Ada Lovelace chips (the Optical Flow Accelerator). DLSS 4 has moved away from this and uses only Tensor Cores, which makes it, in a sense, more of a software-based solution (albeit one that is still a neural network running on dedicated tensor hardware). The Tensor Cores are faster in the new generation, but even so, if DLSS 4 multi-frame generation can run on, say, the RTX 5070 or the future RTX 5060, then at least the higher-end models of previous generations should theoretically have enough Tensor Core performance to handle it as well. Nvidia has admitted that support for older GPUs could theoretically be added, but so far nothing has been promised.<\/p>\n<figure id=\"attachment_219199\" aria-describedby=\"caption-attachment-219199\" style=\"width: 1920px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-funkc\u00ed-DLSS-4-na-r\u016fzn\u00fdch-GPU.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-219199\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-funkc\u00ed-DLSS-4-na-r\u016fzn\u00fdch-GPU.png\" alt=\"\" width=\"1920\" height=\"1080\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-funkc\u00ed-DLSS-4-na-r\u016fzn\u00fdch-GPU.png 1920w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-funkc\u00ed-DLSS-4-na-r\u016fzn\u00fdch-GPU-300x169.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-funkc\u00ed-DLSS-4-na-r\u016fzn\u00fdch-GPU-768x432.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Podpora-funkc\u00ed-DLSS-4-na-r\u016fzn\u00fdch-GPU-1024x576.png 1024w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/a><figcaption id=\"caption-attachment-219199\" class=\"wp-caption-text\">Support for DLSS 4 functions on different 
GPUs<\/figcaption><\/figure>\n<p>For now, it appears that the new multi-frame &#8220;FPS interpolation&#8221; will only be available on RTX 5000 cards. GeForce RTX 4000 cards will continue using single-frame generation in DLSS 3.x mode, while GeForce RTX 3000 and RTX 2000 cards will not have Nvidia&#8217;s frame generation available to them at all.<\/p>\n<h3 id=\"h423\" class=\"western\">Reflex 2 for better latency<\/h3>\n<p>Speaking of Reflex, Nvidia is introducing the second generation of this technology, called Reflex 2, with the release of the GeForce RTX 5000 series. It includes a technique called Frame Warp, which aims to improve game responsiveness further.<\/p>\n<p>Reflex 2 works by adjusting the rendered frame based on the latest mouse movement. This input can be obtained independently of the game engine, so after a frame has been rendered, the GPU driver can have slightly newer information about mouse and keyboard input than was available when rendering of that frame began.<\/p>\n<p>When Reflex 2 is enabled, the frame is modified before being sent to the monitor: it can be shifted as a whole, with perspective\/depth corrections, according to how you moved the mouse to change your view. In the adjusted frame, the driver also redraws the cursor or crosshair in the correct position. Missing data at the edges of the frame is filled in through interpolation, which may cause artifacts or errors. 
(In general, any such manipulation of frames outside the game engine can lead to visual inaccuracies or errors compared to a frame rendered directly by the game; the same applies to frame generation.)<\/p>\n<figure id=\"attachment_219200\" aria-describedby=\"caption-attachment-219200\" style=\"width: 1920px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Reflex-2-Frame-Warp.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-219200\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Reflex-2-Frame-Warp.png\" alt=\"\" width=\"1920\" height=\"989\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Reflex-2-Frame-Warp.png 1920w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Reflex-2-Frame-Warp-300x155.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Reflex-2-Frame-Warp-768x396.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Nvidia-Reflex-2-Frame-Warp-1024x527.png 1024w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/a><figcaption id=\"caption-attachment-219200\" class=\"wp-caption-text\">Nvidia Reflex 2 &#8211; Frame Warp<\/figcaption><\/figure>\n<p>Clearly, only some kinds of changes can be reflected in such a modified image, not everything. As with frame generation, Reflex 2 cannot know about things that the game knows should happen at a given moment but that are not yet visible in the frame available to Reflex 2 (and here the limitations are harsher than for frame generation, because Reflex 2 cannot look at the next frame for reference). 
So the latency reduction achieved by Frame Warp is only partial; it does not necessarily apply to everything that is displayed on the screen.<\/p>\n<p>For now, Reflex 2 with Frame Warp apparently works only without frame generation. The feature is intended for competitive gaming and probably has limited use outside eSports (if you&#8217;re playing single-player games, extremely low latency probably isn&#8217;t a big deal for you).<\/p>\n<h3 id=\"h424\" class=\"western\">\u201cAI\u201d textures, materials and lighting<\/h3>\n<p>Nvidia wants to use the previously mentioned Neural Shaders for various software technologies for games. Among them is the <strong>Neural Texture Compression<\/strong> technique \u2013 the application of a neural network to texture compression and presumably also decompression, which is supposed to achieve a slightly better compression ratio than the formats commonly used for texture compression in games today. Experiments with such formats have already been published (not just by Nvidia), but it may take some time before these techniques make it into any games.<\/p>\n<p>Next, Nvidia mentions the <strong>Neural Radiance Cache<\/strong> technique, where inference via a neural network is used to speed up lighting calculations (presumably by approximating and caching information, which is faster than a full calculation even with the cost of running the neural network). Rendering with Neural Radiance Cache is supposed to skip tracing a significant portion of the light rays; the question, of course, is how noticeably this will affect quality.<\/p>\n<p>Of a similar nature are the <strong>RTX Skin<\/strong> and <strong>Neural Materials<\/strong> techniques. Here too, a neural network is to be used to approximate certain qualities and characteristics of materials. 
In this role, a simple neural network is intended to replace more complex simulations of such materials, for example the scattering of light beneath the surface of skin.<\/p>\n<figure id=\"attachment_219207\" aria-describedby=\"caption-attachment-219207\" style=\"width: 2048px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RTX-Neural-Materials.png\"><img loading=\"lazy\" decoding=\"async\" class=\"noborder wp-image-219207 size-full\" src=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RTX-Neural-Materials.png\" alt=\"\" width=\"2048\" height=\"804\" srcset=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RTX-Neural-Materials.png 2048w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RTX-Neural-Materials-300x118.png 300w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RTX-Neural-Materials-768x302.png 768w, https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/RTX-Neural-Materials-1024x402.png 1024w\" sizes=\"auto, (max-width: 2048px) 100vw, 2048px\" \/><\/a><figcaption id=\"caption-attachment-219207\" class=\"wp-caption-text\">RTX Neural Materials<\/figcaption><\/figure>\n<h3 id=\"h425\" class=\"western\">RTX 5000 coming to market this week<\/h3>\n<p>The reviews of Blackwell cards already give a partial picture of how all this works in practice. At HWCooling, <a href=\"https:\/\/www.hwcooling.net\/en\/nvidia-geforce-rtx-5090-fe-review-next-level-gaming\/\">we tested a GeForce RTX 5090 Founders Edition<\/a> supplied directly by Nvidia. This card goes on sale on January 30, the same date on which the significantly cheaper GeForce RTX 5080 should become available. 
We discussed the specs of all the cards here:<\/p>\n<ul>\n<li><strong>Read more: <a href=\"https:\/\/www.hwcooling.net\/en\/geforce-rtx-5090-rtx-5080-rtx-5070-ti-and-rtx-5070-in-detail\/\" rel=\"bookmark\">GeForce RTX 5090, RTX 5080, RTX 5070 Ti and RTX 5070 in detail<\/a><\/strong><\/li>\n<li><strong>Read more: <\/strong><a href=\"https:\/\/www.hwcooling.net\/nvidia-uvadi-mobilni-geforce-rtx-5000-blackwell-pro-notebooky\/\" rel=\"bookmark\"><strong>Nvidia introduces mobile GeForce RTX 5000: Blackwell for laptops<\/strong><\/a><\/li>\n<\/ul>\n<p><em>Sources: Nvidia<\/em><\/p>\n<blockquote class=\"wp-embedded-content\" data-secret=\"CfWECL5xDg\"><p><a href=\"https:\/\/www.hwcooling.net\/en\/nvidia-geforce-rtx-5090-fe-review-next-level-gaming\/\">Nvidia GeForce RTX 5090 FE review: Next-Level Gaming<\/a><\/p><\/blockquote>\n<p><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"&#8220;Nvidia GeForce RTX 5090 FE review: Next-Level Gaming&#8221; &#8212; HWCooling.net\" src=\"https:\/\/www.hwcooling.net\/en\/nvidia-geforce-rtx-5090-fe-review-next-level-gaming\/embed\/#?secret=gsiQiUnQX2#?secret=CfWECL5xDg\" data-secret=\"CfWECL5xDg\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe><\/p>\n<p style=\"text-align: right;\"><em>English translation and edit by Jozef Dud\u00e1\u0161<\/em><\/p>\n<p><script async src=\"\/\/pagead2.googlesyndication.com\/pagead\/js\/adsbygoogle.js\"><\/script>\n<!-- responsive -->\n<ins class=\"adsbygoogle\"\n     style=\"display:block;background-color:transparent\"\n     data-ad-client=\"ca-pub-8150419924824893\"\n     data-ad-slot=\"6522017574\"\n     data-ad-format=\"auto\"><\/ins>\n<script>\n(adsbygoogle = window.adsbygoogle || []).push({});\n<\/script><br \/>\n&#10240;<br \/>\n\u2800<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Nvidia&#8217;s new graphics cards 
\u2013 the GeForce RTX 5090 and RTX 5080 \u2013 won&#8217;t be out until the 30th, but NDA is over and the first reviews of the top-of-the-line RTX 5090, which we also tested, are out. In this article, we take a look at the Blackwell architecture that powers these new GPUs, its [&hellip;]<\/p>\n","protected":false},"author":26,"featured_media":219214,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[770,658],"tags":[1093,961,2082,521,1247,2083,318,11,2084],"class_list":["post-219273","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analysis","category-graphics","tag-architecture","tag-artificial-intelligence","tag-blackwell-en","tag-dlss","tag-dlss4","tag-geforce-en","tag-geforce-rtx","tag-gpu","tag-nvidia-en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.9 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Blackwell: GeForce RTX 5000 architecture and innovations [Analysis] - HWCooling.net<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Blackwell: GeForce RTX 5000 architecture and innovations [Analysis] - HWCooling.net\" \/>\n<meta property=\"og:description\" content=\"Nvidia&#8217;s new graphics cards \u2013 the GeForce RTX 5090 and RTX 5080 \u2013 won&#8217;t be out until the 30th, but NDA is over and the first reviews of the top-of-the-line RTX 5090, which we also tested, are out. 
In this article, we take a look at the Blackwell architecture that powers these new GPUs, its [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/\" \/>\n<meta property=\"og:site_name\" content=\"HWCooling.net\" \/>\n<meta property=\"article:published_time\" content=\"2025-01-28T20:10:49+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-01-28T21:55:34+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Grafick\u00e1-architektura-Nvidia-Blackwell-v-GeForce-RTX-5000-upoutavka-630px.jpg\" \/>\n<meta name=\"author\" content=\"Jan Ol\u0161an\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jan Ol\u0161an\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"30 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/\",\"url\":\"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/\",\"name\":\"Blackwell: GeForce RTX 5000 architecture and innovations [Analysis] - 
HWCooling.net\",\"isPartOf\":{\"@id\":\"https:\/\/www.hwcooling.net\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Grafick\u00e1-architektura-Nvidia-Blackwell-v-GeForce-RTX-5000-upoutavka.jpg\",\"datePublished\":\"2025-01-28T20:10:49+00:00\",\"dateModified\":\"2025-01-28T21:55:34+00:00\",\"author\":{\"@id\":\"https:\/\/www.hwcooling.net\/#\/schema\/person\/1a1c9f238b83289e7c87b6fc07ad20ee\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/#primaryimage\",\"url\":\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Grafick\u00e1-architektura-Nvidia-Blackwell-v-GeForce-RTX-5000-upoutavka.jpg\",\"contentUrl\":\"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Grafick\u00e1-architektura-Nvidia-Blackwell-v-GeForce-RTX-5000-upoutavka.jpg\",\"width\":2304,\"height\":1530},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Dom\u016f\",\"item\":\"https:\/\/www.hwcooling.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Blackwell: GeForce RTX 5000 architecture and innovations 
[Analysis]\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.hwcooling.net\/#website\",\"url\":\"https:\/\/www.hwcooling.net\/\",\"name\":\"HWCooling.net\",\"description\":\"Performance can have many forms...\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.hwcooling.net\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.hwcooling.net\/#\/schema\/person\/1a1c9f238b83289e7c87b6fc07ad20ee\",\"name\":\"Jan Ol\u0161an\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.hwcooling.net\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/2bd32e7382bccc62b0ad57f0b361009d6d4624e68c8f39fb28935312b6d9b5bc?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/2bd32e7382bccc62b0ad57f0b361009d6d4624e68c8f39fb28935312b6d9b5bc?s=96&d=mm&r=g\",\"caption\":\"Jan Ol\u0161an\"},\"url\":\"https:\/\/www.hwcooling.net\/en\/author\/jano\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Blackwell: GeForce RTX 5000 architecture and innovations [Analysis] - HWCooling.net","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/","og_locale":"en_US","og_type":"article","og_title":"Blackwell: GeForce RTX 5000 architecture and innovations [Analysis] - HWCooling.net","og_description":"Nvidia&#8217;s new graphics cards \u2013 the GeForce RTX 5090 and RTX 5080 \u2013 won&#8217;t be out until the 30th, but NDA is over and the first reviews of the top-of-the-line RTX 5090, which we also tested, are out. 
In this article, we take a look at the Blackwell architecture that powers these new GPUs, its [&hellip;]","og_url":"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/","og_site_name":"HWCooling.net","article_published_time":"2025-01-28T20:10:49+00:00","article_modified_time":"2025-01-28T21:55:34+00:00","og_image":[{"url":"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Grafick\u00e1-architektura-Nvidia-Blackwell-v-GeForce-RTX-5000-upoutavka-630px.jpg","type":"","width":"","height":""}],"author":"Jan Ol\u0161an","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Jan Ol\u0161an","Est. reading time":"30 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/","url":"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/","name":"Blackwell: GeForce RTX 5000 architecture and innovations [Analysis] - 
HWCooling.net","isPartOf":{"@id":"https:\/\/www.hwcooling.net\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/#primaryimage"},"image":{"@id":"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/#primaryimage"},"thumbnailUrl":"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Grafick\u00e1-architektura-Nvidia-Blackwell-v-GeForce-RTX-5000-upoutavka.jpg","datePublished":"2025-01-28T20:10:49+00:00","dateModified":"2025-01-28T21:55:34+00:00","author":{"@id":"https:\/\/www.hwcooling.net\/#\/schema\/person\/1a1c9f238b83289e7c87b6fc07ad20ee"},"breadcrumb":{"@id":"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/#primaryimage","url":"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Grafick\u00e1-architektura-Nvidia-Blackwell-v-GeForce-RTX-5000-upoutavka.jpg","contentUrl":"https:\/\/www.hwcooling.net\/wp-content\/uploads\/2025\/01\/Grafick\u00e1-architektura-Nvidia-Blackwell-v-GeForce-RTX-5000-upoutavka.jpg","width":2304,"height":1530},{"@type":"BreadcrumbList","@id":"https:\/\/www.hwcooling.net\/en\/blackwell-geforce-rtx-5000-architecture-and-innovations-analysis\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Dom\u016f","item":"https:\/\/www.hwcooling.net\/"},{"@type":"ListItem","position":2,"name":"Blackwell: GeForce RTX 5000 architecture and innovations 
[Analysis]"}]},{"@type":"WebSite","@id":"https:\/\/www.hwcooling.net\/#website","url":"https:\/\/www.hwcooling.net\/","name":"HWCooling.net","description":"Performance can have many forms...","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.hwcooling.net\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.hwcooling.net\/#\/schema\/person\/1a1c9f238b83289e7c87b6fc07ad20ee","name":"Jan Ol\u0161an","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.hwcooling.net\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/2bd32e7382bccc62b0ad57f0b361009d6d4624e68c8f39fb28935312b6d9b5bc?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/2bd32e7382bccc62b0ad57f0b361009d6d4624e68c8f39fb28935312b6d9b5bc?s=96&d=mm&r=g","caption":"Jan Ol\u0161an"},"url":"https:\/\/www.hwcooling.net\/en\/author\/jano\/"}]}},"_links":{"self":[{"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/posts\/219273","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/users\/26"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/comments?post=219273"}],"version-history":[{"count":3,"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/posts\/219273\/revisions"}],"predecessor-version":[{"id":219286,"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/posts\/219273\/revisions\/219286"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/media\/219214"}],"wp:attachment":[{"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/
media?parent=219273"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/categories?post=219273"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hwcooling.net\/en\/wp-json\/wp\/v2\/tags?post=219273"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}