Ampere deep dive: what’s new in GeForce RTX 3000 architecture

PCI Express 4.0, HDMI 2.1, AV1, 8K video and 8K (upscaled) gaming

In terms of hardware, September was a green month with the release of the new generation of Nvidia GPUs, GeForce RTX 3000. They are based on the new Ampere architecture. In this article we are going to discuss what’s new compared to Turing: the new SM architecture doubling the number of shaders, the manufacturing process and the characteristics of the two chips that have been unveiled so far.

PCI Express 4.0 and RTX IO

The Ampere GPUs are the first Nvidia GPUs to use PCI Express 4.0 (or almost: a few weeks ago, Nvidia released the GeForce MX450 for laptops, which uses a new revision of the Turing TU117 chip that surprisingly supports PCI Express 4.0 ×4). PCI Express 4.0 doubles the bandwidth between the graphics card and the rest of the system: each lane transfers roughly 2 GB/s instead of roughly 1 GB/s with PCI Express 3.0. Textures and other data can therefore move over the bus twice as fast with PCIe 4.0.

Ampere is not yet fast enough to be significantly limited by PCI Express 3.0 bandwidth. According to TechPowerUp’s testing, you will typically lose only about 1% of gaming performance if you install the card in a PCIe 3.0 ×16 slot instead of a PCIe 4.0 ×16 one. The penalty grows somewhat if the card only gets eight PCIe 3.0 lanes, but the drop is still just a few percent.
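The “roughly 1 GB/s” and “roughly 2 GB/s” per-lane figures come from the raw signaling rate (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) minus the 128b/130b line-encoding overhead. The short sketch below works this out for the three slot configurations mentioned above; it ignores packet and protocol overhead, so real-world throughput is somewhat lower, and the function name is purely illustrative.

```python
# Raw per-lane signaling rates in gigatransfers per second (GT/s).
TRANSFER_RATE_GT_S = {"3.0": 8.0, "4.0": 16.0}
# PCIe 3.0 and 4.0 both use 128b/130b line encoding: 128 payload bits per 130 sent.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(gen: str, lanes: int) -> float:
    """Usable bandwidth per direction in GB/s (decimal), before packet/protocol overhead."""
    bits_per_second_per_lane = TRANSFER_RATE_GT_S[gen] * 1e9 * ENCODING_EFFICIENCY
    return bits_per_second_per_lane * lanes / 8 / 1e9


for gen, lanes in [("3.0", 8), ("3.0", 16), ("4.0", 16)]:
    print(f"PCIe {gen} x{lanes}: {link_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

This prints about 7.88, 15.75 and 31.51 GB/s respectively, so a PCIe 3.0 ×8 slot offers a quarter of the bandwidth of PCIe 4.0 ×16, which is still enough to keep the measured performance loss small.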

PCIe 4.0 bandwidth is also meant to be exploited by a new method of reading compressed textures directly from an NVMe SSD using DMA transfers, which bypass the CPU and system RAM. Microsoft’s API for this is called DirectStorage; Nvidia calls its implementation RTX IO.

With this technology, textures are decompressed directly on the GPU using shaders. According to Nvidia, this makes it possible to decompress a significantly larger volume of data than the CPU could, since the same work would otherwise occupy many CPU cores. For example, in the demo shown during Ampere’s announcement event, GPU decompression freed up as many as 24 CPU cores, though this may be something of a corner case. RTX IO/DirectStorage is not a purely hardware feature, by the way: the decompression runs on the general-purpose compute units (shaders). Thanks to this, Nvidia is going to add this capability to Turing-generation graphics cards with a driver update.

HDMI 2.1 and 8K support

The Ampere GPUs are the first standalone graphics cards to support HDMI 2.1 output, with a maximum bandwidth of 48 Gb/s. As a result, they can drive TVs at resolutions of up to 8K at 60 frames per second or 4K at 240 frames per second, including HDR.

Note, however, that this is not uncompressed, full-quality output. These modes require lossy DSC 1.2a compression as well as conversion to the subsampled YUV 4:2:0 color space (where the brightness/luma component has the full 8K resolution, but the color/chrominance planes only have 4K resolution).
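To make the subsampling concrete, the sketch below computes the plane sizes of a 4:2:0 frame (chroma halved in both directions) and the uncompressed data rate of the active pixels for a given chroma format. It is a simplification under stated assumptions: blanking intervals and HDMI link-encoding overhead are ignored, so the real link budget is tighter than these numbers suggest, and both helper names are hypothetical.

```python
def plane_sizes_420(width: int, height: int):
    """Luma (Y) and per-chroma-plane (Cb, Cr) dimensions under 4:2:0 subsampling."""
    # 4:2:0 halves chroma resolution horizontally and vertically (rounding up odd sizes).
    return (width, height), ((width + 1) // 2, (height + 1) // 2)


def raw_rate_gbit_s(width, height, fps, bit_depth, chroma="4:4:4"):
    """Uncompressed active-pixel rate in Gbit/s; ignores blanking and link encoding."""
    # Average number of samples carried per pixel for each chroma format.
    samples_per_pixel = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}[chroma]
    return width * height * fps * bit_depth * samples_per_pixel / 1e9


luma, chroma = plane_sizes_420(7680, 4320)
print(luma, chroma)                                            # (7680, 4320) (3840, 2160)
print(f"{raw_rate_gbit_s(7680, 4320, 60, 10):.1f}")            # 59.7 (10-bit 4:4:4)
print(f"{raw_rate_gbit_s(7680, 4320, 60, 10, '4:2:0'):.1f}")   # 29.9 (10-bit 4:2:0)
```

Even before blanking is added, 10-bit 4:4:4 8K at 60 frames per second exceeds the 48 Gb/s of HDMI 2.1, while 4:2:0 halves the number of samples per pixel.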

There are no changes to DisplayPort output: Nvidia still only supports DP 1.4a. It is probably still too early for DisplayPort 2.0 to appear. The graphics cards support HDCP 2.3 copy protection on both DisplayPort and HDMI.

Support for AV1 video

In addition to driving 8K displays, Ampere GPUs can also handle video at this resolution (8K is 7680 × 4320 pixels). Ampere has a new decoding block that supports, in addition to the classic H.264, H.265 (HEVC) and VP9 formats, the new AV1 format. Intel’s Tiger Lake also introduces this capability, and it looks like the Radeon RX 6000 will support it as well.

The GPU should be able to play back AV1 video in profile 0 at level 6.0, which means 8-bit and 10-bit color depth, but only with 4:2:0 chroma subsampling, not with 4:2:2 or 4:4:4 (4:4:4 is supported for HEVC). 8K playback at 60 frames per second is supposed to be supported.
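The profile restriction can be expressed as a small lookup. The sketch below maps a stream’s bit depth and chroma format to the lowest AV1 profile that can carry it (Main = 0, High = 1, Professional = 2, as defined in the AV1 specification), then checks it against the assumption, taken from the text, that Ampere decodes profile 0 only. Note that profile 0 formally also covers monochrome (4:0:0), which the text does not mention; the function names are hypothetical.

```python
VALID_BIT_DEPTHS = {8, 10, 12}
VALID_CHROMA = {"4:0:0", "4:2:0", "4:2:2", "4:4:4"}


def min_av1_profile(bit_depth: int, chroma: str) -> int:
    """Lowest AV1 seq_profile able to carry a stream with this format."""
    if bit_depth not in VALID_BIT_DEPTHS or chroma not in VALID_CHROMA:
        raise ValueError(f"not an AV1 format: {bit_depth}-bit {chroma}")
    if bit_depth == 12 or chroma == "4:2:2":
        return 2  # Professional profile: any 12-bit stream, or 4:2:2
    if chroma == "4:4:4":
        return 1  # High profile: 8/10-bit 4:4:4
    return 0      # Main profile: 8/10-bit 4:2:0 or monochrome (4:0:0)


def ampere_decodes_av1(bit_depth: int, chroma: str) -> bool:
    # Assumption based on the text: Ampere's AV1 decoder handles profile 0 only.
    return min_av1_profile(bit_depth, chroma) == 0
```

For example, `ampere_decodes_av1(10, "4:2:0")` returns `True`, while `ampere_decodes_av1(10, "4:4:4")` returns `False`, because 4:4:4 requires at least the High profile.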

AV1 video playback support in Ampere GPU

On the other hand, compressing (encoding) video into AV1 is not yet possible. Ampere only brings new video decoders, not encoders. The NVENC hardware encoder was more or less carried over from the previous Turing generation and supports encoding up to H.264 and HEVC. However, Intel doesn’t have an AV1 encoder yet either. It makes sense to add playback support first, since it takes less work.

Capabilities of hardware decoders and video encoders in Ampere GPU

8K gaming (using DLSS)

Especially with the GeForce RTX 3090 (though it should be similarly possible with the RTX 3080, which isn’t far behind in performance), Nvidia also announced that Ampere now makes it possible to game at 8K (7680 × 4320 pixels), which has four times as many pixels as 4K. However, this does not mean native 8K rendering: Nvidia is talking about upscaling a lower-resolution render to the 8K display resolution.

Nvidia uses its DLSS upscaling for this, which supports 8K output in its new version (DLSS 2.1). This relies on the newly added Ultra Performance mode, which offers even higher speed and lower quality than the previously fastest “Performance” mode. The original Performance mode upscales by a factor of 2× in each dimension, so for 4K output, the GPU actually renders the image at only 1920 × 1080 pixels and then upscales it to 3840 × 2160 pixels using the tensor cores.

Scheme of Nvidia DLSS 2.0 upscaling operation with temporal stability

DLSS at 8K resolution (Ultra Performance) uses an even larger upscaling factor of 3× in each dimension. This means that when gaming in 8K, the GPU actually renders the image at just 2560 × 1440 pixels; everything else comes from upscaling. This probably means stronger blurring and more artifacts. Thanks to such a low native resolution, games should achieve quite playable frame rates, but of course, the image quality will be far from native 8K rendering.
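The arithmetic behind both DLSS modes is simple: the per-axis factor divides each dimension, so the share of natively rendered pixels is one over the factor squared. The sketch below uses the nominal factors from the text (2× for Performance, 3× for Ultra Performance); individual games may round internal resolutions slightly differently, and the dictionary and function names are illustrative only.

```python
from fractions import Fraction

# Nominal per-axis upscaling factors for the two modes described in the text.
DLSS_AXIS_SCALE = {"Performance": 2, "Ultra Performance": 3}


def internal_resolution(out_w: int, out_h: int, mode: str) -> tuple[int, int]:
    """Resolution the GPU actually renders before DLSS upscales to out_w x out_h."""
    scale = DLSS_AXIS_SCALE[mode]
    return out_w // scale, out_h // scale


def rendered_fraction(mode: str) -> Fraction:
    """Share of output pixels that are natively rendered (scale applies on both axes)."""
    scale = DLSS_AXIS_SCALE[mode]
    return Fraction(1, scale * scale)


print(internal_resolution(3840, 2160, "Performance"))        # (1920, 1080)
print(internal_resolution(7680, 4320, "Ultra Performance"))  # (2560, 1440)
print(rendered_fraction("Ultra Performance"))                # 1/9
print((7680 * 4320) // (3840 * 2160))                        # 4: 8K has 4x the pixels of 4K
```

In Ultra Performance mode, only one in nine output pixels at 8K is rendered natively; the remaining eight ninths are reconstructed by the upscaler.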

New architecture, but building upon the innovations of Turing and Volta

So that’s it for the Ampere GPU architecture. In general, we can perhaps say that it is not as innovative a jump as Turing and Volta were, the generations in which RT cores and tensor cores first appeared (Volta had tensor cores before Turing, but only in server GPUs). Ampere is more about increasing the performance of these new units. The exception is the doubling of the number of shaders, which was a very well-kept secret and provided a big surprise at the launch. Even this feature, however, in a way completes work that had already started with the Turing and Volta architectures, though Ampere takes it significantly further.

As a result, Ampere will probably deliver significantly higher performance in all its aspects (shaders, memory, RT cores, tensor cores), so the features introduced in the previous generation will now be more usable in games. There will also be more games with ray tracing effects than in 2018/2019.

PCB of GeForce RTX 3080 Founders Edition with Ampere GA102 GPU

Nvidia has done a good job on this architecture. There’s probably just a single flaw: the higher performance is accompanied by increased power consumption, whether this is the fault of the 8nm process or Nvidia simply decided to tap this source to make the cards faster. A significant part of the performance boost the RTX 3000 cards bring to the table is thus achieved by raising power consumption, while efficiency has not improved as much, at least for the RTX 3080 and RTX 3090 (the RTX 3070 is more efficient). However, this may be temporary: in a future evolution of Ampere made on a 7nm or 5nm process, Nvidia may return to lower power consumption and higher efficiency.

Performance of the individual GeForce RTX 3000 cards is something that we’ll leave for actual reviews.

English translation and edit by Lukáš Terényi

