Why want a GeForce RTX 4000? Nvidia DLSS 3.5 technology analysis

Frame Generation. What is it about?

In this article, we take a look at some of the exclusive technologies that Nvidia GeForce RTX 4000 series graphics cards offer. We explain the most significant new features currently supported by the GeForce graphics card ecosystem and run tests showing how they affect performance in Cyberpunk 2077 with the new Phantom Liberty expansion. We also look at their impact on image quality.

Disclaimer: This article was commissioned and paid for by Nvidia. However, the company did not interfere with its content in any way; the only requirement was to present the benefits of DLSS 3.5 technology to users. The results reported from Chapter 5 onwards are our own.

First, a bit on the theory behind how these technologies work.

Frame Generation (also known as DLSS 3)

The GeForce RTX 4000 graphics card generation brought DLSS 3 as an exclusive new feature. It extends the AI upscaling (also known as super resolution) and neural network image enhancement tool previously known as DLSS 2.x. In short, DLSS 2.x combines spatial upscaling by a neural network with temporal reconstruction: using motion compensation based on motion vectors obtained from the game engine, plus other data, it can temporally “combine” information present in several consecutive frames and add it to the current frame, where that detail might otherwise not be properly visible because of the low rendering resolution.
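To make the idea of motion-compensated temporal accumulation more concrete, here is a minimal 1D sketch. It is an assumption-laden simplification: it uses a classic TAA-style exponential blend instead of Nvidia’s neural network, it does not upscale, and the motion vectors and the `alpha` blend weight are hypothetical inputs. It only shows how history from earlier frames is moved along motion vectors and merged into the current frame.

```python
from typing import List, Optional


def reproject(history: List[float], motion: List[int]) -> List[Optional[float]]:
    """Fetch, for each pixel, the history value it came from.

    motion[x] is how many pixels the content at x moved since the previous
    frame (hypothetical engine-supplied vectors), so it used to be at
    x - motion[x]. Samples from outside the frame have no history (None).
    """
    out: List[Optional[float]] = []
    for x, m in enumerate(motion):
        src = x - m
        out.append(history[src] if 0 <= src < len(history) else None)
    return out


def accumulate(current: List[float], history: List[float],
               motion: List[int], alpha: float = 0.1) -> List[float]:
    """Blend the current frame with motion-compensated history.

    alpha is the weight of the new frame (assumed value); a small alpha keeps
    more of the information gathered over earlier frames.
    """
    result = []
    for cur, old in zip(current, reproject(history, motion)):
        if old is None:
            result.append(cur)  # no usable history: trust the new sample only
        else:
            result.append(alpha * cur + (1 - alpha) * old)
    return result


# Usage: content moved one pixel to the right between the two frames.
print(accumulate([5.0, 5.0, 5.0], [10.0, 20.0, 30.0], [1, 1, 1], alpha=0.5))
```

The left-most pixel has no history to reuse because its source lies outside the frame, which is the 1D analogue of newly revealed content.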

DLSS 2.x improves in-game performance by rendering the game at a reduced internal resolution instead of the monitor’s target resolution (which significantly raises FPS) and then upscaling the result as described above to reach the target resolution. The temporal “reconstruction” or “enhancement” of detail is what makes this work well: it can push quality significantly higher than ordinary upscaling, which only has a single frame to work with. According to Nvidia, the quality should be very close to native resolution, and in some specific cases it may even be better, because DLSS 2.x also performs stabilization and anti-aliasing and can therefore remove shimmering that the game would otherwise produce.

The principles of how DLSS 2.0 works (source: Nvidia)
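The saving from a reduced internal resolution is simple arithmetic. The sketch below assumes, purely as an example, a 3840×2160 output rendered internally at 2560×1440; the function name is hypothetical.

```python
from typing import Tuple


def pixel_ratio(target: Tuple[int, int], internal: Tuple[int, int]) -> float:
    """How many output pixels each internally rendered pixel has to cover."""
    tw, th = target
    iw, ih = internal
    return (tw * th) / (iw * ih)


# Assumed example: 4K output rendered internally at 2560x1440.
ratio = pixel_ratio((3840, 2160), (2560, 1440))
print(f"{ratio:.2f}x more output pixels than rendered pixels")
print(f"{1 - 1 / ratio:.0%} fewer pixels rendered per frame")
```

Fewer rendered pixels does not translate one-to-one into higher FPS: the upscaling pass has its own cost, and not all of a game’s per-frame work scales with resolution.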

What Nvidia is doing with DLSS 2.x is generating more pixels for the screen than the game actually renders on the GPU. DLSS 3, and its follow-up DLSS 3.5, include this technology but add a new feature that takes this “pixel multiplication” into another dimension. Upscaling increases the number of pixels (the resolution) within a single frame, albeit by temporally analyzing other frames to do so. DLSS 3 adds a feature called Frame Generation, which inserts completely new frames among the frames that the game itself has rendered. In the current form of the technology, the frame rate is roughly doubled, with frame generation inserting one artificial intermediate frame between every two rendered frames.
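The ordering of rendered and generated frames can be sketched as follows. This is an illustration only; the function name and frame labels are hypothetical. With one generated frame per pair of rendered frames, n rendered frames become 2n − 1 displayed frames, which is why the frame rate roughly doubles over a long sequence.

```python
from typing import List, Tuple


def present_sequence(rendered: List[str]) -> List[Tuple[str, str]]:
    """Interleave one generated frame between each pair of rendered frames.

    Returns (kind, label) tuples in display order; a generated frame is
    labelled with the two rendered frames it is interpolated from.
    """
    out: List[Tuple[str, str]] = []
    for i, frame in enumerate(rendered):
        if i > 0:
            out.append(("generated", f"{rendered[i - 1]}|{frame}"))
        out.append(("rendered", frame))
    return out


for kind, label in present_sequence(["A", "B", "C"]):
    print(kind, label)
```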

While DLSS 2.x increases FPS by reducing the work required to render a single frame, DLSS 3/DLSS 3.5 simply adds more frames after the game has done its work, so the FPS increase here is of a different nature. With ordinary upscaling, each frame is still a product of the game engine, which has calculated how all the effects should look (including, for example, lighting, enemy AI, various flashes and similar more complex changes between frames) and how objects should move (for example, flying projectiles), taking into account what the player is doing.

Doubling FPS with Optical Flow

Frame generation in DLSS 3/3.5 is independent of the game engine; it is effectively an external post-processing step. It works solely with the pixels of the frames produced by the game (which may already be upscaled with DLSS 2.x, i.e. with temporal reconstruction), and only then is the actual frame generation applied. It uses the optical flow technique and generates a new frame by comparing two consecutive frames (more precisely, it can use more than one past frame, but only one future frame is analysed). From their data, it estimates what the intermediate frame between them should look like, based on what it knows about the movement of objects.

The principles of how DLSS 3.0 works (source: Nvidia)
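As a rough illustration of motion-compensated interpolation, here is a 1D sketch that uses symmetric block matching as a stand-in for optical flow. This is an assumption, not Nvidia’s method: DLSS 3 relies on hardware Optical Flow Accelerators, game motion vectors and a neural network. The function name and the `radius`/`window` parameters are hypothetical. For each pixel of the intermediate frame, the sketch looks for a half-displacement h such that the neighbourhood of `prev[x - h]` matches that of `nxt[x + h]`, then averages those two samples.

```python
from typing import List, Tuple


def _at(frame: List[float], i: int) -> float:
    """Read a pixel, clamping the index to the frame edges."""
    return frame[min(max(i, 0), len(frame) - 1)]


def interpolate_midpoint(prev: List[float], nxt: List[float],
                         radius: int = 3, window: int = 2
                         ) -> Tuple[List[float], List[int]]:
    """Estimate the frame halfway between prev and nxt (1D, block matching).

    A best half-displacement h means the content moved by 2*h pixels between
    the two rendered frames and sits at x halfway through that motion.
    """
    mid: List[float] = []
    half_flow: List[int] = []
    # Try small motions first so ties resolve towards "no movement".
    candidates = sorted(range(-radius, radius + 1), key=abs)
    for x in range(len(prev)):
        best_h, best_cost = 0, float("inf")
        for h in candidates:
            # Sum of absolute differences between the two windows.
            cost = sum(abs(_at(prev, x - h + k) - _at(nxt, x + h + k))
                       for k in range(-window, window + 1))
            if cost < best_cost:
                best_h, best_cost = h, cost
        half_flow.append(best_h)
        mid.append(0.5 * (_at(prev, x - best_h) + _at(nxt, x + best_h)))
    return mid, half_flow


# A bright "object" moves from index 2 to index 4 between rendered frames.
prev = [0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0]
nxt = [0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0]
mid, flow = interpolate_midpoint(prev, nxt)
print(mid)
```

In the generated frame, the object lands at index 3, halfway along its path. In flat, textureless areas the matching is ambiguous, which hints at why real flow estimation is much more involved.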

This operation is performed by a specialized AI (a convolutional autoencoder neural network) trained on high-resolution 16K images. During training, it compared the images it had estimated using optical flow with real reference images and gradually learned to get closer to the reference. As with DLSS 2.x, motion vectors obtained as metadata from the game engine are used, but they are not the only source of motion information.

Exclusively available to GeForce RTX 4000 GPUs

An important component in making this neural network work is acceleration on specialized units: the Optical Flow Accelerators in GeForce RTX 4000 GPUs, provided by the Ada Lovelace architecture. Ada includes a new, more capable generation of these units (previous generations have an older version of the accelerator), and frame generation in DLSS 3/3.5 is built on it. Therefore, this feature cannot be used on other GPU generations, which makes frame generation one of the reasons to buy these graphics cards.

The Optical Flow Accelerators provide the frame-generating AI with information about how objects move and what other changes occur between frames. Optical flow analysis examines the frames on its own and searches for motion vectors, giving a second, separate source of motion information that is used alongside the motion vectors obtained from the game. Using two independent sources helps prevent errors in cases where the motion vectors from the game engine are not completely correct.

DLSS 3 combines in-game vectors with vectors found by Optical Flow to generate frames for better quality (source: Nvidia)
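How two motion sources can cover for each other can be shown with a deliberately simple heuristic. This is a hypothetical sketch: DLSS 3 combines the vectors inside its neural network, and nothing here reflects Nvidia’s actual logic. Per pixel, it keeps the engine’s vector unless the optical flow vector explains the pixel change between the two frames strictly better. A typical case is a shadow or reflection that moves on screen while the engine reports zero motion for that surface.

```python
from typing import List


def _at(frame: List[float], i: int) -> float:
    """Read a pixel, clamping the index to the frame edges."""
    return frame[min(max(i, 0), len(frame) - 1)]


def _match_cost(prev: List[float], nxt: List[float],
                x: int, d: int, window: int) -> float:
    """SAD between a window around prev[x] and the same window moved by d in nxt."""
    return sum(abs(_at(prev, x + k) - _at(nxt, x + d + k))
               for k in range(-window, window + 1))


def choose_vectors(prev: List[float], nxt: List[float], game_mv: List[int],
                   flow_mv: List[int], window: int = 1) -> List[int]:
    """Pick, per pixel, the motion vector that better explains the change."""
    chosen = []
    for x, (g, f) in enumerate(zip(game_mv, flow_mv)):
        # Ties keep the engine's vector; flow wins only if it matches better.
        if _match_cost(prev, nxt, x, f, window) < _match_cost(prev, nxt, x, g, window):
            chosen.append(f)
        else:
            chosen.append(g)
    return chosen


# A moving shadow the engine knows nothing about (game vectors are all zero).
prev = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
nxt = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0]
print(choose_vectors(prev, nxt, game_mv=[0] * 6, flow_mv=[0, 0, 1, 0, 0, 0]))
```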

Motion smoothness versus reactivity and latency

The generated intermediate frame approximates where the moving objects should be and what they should look like in the intermediate state between the two real frames. For more complex changes, such as irregular motion or lighting effects, this estimate may be less reliable. In some cases, the visual information required for the interpolation may not be found at all, leaving a hole in the frame that must be masked, for example by inpainting from neighbouring known pixels.
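Inpainting from neighbouring known pixels can be illustrated in its crudest form with a 1D sketch. Holes are marked as `None` and filled with the average of the nearest known pixels on either side. This is a hypothetical stand-in: the article does not describe how DLSS actually fills such gaps, and real inpainting is far more sophisticated.

```python
from typing import List, Optional


def fill_holes(frame: List[Optional[float]]) -> List[float]:
    """Fill None pixels with the mean of the nearest known left/right pixels."""
    known = [i for i, v in enumerate(frame) if v is not None]
    if not known:
        raise ValueError("frame has no known pixels to inpaint from")
    out: List[float] = []
    for i, v in enumerate(frame):
        if v is not None:
            out.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        # Use whichever neighbours exist (at least one always does).
        neighbours = [frame[k] for k in (left, right) if k is not None]
        out.append(sum(neighbours) / len(neighbours))
    return out


print(fill_holes([1.0, None, 3.0, None]))
```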

In general, these generated intermediate frames have the advantage of making movement in the game smoother, which matters most when you start from a low frame rate. However, because they are independent of the game’s actual internal processes, they are not fully fledged in terms of interactivity and keeping the player in sync with what is happening in the game. The generated frames are not tied to the game action: DLSS 3 takes only the image data and motion information of each frame as input, so the game and the player’s inputs or actions have no way to directly affect the content of the interpolated frames. This creates inaccuracy, because the motion and events shown in an artificial intermediate frame may not exactly match the game’s “reality”.

This method of increasing the frame rate may therefore not be very suitable for fast-paced competitive and eSports games. In those, what matters is not nice, smooth movement, but seeing your opponents’ actions as soon and as accurately as possible so that you can react as quickly as possible.

Another general disadvantage of generated frames is that they may contain artifacts and distortions, caused by the inevitable imperfection of optical flow motion prediction and by the fact that the required visual data is sometimes missing (for example, when objects occlude the background). In motion this can be masked to a degree, because the weaker interpolated images blend together with good-quality “legitimate” ones. Overall, however, a frame rate doubled by this kind of frame generation is not on the same qualitative level as a comparable FPS value with every frame actually rendered by the game.

When is it beneficial: CPU limitation and smooth movement

Doubling frames this way, “outside the game”, can be beneficial when you are limited by GPU performance, but it can be crucial in games limited by CPU performance (for example, due to AI computation of enemy behaviour or physics). This is where it pays off that the new (albeit artificial) frames are created “without consulting” the game. When the CPU is already running at maximum and the game cannot produce a higher frame rate, DLSS 3 (or DLSS 3.5) can still double the FPS with frame generation and greatly improve the perceived smoothness (perhaps the most notable current example of this is Microsoft Flight Simulator). Note that DLSS 2.x would not help in this situation: it only reduces the rendering load on the GPU, and since the bottleneck here is purely frame preparation on the CPU, the resulting FPS would not improve. DLSS 3/DLSS 3.5, on the other hand, requires no extra work from the game and its CPU-limited code and can “invent” additional frames on its own, so it is not bound by this limit the way DLSS 2.x is.
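The CPU-limit argument can be captured in a toy bottleneck model. Its assumptions should be spelled out: the rendered frame rate is the lower of what the CPU and the GPU can each sustain, frame generation exactly doubles the displayed rate, and the GPU cost of upscaling and frame generation is ignored. The function name and all FPS figures are hypothetical.

```python
def displayed_fps(cpu_fps: float, gpu_fps: float,
                  frame_generation: bool = False) -> float:
    """Toy model: the slower of CPU and GPU sets the rendered frame rate;
    frame generation (assumed free) shows one extra frame per rendered one."""
    rendered = min(cpu_fps, gpu_fps)
    return 2 * rendered if frame_generation else rendered


# Hypothetical CPU-limited game: the CPU manages 50 FPS, the GPU 80 FPS at
# native resolution or 120 FPS with upscaling.
print(displayed_fps(50, 80))                           # native
print(displayed_fps(50, 120))                          # upscaling only
print(displayed_fps(50, 120, frame_generation=True))   # upscaling + frame generation
```

In this model, upscaling alone leaves the CPU-limited result unchanged because only the GPU side gets faster, while frame generation raises the displayed rate regardless of which side is the bottleneck.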

Downside: Increased game latency

However, frame generation worsens game latency, because each new frame is generated from, and inserted between, the last two consecutive rendered frames. If you insert one artificial frame between two real frames, you have to delay displaying the latest real frame: once it is finished, the GPU driver must hold it back, produce the interpolated frame first and display that one. It must then wait about as long as the interval between the previous real frame and the generated frame before the held-back, game-supplied frame can finally be displayed. DLSS 3 therefore needs a queue of frames processed ahead of time, and this increases latency. All game reactions to your inputs or to your opponents’ actions appear on your monitor a little later than without this feature.
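The hold-back described above can be put into numbers with a simplified model, assuming one generated frame per pair of rendered frames, evenly paced output and no other queuing. Under these assumptions, a finished real frame waits for the intermediate frame to be generated and shown, and then for half of a rendered-frame interval. `generation_ms` is a hypothetical cost parameter, and real measured latency also depends on the rest of the render queue.

```python
def added_latency_ms(rendered_fps: float, generation_ms: float = 0.0) -> float:
    """Extra delay before a rendered frame reaches the screen (simplified model)."""
    frame_time_ms = 1000.0 / rendered_fps
    # Wait for the generated frame to be produced and displayed, then half a
    # rendered-frame interval before the real frame gets its turn.
    return generation_ms + frame_time_ms / 2


for fps in (30, 60, 120):
    print(f"{fps} rendered FPS: about {added_latency_ms(fps):.1f} ms extra")
```

The model also shows that the penalty grows as the base frame rate falls, which is exactly where frame generation adds the most smoothness.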
