
Why would you want a GeForce RTX 4000? Nvidia DLSS 3.5 technology analysis

Ray Reconstruction: The new feature brought by DLSS 3.5

In this article, we’ll take a look at some of the exclusive technologies that Nvidia GeForce RTX 4000 generation graphics cards can provide. We are going to explain the most significant new features currently supported by the GeForce graphics card ecosystem and perform tests showing how they affect performance in Cyberpunk 2077 with the new Phantom Liberty expansion. And we’ll also take a look at what they’re doing to image quality.

Disclaimer: This article was commissioned and paid for by Nvidia. However, the company did not interfere with its content in any way and the only requirement was to present the benefits of DLSS 3.5 technology to users. The reported results from Chapter 5 onwards are our own.

First, a bit on the theory behind how these technologies work.

Frame Generation (also known as DLSS 3)

The GeForce RTX 4000 graphics card generation brought, as an exclusive new feature, DLSS 3 – a new extension of the AI upscaling (also known as super resolution) and neural-network image enhancement tool previously known as DLSS 2.x. DLSS 2.x is, in short, a combination of spatial upscaling using a neural network and temporal reconstruction, which, by using motion compensation based on motion vectors obtained from the game engine and other data, can temporally “combine” information present in several consecutive frames and add it to the current frame, where it might not be properly visible otherwise due to the low resolution used during rendering.

DLSS 2.x improves in-game performance by rendering the game at a reduced internal resolution instead of the target monitor resolution (thus significantly improving FPS) and then performing the upscaling described above to get to the target resolution. This works well when augmented by the temporal “reconstruction” or “enhancement” of detail, which can push the quality significantly higher than what is possible with ordinary upscaling working from just a single frame. According to Nvidia, the quality should be very close to native resolution (in some specific cases it may even be better, as DLSS 2.x does both stabilization and anti-aliasing, so it can remove shimmering that the game may generate by default).
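
To make the reprojection idea more concrete, here is a minimal sketch of temporal upscaling in Python, assuming plain NumPy/OpenCV arrays stand in for the engine’s render target, motion vectors and history buffer. The temporal_upscale function and its fixed blend factor are made up for illustration – DLSS 2.x replaces this hand-written blend with a trained neural network and many more heuristics.

```python
import numpy as np
import cv2  # used only for resizing and remapping here

def temporal_upscale(low_res_frame, motion_vectors, history, blend=0.1):
    """Blend the current low-resolution frame with reprojected history.

    low_res_frame  : (h, w, 3) float32 image rendered at reduced resolution
    motion_vectors : (H, W, 2) per-pixel motion in target-resolution pixels
    history        : (H, W, 3) float32 previously accumulated output image
    """
    target_h, target_w = history.shape[:2]
    # 1) Plain spatial upscale of the current frame to the target resolution.
    upscaled = cv2.resize(low_res_frame, (target_w, target_h),
                          interpolation=cv2.INTER_LINEAR)
    # 2) Reproject the history buffer: look up where each pixel was last frame.
    ys, xs = np.mgrid[0:target_h, 0:target_w].astype(np.float32)
    map_x = (xs - motion_vectors[..., 0]).astype(np.float32)
    map_y = (ys - motion_vectors[..., 1]).astype(np.float32)
    reprojected = cv2.remap(history, map_x, map_y, cv2.INTER_LINEAR)
    # 3) Keep most of the (accumulated, detailed) history and refresh it with
    #    the new samples, so detail builds up over several frames.
    return blend * upscaled + (1.0 - blend) * reprojected

# Tiny demo with random data, just to show the shapes involved.
history = np.zeros((216, 384, 3), np.float32)
low_res = np.random.rand(108, 192, 3).astype(np.float32)
motion = np.zeros((216, 384, 2), np.float32)
history = temporal_upscale(low_res, motion, history)
```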

The principles of how DLSS 2.0 works (source: Nvidia)

What Nvidia is doing with DLSS 2.x is generating more pixels for the screen than the game actually renders on the GPU. DLSS 3, and then the follow-up DLSS 3.5, includes this technology, but also adds a new feature that takes this “pixel multiplication” to another dimension – while upscaling increases the number of pixels (resolution) within a single frame (albeit this is done by temporally analyzing other frames for these purposes), DLSS 3 adds a feature called Frame Generation, which “adds” completely new frames to the set of existing frames that the game itself has rendered. In the current form of the technology, the frame rate is more or less doubled, with the frame generation function inserting one artificial intermediate frame between every two existing frames.

While DLSS 2.x increases FPS by reducing the performance required to render a single frame, DLSS 3/DLSS 3.5 simply adds more frames after the game has done its work. So the FPS increase here is of a different nature. With ordinary upscaling, each frame is a product of the game engine, which has calculated how all the effects should look (including, for example, lighting, enemy AI, various flashes and similarly complicated changes between frames), how certain objects such as flying projectiles should move, and which takes into account what the players are doing.

Doubling FPS with Optical Flow

Frame generation in DLSS 3/3.5 is independent of the game engine; it is effectively an external post-processing step. It is based solely on the pixels of frames produced by the game (which may already employ upscaling with DLSS 2.x, i.e. with temporal reconstruction). Then the actual frame generation is applied. This uses the optical flow technique and generates a new frame by comparing two consecutive frames (more accurately, it can use more than one of the past frames, but only one future frame is analysed). From their data, it estimates what the intermediate frame between them should look like based on what it knows about the movement of the objects.
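
The core of optical-flow interpolation can be sketched in a few lines, assuming two consecutive screenshots saved as frame0.png and frame1.png (hypothetical file names). This uses OpenCV’s Farneback flow and a crude half-step warp instead of the hardware Optical Flow Accelerator, the game’s motion vectors and the trained network that DLSS 3 actually uses, so treat it purely as an illustration of the principle.

```python
import cv2
import numpy as np

f0 = cv2.imread("frame0.png")
f1 = cv2.imread("frame1.png")
g0 = cv2.cvtColor(f0, cv2.COLOR_BGR2GRAY)
g1 = cv2.cvtColor(f1, cv2.COLOR_BGR2GRAY)

# Dense per-pixel motion (in pixels) from frame 0 to frame 1.
flow = cv2.calcOpticalFlowFarneback(g0, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)

h, w = g0.shape
ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)

# Warp frame 0 half a step forward and frame 1 half a step backward so both
# approximate the (never rendered) moment halfway between them.
mid_from_f0 = cv2.remap(f0, xs - 0.5 * flow[..., 0], ys - 0.5 * flow[..., 1],
                        cv2.INTER_LINEAR)
mid_from_f1 = cv2.remap(f1, xs + 0.5 * flow[..., 0], ys + 0.5 * flow[..., 1],
                        cv2.INTER_LINEAR)

# Blend the two estimates; where one frame lacks data (occlusions), the other
# often still has it, which hides some of the holes.
intermediate = cv2.addWeighted(mid_from_f0, 0.5, mid_from_f1, 0.5, 0)
cv2.imwrite("frame0_5.png", intermediate)
```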

The principles of how DLSS 3.0 works (source: Nvidia)

This operation is performed by a special AI (a convolutional autoencoder neural network) trained on images with high 16K resolution. During training, it compared its own frames estimated with the optical flow technique against a real reference image and learned over time to get closer to that reference. As with DLSS 2.x, motion vectors obtained as metadata from the game engine are used, but they are not the only source of motion information.

Exclusively available to GeForce RTX 4000 GPUs

An important component in making this neural network work is acceleration on specialized units – the Optical Flow Accelerators present in GeForce RTX 4000 generation GPUs, provided by the Ada Lovelace architecture. Ada includes a new and more capable generation of these units (previous generations have an older version of the accelerator), and the frame generation within DLSS 3/3.5 relies on it. This feature therefore cannot be used on other GPU generations, which makes the ability to use frame generation one of the reasons to buy these graphics cards.

Optical Flow Accelerators provide the AI that generates the frames with information about how objects move and how other changes occur between frames. Optical Flow analysis independently examines the frames and looks for motion vectors, providing a second, separate source of motion information alongside the motion vectors obtained from the game. Using these two separate sources can prevent some errors in cases where the motion vectors obtained from the game engine are not completely correct.

DLSS 3 combines in-game vectors with vectors found by Optical Flow to generate frames for better quality (source: Nvidia)

Motion smoothness versus reactivity and latency

The generated intermediate frame approximates where the moving objects should be and what they should look like in the to-be-represented intermediate state between the two real frames. For more complex changes, like irregular motion or lighting effects, such an estimate may be less reliable, and in some cases the corresponding visual information required for the interpolation may not be found at all, leaving a hole in the frame that must be masked, for example, by inpainting from neighbouring known pixels.

In general, these generated intermediate frames have the advantage of making game movement smoother, which is especially important if you are starting with a low frame rate. However, because they are independent of the actual internal processes of the game, they are not fully fledged in terms of interactivity and keeping the player fully attuned to what is happening in the game. The generated frames are not tied to the game action, because DLSS 3 only works with the image data and motion information of each frame as input – the game and the player’s input or actions have no way to directly affect the content of the interpolated frames. This creates inaccuracy, because the motion and events represented in the artificial intermediate frame may not exactly match the game’s “reality”.

This method of increasing frame rate may therefore not be very suitable for fast-paced competitive and eSports games. In those, you don’t just want movement to look nice and smooth – you want to see your opponents’ actions as soon as possible and as accurately as possible, and to be able to react as quickly as possible.

In addition, a general disadvantage of the generated frames is that they may contain artifacts and distortions due to the inevitable imperfection of Optical Flow motion prediction, and to the fact that the required visual data may sometimes simply be missing (a problem can arise, for example, from occlusion of objects and background). This can be masked to a degree in motion, because the worse interpolated images blend together with the good-quality “legitimate” ones, but overall a frame rate doubled by this kind of frame generation is not on the same qualitative level as reaching a comparable FPS value with all frames actually rendered by the game.

When is it beneficial: CPU limitation and smooth movement

Doubling frames this way “outside the game” can be beneficial when you are limited by GPU performance, but it can be quite crucial in games that are limited by CPU performance (for example, due to AI computation of enemy behaviour or physics). This is where the fact that these new (albeit artificial) frames are created “without consulting” the game comes into play. Thanks to this, in a situation where the CPU is already running at maximum and the game cannot generate a higher frame rate, DLSS 3 (or DLSS 3.5) can still double the FPS with frame generation and greatly improve the perceived smoothness (perhaps the most notable current example of this would be Microsoft Flight Simulator). Note that DLSS 2.x wouldn’t help in this situation – it would only reduce the rendering load on the GPU, but since the bottleneck here is purely frame preparation on the CPU, the resulting FPS would not improve. DLSS 3/DLSS 3.5, however, doesn’t require any extra work from the game and its CPU-limited code – it can “invent” additional frames on its own, so it’s not bound by this limit the way DLSS 2.x is.

Downside: Increased game latency

However, frame generation worsens the latency of the game, because the new frame is generated from and inserted between the last two consecutive frames. If you are inserting one artificial frame between two real frames, you need to delay displaying the last real frame – the GPU driver has to hold it when it is done, produce the interpolated frame first and display that one. It must then wait roughly the same amount of time as between the second-to-last real frame and the artificially generated frame before the previously rendered, game-supplied frame can finally be displayed. DLSS 3 therefore needs to keep a queue of frames processed ahead of time, and this increases latency. All game reactions to your inputs or to your opponents’ actions are displayed on your monitor a little later than without this feature.
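
A simplified back-of-the-envelope model of this extra delay, assuming frame generation roughly doubles the output rate and must hold the newest real frame back for about half of a real frame interval, plus whatever time the interpolation itself costs (the 1 ms default below is an arbitrary placeholder, not a measured value):

```python
def fg_added_latency_ms(real_fps, generation_cost_ms=1.0):
    # Time between the frames the game actually renders.
    real_frame_time = 1000.0 / real_fps
    # The newest real frame is held back while the interpolated frame is
    # generated and shown; it appears roughly half a real frame later.
    return real_frame_time / 2.0 + generation_cost_ms

for fps in (30, 60, 120):
    print(f"{fps:>4} real FPS -> roughly {fg_added_latency_ms(fps):.1f} ms of extra display latency")
```

Under this simple model, the lower the real frame rate you start from, the bigger the added delay – which is worth keeping in mind later when we discuss using frame generation at low FPS.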




Reflex: How to have the fastest reactions

Now is a good time to mention the Nvidia Reflex technology, exclusive to Nvidia GeForce graphics cards, which is important for competitive gaming. Reflex aims to help with precisely the problem that matters to competitive gamers: the latency between input and display – or, put another way, between an action in the game (including the actions of opponents) and its appearance on the monitor. This latency represents the time that elapses before an event in the game – such as an enemy emerging from behind a wall – is actually reflected on the monitor. It is therefore time added on top of the player’s own reaction time. The lower this latency, the greater the player’s advantage, because they can start reacting before the opponent. In competitive multiplayer gaming, the lowest possible latency should directly improve a competitor’s potential.

Previous solutions aimed at reducing latency (such as Low Latency Mode) work by eliminating the frame queue in the driver, but Reflex goes beyond the capabilities of these technologies. Reflex is based on a component (Reflex SDK) that developers integrate directly into the game, opening up deeper possibilities for reducing the various responses and delays that occur between input and output.

In addition to the Reflex SDK and its integration into games, Reflex also offers input latency measurement for peripherals such as mice, keyboards, and even monitor latency. This Reflex Latency Analyzer feature works with supported hardware and monitors with G-Sync. For most gamers, the main thing that will probably be relevant is the Reflex SDK and its integration in games, which is not tied to owning specific mice or monitors.

You always have some latency when you are gaming, and several things contribute to it at once. There’s not much you can do about network latency (if you already have the best connection possible or affordable to you), which is the lag or “ping” of packets between you and the game server. Nvidia Reflex tries to mitigate another kind of latency that occurs on your computer, which it calls “System Latency”. At the beginning of the chain, this is contributed to by the input lag of your mouse and keyboard and the processing of their signals by the operating system; at the end, by the delay between the finished frame being sent from the graphics card to the monitor and it actually being displayed (the monitor has a certain panel response time as well as input lag in its electronics).

Sources of latency in PC gaming

Between these peripheral and monitor latencies lies an amount of latency determined by the game itself and its code, which has to process the players’ input and calculate what it does to the game scene and how all the objects in it will behave. This part takes place mainly on the gaming computer’s processor, and Nvidia calls it “Game Latency”. The game then tells the graphics driver to render a frame showing the result. The time taken to produce this frame on the GPU is then the rendering latency (“Render Latency”).

When your game is running at a high frame rate thanks to sufficient GPU performance, the duration of frames and the time it takes to process them shrinks in inverse proportion, so high performance by itself reduces game latency and rendering latency. But high FPS alone isn’t all you need, because the way the game works can artificially increase latency. Adding Reflex to games, in conjunction with GeForce graphics drivers, tries to eliminate this as much as possible. The main way is by reducing the various idle waits that can build up in a game, with better low-latency synchronization ensuring that frames go to the monitor for display as soon as possible.

Game pipeline using Nvidia Reflex

The first step is to stop the part of the game running on the CPU from generating excess frames for the further steps of rendering on the GPU (which is called CPU Back Pressure). When this happens, a queue of pending frames is created, and each such frame worsens latency. Another queue can also form in the GPU part, in the so-called “Render Queue”. Reflex aims to make the entire game pipeline running on the CPU and GPU work in a “just in time” style: finished frame submissions from CPU processing arrive at the GPU at about the time the GPU can start working on them (without delay), and they can then again be sent straight to the monitor as quickly as possible.
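
Here is a toy steady-state model of why this matters, assuming fixed CPU and GPU times per frame and a GPU-bound game (the GPU is the bottleneck). The numbers and the input_to_display_ms helper are invented for illustration; the real Reflex SDK paces the game dynamically rather than enforcing a fixed queue depth.

```python
def input_to_display_ms(cpu_ms, gpu_ms, frames_in_flight):
    # GPU-bound steady state: once the CPU finishes simulating a frame, that
    # frame is the last of `frames_in_flight` frames the GPU still has to
    # work through before it reaches the screen.
    return cpu_ms + frames_in_flight * gpu_ms

for depth in (3, 2, 1):
    latency = input_to_display_ms(cpu_ms=4.0, gpu_ms=12.0, frames_in_flight=depth)
    print(f"{depth} frame(s) in flight -> ~{latency:.0f} ms input-to-display")
```

With three frames queued, the input-to-display time in this model is roughly 40 ms; pacing the CPU so that only one frame is in flight cuts it to about 16 ms at the same frame rate, which is the essence of the “just in time” behaviour described above.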

Low Latency Boost for GPUs

A game with Reflex integrated should therefore regulate the pace at which the CPU part produces new frames if the GPU can’t process them as quickly – otherwise the described negative CPU Back Pressure phenomenon would arise and increase the game’s latency. Reflex therefore dynamically controls when frame submissions are fed to the GPU.

At the same time, GPU drivers can increase the GPU clock speed to render images faster, so Reflex also communicates with the GPU clock speed control. This feature is called Low Latency Boost and has the ability to override the normal power management of the GPU, which tries to optimize its clock speed for higher overall power efficiency. Low Latency Boost should be able to dial the clock speeds up temporarily even at the cost of efficiency loss in situations where rendering delays are imminent.

Ironically, this Low Latency Boost can help with games where the GPU load is relatively low, because they are not demanding and are mainly limited by the CPU. In such situations, graphics cards tend to run in power-saving modes and at lower clock speeds, but these can lead to higher rendering latencies in competitive games. Low Latency Boost therefore force-activates the highest-performance mode even in such a situation, where the GPU is underutilized, to reduce rendering latencies as much as possible. However, this increases GPU power draw – an undemanding game where you would expect low consumption instead draws significant watts (and generates higher GPU temperatures). For laptop gaming, for example, Nvidia therefore makes it possible to turn Low Latency Boost off while leaving Reflex itself on.

A more detailed breakdown of the different phases that a game frame goes through on the CPU and GPU

In short, according to Nvidia, Reflex tries to prevent the CPU from running ahead and accumulating a queue of multiple pending frames (worsening rendering latency) in situations where the game is limited by GPU performance (i.e. in a challenging scene where FPS drops). Conversely, in situations where the limit is on the CPU side, Reflex tries to minimize the time it takes for frames to be rendered by the GPU (rendering latency) and send them to display as fast as possible by keeping the GPU clock speeds high.

Available with GeForce GTX 900 and newer

The Reflex feature only works on Nvidia GeForce graphics cards, but also supports older generations (back to GeForce GTX 900). Low Latency Boost should work best on GeForce RTX 3000 and newer graphics cards that have better ability to increase clock speeds when it is activated.

Improving responsiveness with Reflex should obviously be most important for games with PvP multiplayer and competitive multiplayer in general, and eSports games – including popular titles like Fortnite, Valorant, Apex Legends, Call of Duty.

System Latency with and without Reflex (per Nvidia’s testing)

Reflex and Frame Generation

While Reflex is, at its core, most interesting for such competitive players, with the advent of DLSS 3 (and later DLSS 3.5) and its frame generation the technology also became important in games where latency would otherwise not be much of a consideration.

As stated before, you always have some latency when gaming; it’s created by the input device and the processing of those inputs in the game, the physics calculations and rendering, the input lag of the monitor electronics, and even the pixel response of the panel. All of this adds up to a rather large sum of milliseconds. Frame Generation, as mentioned, has to delay the display of the last rendered frame (since it only follows after the artificially generated one is displayed) and thus adds some extra latency of its own. It doesn’t have to be that many milliseconds compared to all the other factors combined, so it may not always matter significantly, but some deterioration always occurs there by definition. There has been a recent push to shorten and eliminate these latencies, at least for fast competitive games, while frame generation goes in the opposite direction.

Therefore, DLSS 3/3.5 has Reflex technology integrated whenever frame generation is used, which in turn aims to reduce latency. Reflex cannot remove the inherent latency increase created by frame generation, i.e. the negative effect of having to buffer the newest frame in order to interpolate between it and the previous one. What Reflex does do, however, is suppress the other sources of latency (namely the game and rendering latency incurred before frame generation takes its turn).

Reflex’s help is therefore about compensating for frame generation’s negative impact. If you start with a game already optimized for low latency by turning on Reflex and then activate frame generation on top of that, your latency will of course go up, because Reflex has already exhausted its sources of latency reduction and can’t do anything more.




DLSS 3.5: Ray Reconstruction

The latest exclusive technology that Nvidia has brought to GeForce RTX graphics card users is Ray Reconstruction, which first appeared recently with DLSS 3.5. It should be said that DLSS 3.5 also incorporates DLSS 2.x and the aforementioned Frame Generation, which was new in DLSS 3. However, games using DLSS 3.5 do not necessarily use Frame Generation. While that feature is limited to GeForce RTX 4000 graphics cards due to the use of Optical Flow Accelerators, other DLSS 3.5 components can also be taken advantage of by users of the previous GeForce RTX 3000 and RTX 2000 generations – and that includes the new Ray Reconstruction feature, which works on Turing and Ampere GPUs.

Overview of the GPU support matrix for DLSS versions

Ray Reconstruction is a new feature that affects both upscaling/super resolution in the sense introduced by DLSS 2.x, and at the same time raytracing effects (i.e. “RTX” or “DXR”) in games. The benefit of this technique should be an improvement in quality, but as a side-effect there is also a possible improvement in performance.

Denoising within ray tracing

As you probably know, rendering a scene by ray tracing requires analyzing a large number of light rays hitting and reflecting off objects. The problem in games is that there is not enough performance to calculate as many rays as would be needed, so only a relatively small number of them are analyzed. You can imagine that instead of a neat, complete final picture, you end up with a snapshot that doesn’t provide a continuous image, but instead has only sparse individual pixels forming a kind of noisy image with gaps between them.
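
A toy numerical illustration of where that noise comes from, assuming a made-up per-pixel lighting integral whose true value is 0.5: with only a few random “rays” per pixel, the Monte Carlo estimate scatters widely, and the error only shrinks with the square root of the sample count. The shade_pixel function is invented for the demonstration and is not a real renderer.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 64, 64

def shade_pixel(samples):
    # Stand-in for "trace `samples` rays and average their contributions";
    # each fake ray returns a noisy estimate of the true value 0.5.
    return rng.uniform(0.0, 1.0, size=samples).mean()

for spp in (1, 4, 64, 1024):
    image = np.array([[shade_pixel(spp) for _ in range(W)] for _ in range(H)])
    print(f"{spp:>5} rays/pixel -> per-pixel noise (std dev) ~{image.std():.3f}")
```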

Pipeline for rendering raytracing effects

Game implementations of raytracing in DXR (DirectX Raytracing) have been using denoisers (visual noise removal filters) since the beginning to smooth, fill in and suppress those discontinuities so that the raw sparse image looks normal for use in the game. These denoisers are today typically implemented with various traditional algorithms, often several of them combined. They can use both temporal smoothing techniques (combining information from multiple consecutive frames) and spatial smoothing techniques (i.e. smoothing within the single frame bitmap only). They should be similar to algorithms you may be familiar with from video filtering, but here they run on GPU shaders. And just like in video processing, such filters can also blur detail or cause artifacts.
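
The structure of such a conventional denoiser can be sketched very simply, assuming the noisy per-frame raytraced lighting arrives as NumPy arrays: a spatial blur fills the gaps and an exponential temporal accumulation combines consecutive frames. Real in-game denoisers are far more sophisticated (edge-aware, guided by normals and depth), but the spatial-plus-temporal skeleton is similar.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def denoise(noisy_frame, history, spatial_sigma=1.5, temporal_alpha=0.2):
    # Spatial pass: smear the sparse samples over their neighbourhood.
    spatial = gaussian_filter(noisy_frame, sigma=spatial_sigma)
    if history is None:
        return spatial
    # Temporal pass: blend with the previous denoised result so noise
    # averages out over several frames (at the risk of ghosting or blur).
    return temporal_alpha * spatial + (1.0 - temporal_alpha) * history

rng = np.random.default_rng(1)
history = None
for _ in range(8):                      # a short stream of noisy fake frames
    noisy = rng.normal(loc=0.5, scale=0.3, size=(64, 64))
    history = denoise(noisy, history)
print("residual noise after accumulation:", round(float(history.std()), 3))
```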

Denoisers in the raytracing pipeline

Ray Reconstruction using AI

The task of denoising is one of those for which the principles of pre-trained neural networks work well, and the Ray Reconstruction technology in DLSS 3.5 provides a special neural network to be used at this point in the rendering of the game, replacing the work of the usual conventional denoisers. The neural network is trained on a corpus of clean and noisy images for this purpose, similar to the way it is trained on the original and downscaled images for upscaling. Once trained, it should perform better than traditional denoisers, according to Nvidia.
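
A heavily simplified sketch of that training setup, assuming synthetic data: a tiny convolutional network in PyTorch learns to map noisy images back to clean references by minimizing the difference between its output and the reference. Nvidia’s actual Ray Reconstruction model, its inputs and its training corpus are of course much larger and not public.

```python
import torch
import torch.nn as nn

# A deliberately tiny denoising network: noisy RGB in, denoised RGB out.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

for step in range(200):
    clean = torch.rand(8, 3, 64, 64)               # stand-in for reference renders
    noisy = clean + 0.2 * torch.randn_like(clean)  # stand-in for sparse RT output
    denoised = model(noisy)
    loss = loss_fn(denoised, clean)                # learn to reproduce the reference
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```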

Therefore, the usage of this AI within DLSS 3.5 may improve image quality, as the denoiser and its temporal function will be able to preserve some extra detail while preventing some artifacts (temporal ghosting or detail blurring) that current denoisers cause or are unable to prevent.

Ray Reconstruction performs denoising using an AI model

This does not necessarily mean that this noise removal will be entirely free of its own image artifacts or loss of detail, but there should be less of these than with conventional denoisers, while more detail should be preserved. Ideally, the use of this AI should bring benefits to this stage of raytracing similar to those DLSS 2.x brought to upscaling.

AI filter quality should be higher than with traditional algorithms

This AI noise-removal filter is very close to DLSS 2.x in how it operates – it performs both noise removal and, essentially, upscaling. It uses various data from the game engine to enhance the input rendered frames, but in this case the input is not the final scene frames, but the raytraced lighting image data. The filter is temporal and uses motion vectors – it puts together several consecutive past frames to filter temporally, and by doing so it can also restore some detail that would otherwise be lost in the low resolution used for raytracing effects.

DLSS 3.5 with Ray Reconstruction can improve quality by preserving the high-frequency portion of the information for later analysis by the DLSS 2.x upscaling component, whereas a traditional denoiser would irreversibly eliminate it

Integration with DLSS 2.x

An important element of this promised improvement in visual quality is that this denoising AI is linked to the upscaling AI used for DLSS 2.x – it should be a single model that performs both functions. This helps in that the AI has more information to work with. With separate operation, DLSS 2.x would end up doing a worse job of upscaling the lighting data because the denoiser before it had already deleted some details and information from the input; an AI integrated in this way can still use that information in its decision making. In particular, this is the so-called high-frequency information, which noise-removal filters (typically acting as low-pass filters) tend to remove, but which upscaling and temporal reconstruction can use to reconstruct and stabilize details at higher resolution and quality.

Functioning of the entire DLSS 3.5 pipeline with Ray Reconstruction and Frame Generation

Not only a qualitative improvement, but also potentially a plus for performance

The primary benefit should therefore be to improve raytracing effects and the parts of the image that are generated by them, such as lighting, reflections and flares. And it should be improved when ray tracing is used in conjunction with DLSS upscaling (super resolution). However, there can sometimes be a performance improvement because the denoising calculations are offloaded to an AI model running on tensor cores, freeing up GPU shaders whose performance is consumed by traditional denoisers. It seems that, at least in some games, memory requirements may also be reduced somewhat.

What’s Overdrive Mode (Cyberpunk 2077)? A full ray tracing game

While raytracing graphics effects have been common in games since 2018, previous implementations have been limited. Due to the high hardware requirements, it is not or was not possible to render an entire scene this way (as, for example, the Cinebench test does when drawing its benchmark scene, or as rendering tools do when creating professional visualizations or even entire movie scenes). Raytracing games have so far rendered only part of the image via ray tracing – some objects (reflective surfaces, mirrors), shadows or lighting effects. It was a hybrid approach adding raytracing effects to a “rasterized” scene.

Pipeline of rendering in normal mode with hybrid ray tracing

However, games based on complete raytracing of the scene should appear in the future. With so-called Path Tracing, the game draws the complete scene (i.e. all pixels) using the raytracing method, so the entire scene should have realistic lighting effects that take into account all light sources in the scene, faithful reflections of light and objects, and physically correct shadows.

The first preview of such a game (as far as big AAA titles go) is Cyberpunk 2077, which premiered as a game with conventional hybrid rendering with added raytracing effects. This year, however, it received a patch adding a so-called Ray Tracing Overdrive Mode, which renders the game with full ray tracing.

Pipeline of rendering in Overdrive Mode with full ray tracing

In Overdrive Mode, virtually every light source in the scene (including car lights, neon lights, lamps) and its impact on the scene should be ray traced, with realistic effects and shadows. Ray tracing is also used for global illumination and indirect illumination from various light reflections.

However, Ray Tracing Overdrive Mode is extremely demanding on the hardware processing these graphics. The game has been heavily optimized and uses Nvidia-optimized raytracing denoisers, whose function was discussed in the section on Ray Reconstruction, alongside upscaling.

Ray Tracing Overdrive Mode produces very low FPS even with the most powerful GPUs, hence the need for DLSS including frame generation. Results from Nvidia testing

However, on today’s GPUs, playing such a game is more or less only possible thanks to technologies like upscaling/super resolution (DLSS 2.x) and frame generation (DLSS 3/3.5), which can produce higher resolution and smoother frame rate output from the fairly low resolution and frame rate you can get out of a GPU today.




Test components (and environment)…

On the AMD platform (a Ryzen 9 5900X CPU on an MSI MEG X570 Ace motherboard) with Patriot Blackout memory (4×8 GB, 3600 MHz/CL18), we use two GeForce graphics cards alternately. The Gigabyte RTX 4060 Windforce OC 8G represents the cheaper option and the Gigabyte RTX 4090 Gaming OC 24G the high-performance option. The results will show you what each graphics card is good for and how smooth the game is. The drivers in both cases are Nvidia GeForce 537.58 Game Ready. The OCAT application we use to record frame times is version 1.6.3.

Measurements are based on the same strict criteria as our standard graphics card tests. That means testing in a wind tunnel with the intake air temperature controlled within a narrow range of 21–21.3 °C, which is important to minimize measurement error. For maximum accuracy, all values in the graphs are the arithmetic average of three repeated passes.
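
For the curious, processing such captures can easily be scripted. The sketch below assumes each pass produces an OCAT CSV with an MsBetweenPresents frame-time column (the exact column name may differ between OCAT versions) and that the three runs are saved as pass1.csv to pass3.csv – both names are placeholders, not our actual file layout.

```python
import csv

def average_fps(path, column="MsBetweenPresents"):
    with open(path, newline="") as f:
        frame_times_ms = [float(row[column]) for row in csv.DictReader(f)]
    # Average FPS = number of frames divided by total capture time in seconds.
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)

passes = [average_fps(f"pass{i}.csv") for i in (1, 2, 3)]
print("per-pass average FPS:", [round(p, 1) for p in passes])
print("reported result (arithmetic mean):", round(sum(passes) / len(passes), 1))
```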

… Cyberpunk 2077 Phantom Liberty

Testing is conducted in Cyberpunk 2077 with the Phantom Liberty DLC (patch 2.01) at 1920×1080, 2560×1440 and 3840×2160 px resolutions – both at native resolution and, mainly, with DLSS applied.

Test platform: custom scene (Little China); API: DirectX 12; graphics preset: Ray Tracing: Overdrive or Ray Tracing: Ultra (depending on the test); extra settings: DLSS, DLSS Frame Generation and DLSS Ray Reconstruction (in different combinations depending on the chosen test), but mainly with DLSS applied; Nvidia Reflex Low Latency: on *

* Nvidia Reflex is active for all measurements that are within the results in Chapters 4 and 5. Reflex is an integral part of DLSS Frame Generation (it cannot be turned off when frame generation is activated).

Graphics settings for Ray Tracing: Overdrive

For the sake of clarity, we refer to all tested DLSS setup combinations as “Ray Tracing: Overdrive”, although this is a bit inaccurate. By making some changes to the DLSS settings, it is essentially a “custom” setting. But it still uses the extremely hardware-intensive Path Tracing option. This option is not enabled by default in the “Ray Tracing: Ultra” profile.

Graphics settings for Ray Tracing: Ultra




Test results – Ray Tracing: Overdrive









Test results – Ray Tracing: Ultra









Comparison of image quality

Nvidia Image Comparison & Analysis Tool

To visually compare the images, we used the Nvidia Image Comparison & Analysis Tool or ICAT. This allows you to simultaneously load either a pair of images or a pair of videos you’ve saved from a game (or other source), and these can then be compared in two ways.

The first is the side-by-side mode, which is shown in our illustration images. You have one source in one half of the screen, the other in the other, and the panning and zooming is synchronized so you can look at different details side-by-side like this and before/after-analyse a feature you turned on or two different graphics settings.

Nvidia ICAT with side-by-side comparison of two images or videos. For illustration, promotional videos are used where Nvidia showed upscaling of web video with sharpening

The second mode (split-screen) uses a slider, where both sources are placed on top of each other and the slider changes the displayed image from one to the other (one source is to the left of the slider, the other to the right of it). The advantage is that you can better pick up the change if you are looking at the spot as you drag the slider over it. The disadvantage is that you can’t see both versions of a particular location at the same time.

Nvidia ICAT with slider comparison of two images or videos

The Image Comparison & Analysis Tool (ICAT) is free and can be downloaded from Nvidia’s website.
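
ICAT itself is a GUI tool, but a rough scripted equivalent of its side-by-side mode plus a crude difference number takes only a few lines, assuming two same-sized screenshots saved as rt_ultra.png and overdrive.png (hypothetical file names used for illustration).

```python
from PIL import Image
import numpy as np

a = Image.open("rt_ultra.png").convert("RGB")
b = Image.open("overdrive.png").convert("RGB")

# Side-by-side composite, similar in spirit to ICAT's first comparison mode.
side_by_side = Image.new("RGB", (a.width + b.width, a.height))
side_by_side.paste(a, (0, 0))
side_by_side.paste(b, (a.width, 0))
side_by_side.save("side_by_side.png")

# A very blunt numeric difference metric (mean absolute per-channel error).
diff = np.abs(np.asarray(a, dtype=np.int16) - np.asarray(b, dtype=np.int16))
print("mean per-channel difference (0-255 scale):", round(float(diff.mean()), 2))
```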

Comparative Images 1: Overdrive Mode / Path Tracing

The visual difference you’ll see with Overdrive Mode (i.e. with full ray tracing of lights and shadows / path tracing) is really big. It’s not the kind of immediately obvious difference you get between a low, pixelated resolution and a more detailed image – it’s often more subtle than that. Our first reaction was that shadows and lights simply behave and reflect differently, and various parts of the scene are lit differently.

Perhaps the commonly seen questioning of the benefits of raytracing graphics can sometimes stem from this – the user sees an obvious difference, but it may not be obvious to them whether the differently presented lighting is automatically better. It’s probably because despite the advanced graphics of CP2077, it’s still a game simulation, and the visuals of these games still have a bit of an “animated” feel to them, so we don’t have as much ability to immediately intuitively discern what is true realism.

CP2077 Phantom Liberty, scene one. Rendered via original hybrid/partial ray tracing (RT Ultra)
CP2077 Phantom Liberty, scene one. Rendered via Path Tracing / Overdrive Mode

However, when using well-done Path Tracing, shadows and highlights should generally be rendered in better accordance with the laws of physics, so basically there are reasons to believe that between these two different renditions, Overdrive Mode is the more faithful and correct option. We have noticed an interesting and perhaps indeed physics-based difference here, for example:

CP2077 Phantom Liberty. Left is Path Tracing / Overdrive Mode; Right is original hybrid / partial ray tracing (RT Ultra)

To the right of the bright screen with the suit guy is a concrete wall, which with Path Tracing is lit orange like the general street area, which I guess should be correct, while the old hybrid ray tracing lights it with a green hue from the fluorescent light in the underpass, even though the light from it seemingly shouldn’t be hitting that area at all (it appears to be facing away). The big difference then is the ceiling above the viewer’s head – in the screenshot it is the zone above the bright screen. With the old hybrid ray tracing, the concrete here is for some reason lit up, as if it’s wrongly affected by that screen (which is on the other side of it), whereas with path tracing it’s properly dark.

Another place where the visuals seem to speak for Path Tracing might be here:

CP2077 Phantom Liberty. Left is Path Tracing / Overdrive Mode; Right is original hybrid / partial ray tracing (RT Ultra)

On the right, in the scene with the old hybrid RT, the balcony face next to the red neon sign is strangely dark, although logically it should be lit similarly to the street surface below – Path Tracing draws this part much brighter, which seems to make sense. The difference may be due to the old hybrid RT not being able to reflect the bright light from the big screen onto this area, whereas Path Tracing manages to do so. Similarly, in the overall brighter area around the neon arrow above the balcony, it seems more natural for the “air” around it to be brighter rather than dark as in the right image without path tracing.

CP2077 Phantom Liberty, scene two. Rendered via original hybrid/partial ray tracing (RT Ultra)
CP2077 Phantom Liberty, scene two. Rendered via Path Tracing / Overdrive Mode

The second scene we picked out is a daytime scene, and this one was a bit of a surprise for us. It turned out that the differences here are not so clear. Unless we’re making a mistake somewhere, outside of night and darkened scenes there doesn’t seem to be much difference between Path Tracing/Overdrive Mode and the older hybrid ray tracing.

CP2077 Phantom Liberty. Left is Path Tracing / Overdrive Mode; Right is original hybrid / partial ray tracing (RT Ultra)

Actually, it seems that in some zones of the image, the older and inferior ray tracing ironically achieves more detail (the question is, of course, whether this is correct).

CP2077 Phantom Liberty. Left is Path Tracing / Overdrive Mode; Right is original hybrid / partial ray tracing (RT Ultra)

But here in the second daytime scene, you can find beneficial differences:

CP2077 Phantom Liberty. Left is Path Tracing / Overdrive Mode; Right is original hybrid / partial ray tracing (RT Ultra)

If you look at the right image, the rather sharp day shadow seems to be of uniform intensity everywhere in the version rendered with the old hybrid RT, so the far right wall looks too bright, but conversely to the left the shadow is too dark. In the Overdrive Mode image (left), you can see that the shadow intensifies from left to right, so the darkened right part looks better, but on the left the ledge between the door and the tiled column, for example, doesn’t cast such a brutally black shadow anymore. The Overdrive Mode scene seems a little better for it.

CP2077 Phantom Liberty, scene three. Rendered via original hybrid/partial ray tracing (RT Ultra)
CP2077 Phantom Liberty, scene three. Rendered via Path Tracing / Overdrive Mode

Comparative Images 2: Ray Reconstruction

Now the same scenes, but this time we look at the images we get after activating Ray Reconstruction technology. So we are now comparing the state with Overdrive Mode and Path Tracing enabled in both cases. On the left, Ray Reconstruction is activated on top of it, while on the right we have the image without it (the same image as in the previous comparison).

CP2077 Phantom Liberty. On the left Ray Reconstruction on, on the right Ray Reconstruction off

Here, the differences (apart from the changing light due to the instability of the scene) are more subtle. The most striking is probably the reflection in the puddle, where parts of the neon sign with smaller lettering ended up very blurry without Ray Reconstruction; overall, the lettering in the puddle is sharper with Ray Reconstruction. It’s not quite a 100% win, though – for example, the details (lines) on the reflected balcony underside seem to have been blurred by the AI denoiser used by Ray Reconstruction rather more than by the original conventional denoiser (image filtering is often a you-win-some, you-lose-some affair).

CP2077 Phantom Liberty, scene one. Rendered via Path Tracing / Overdrive Mode with Ray Reconstruction enabled

Areas where the AI denoiser used by Ray Reconstruction is better can also be found in daytime scenes. Here in the third scene you can see that without Ray Reconstruction (right image), the part of the scene that is in shadow loses some quite distinctive lines on the walls, which are preserved once Ray Reconstruction is used. It’s not a glaring difference, but you can see that in some ways the AI proves more successful here.

CP2077 Phantom Liberty, scene three. On the left Ray Reconstruction on, on the right Ray Reconstruction off
CP2077 Phantom Liberty, scene three. Rendered via Path Tracing / Overdrive Mode with Ray Reconstruction enabled




Conclusion

Ray Reconstruction really works

The pleasant finding is that Ray Reconstruction actually has no negative effect on performance and, on the contrary, helps it a bit due to the simplification of the rendering process. The performance increase with Ray Reconstruction is practically always there, but surprisingly it is higher with the GeForce RTX 4060 (typically 5–8%), whereas with the more powerful GeForce RTX 4090 you only gain more like 1–2% of performance.

For some reason, though, we find that on the GeForce RTX 4090 you’ll only see these FPS improvements with Frame Generation enabled – without it, the high-end card doesn’t see any improvement in average FPS at 1080p and 1440p in Overdrive Mode. It did increase minimum FPS, though, which can also be counted as an improvement. At 4K resolution, where the performance requirements are highest, average FPS starts to improve as well, and Ray Reconstruction already brings a bigger benefit (a 4–9% FPS improvement) even on the RTX 4090.

Almost always it’s a single-digit percentage improvement in performance, but that doesn’t make it insignificant as it’s also a feature that’s supposed to improve image quality. According to these results, it’s thus – at least in Cyberpunk 2077 Phantom Liberty – always beneficial.

Visual bonus

The visual side of Ray Reconstruction also held up. We did notice some places where it looked like there was regression alongside improvements elsewhere (that reflection in the puddle), but in other cases Ray Reconstruction leads to better detail in low-contrast parts of the scene (in the shadows). The fact that there isn’t an improvement everywhere with this change of denoising filter is something that can happen with such filters. The positives hopefully outweigh this, and to the feature’s credit, it also improves performance a little.

It’s hard to judge how much of a benefit the visual improvements are, which is also true for Overdrive Mode/Path Tracing. In terms of ease, comfort and enjoyment of gameplay, you probably won’t get much out of these changes, but that’s often true of graphical embellishments. For example, we’re not sure about the benefit in the daylight scenes (although the third scene, if not the second, hints that there can still be a benefit in the realism of the shadows).

Overdrive Mode has it tough in that its impact on performance is brutal, while perhaps not being quite as much of a leap as the first partial implementation of ray tracing was. Or at least not always (those daytime scenes, again). I guess you can’t say that these two features are as they say “must have”, but they sure are “nice to have”. So a nice bonus, though not something you can’t survive without.

Frame Generation: Overdrive becomes playable for GeForce RTX 4090

While the generation of extra frames is expected to be a great asset for the final FPS, it is important to remember that these are not full-value frames, but only interpolated “filler” ones, as these frames do not directly reflect the game physics or action (and there is the potential for them having lower quality). Looking purely at the end FPS values achieved, even in Overdrive Mode on the GeForce RTX 4090 this tool manages to achieve roughly double the final frame rate, which roughly corresponds to the principle of generating new intermediate frames at a 1:1 ratio to the frames actually rendered by the game.

However, we see that 4K resolution in Overdrive Mode already deviates from this result and the GeForce RTX 4090 shows a performance (final FPS) increase of only 65–73% with Frame Generation. This may indicate that even this GPU is already over-taxed, and even the Optical Flow calculations and frame generation itself have such an overhead that it reduces the number of real rendered frames per second that the GPU can handle. Thus, after that doubling, the final result is no longer “2×”.

Even at 1080p, the GeForce RTX 4060 is at best barely making it

This hypothesis is supported by the fact that the overall weaker GeForce RTX 4060 graphics card also fails to double FPS in Overdrive Mode. The final FPS improvement for this card is only 60–70% at lower resolutions (1080p and 1440p), similar to the RTX 4090’s situation at 4K. And when the RTX 4060 gets tasked with attempting 4K resolution in Overdrive Mode, it can already only boost final FPS by 34–37%, indicating that the computational cost of generating frames is already such that it sabotages the underlying real-world FPS in a way that the vast majority of the theoretical 2× benefit is lost.

It has to be said that at these higher resolutions, Overdrive Mode is more or less unplayable overall even with frame generation on the GeForce RTX 4060. Even after frame generation, you only get 20.6 FPS in 4K or 38.7 FPS in 1440p. And it’s worth remembering that low frame rates are where the drawbacks or issues of frame generation will show up the most, so the improvement in smoothness may not be great in practice (even latency degradation grows worse the lower the FPS). In general, it’s more recommended to use frame generation as a bonus only if some semblance of a playable frame rate is already achieved even without it.

Whether Frame Generation can make Overdrive Mode playable on the GeForce RTX 4060 is something that is mostly only up for discussion at 1080p resolution. There, 36.1 FPS was measured without the feature (we’re counting the result with Ray Reconstruction on here, as there’s no reason to have it turned off) and we got 60.9 FPS with the generated frames (which implies that the real game frame rate dropped to 30.5 FPS). Whether this will be enough is quite the question, though; 30 FPS is already quite a low frame rate (“cinematic” FPS according to Ubisoft), and it’s questionable whether doubling by interpolation is enough help when starting this low. It will probably depend a lot on the demands and needs of the particular gamer. On the other hand, at least for the purposes of checking out the next-gen graphics, it should be enough.
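
The back-of-the-envelope arithmetic behind those numbers, assuming frame generation inserts exactly one interpolated frame per real frame (a 1:1 ratio):

```python
def implied_real_fps(fps_with_fg):
    # With a 1:1 ratio, half of the displayed frames are real.
    return fps_with_fg / 2.0

def fg_scaling(fps_without_fg, fps_with_fg):
    return fps_with_fg / fps_without_fg

# RTX 4060, 1080p, Overdrive Mode, Ray Reconstruction on (values from above):
print(implied_real_fps(60.9))               # ~30.5 real frames per second
print(f"{fg_scaling(36.1, 60.9):.2f}x")     # ~1.69x, well short of the ideal 2x
```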

Yet with the RTX 4090 at 4K resolution, all of these tools produce a 100 FPS final frame rate that should actually be usable without complaints at that point, even if half of the frames are generated. But that’s the way it goes – these days there’s a very large performance chasm between the graphics cards representing the absolute high end and the lower mainstream models available.

English translation and edit by Jozef Dudáš

The partner of this article is the e-shop Smarty.cz, which sells various PC components including graphics cards. Many thanks for the cooperation!