Some RTX 3080 cards reported unstable. Driver fix lowers boost

Game crashes and GPU instability on GeForce RTX 3080 cards are probably caused by certain filter capacitor configurations on the reference PCB

It looks like the launch of the Nvidia GeForce RTX 3000 “Ampere” series will not be remembered as one of those GPU launches that went off without early problems. Last week, the first owners of GeForce RTX 3080 cards began reporting instability, allegedly associated with GPU Boost. The problems might be caused by the capacitors used, but Nvidia is also addressing them with a driver update, which could affect performance.

Crash-to-desktop problems have been reported in relatively large numbers on various forums, including Nvidia’s own. It is not clear what percentage of cards is affected, however, because we have no estimate of the total number of units sold. It turns out that a game crashes when the GPU frequency exceeds a certain limit somewhere above 2.0 GHz of real boost (which on Nvidia graphics cards is significantly higher than the figure in the specifications). The troubles are reported in detail by VideoCardz and ComputerBase.

The problem mostly appears on non-reference cards, especially factory-overclocked ones. A report concerning a Founders Edition card has also appeared, but that card uses a different PCB (PG133) and it has not yet been confirmed that it is affected too, so in theory it could be a false alarm (some other problem). The fact that non-reference cards with a factory OC are affected could indicate that the manufacturers chose GPU frequencies that are too high to be stable, which would be purely their fault. However, this does not seem to be the whole story. The cause more likely lies in the power delivery, which does not filter the voltage well enough and thus makes the GPU unstable at a frequency it would otherwise handle.

A temporary workaround was therefore to set a lower GPU clock offset; a reduction of 50-100 MHz eliminated the problem, at the cost of lower performance, of course.

POSCAP versus MLCC

According to the card manufacturers’ reactions, the problem seems to lie in the capacitors on the back of the PG132 PCB (in the area directly below the GPU chip, which is usually not covered by the backplate and can therefore be inspected easily). This is a reference PCB designed by Nvidia, although it is also used in cards from various other manufacturers. Its design is not completely fixed, and the specifications allow different configurations of the capacitors that filter the NVVDD and MSVDD voltages. Nvidia’s guidelines allow either POSCAP or SP-CAP polymer tantalum capacitors (the larger black packages) or multi-layer ceramic capacitors (MLCCs), which are small and light in color. These capacitors have different properties: MLCCs are supposed to perform better in some respects, but POSCAPs are probably more reliable (MLCCs are said to be prone to cracking and are less tolerant of higher temperatures).

For this GPU and PCB, however, the design with POSCAP capacitors is probably not suitable, because cards that carry more of these capacitors (up to six) seem to be the ones with problems. Conversely, cards where more of the POSCAPs have been replaced with MLCC capacitors fare better. It seems that the design may have underestimated how many capacitors are needed when POSCAPs are used. Normally this would probably be caught during card testing and the design would be corrected, but according to leaked information, non-reference card manufacturers had very little time to test. According to the igor’sLAB website, Nvidia allegedly did not provide them with a driver until the end of August, and only a few days later the cards had to be on their way from Asia to stores around the world for the September 17 release. Manufacturers could therefore test the cards in real game benchmarks for only a few days. If this is true, Nvidia’s secrecy policy seems to have contributed greatly to the problem.

Left: RTX 3080 with six POSCAPs (probably a pre-production version). Right: the production version, with the POSCAPs replaced by MLCC capacitors only. Comparison by VideoCardz

Asus and MSI changed the capacitors before production

The reactions of the card manufacturers may be evidence that some of the component configurations allowed by the reference PCB are weaker and that the POSCAP capacitors are a problem. MSI recently changed the pictures and renders on its website of the RTX 3080 Gaming X Trio, one of the cards affected by the instability. The underside of the card shows that the older version had five POSCAPs, while the newer version has only four, with the vacated position filled by MLCC capacitors. Asus did the same. Its renders of the RTX 3080 ROG Strix and RTX 3080 TUF originally showed POSCAPs in all six positions, but the renders on the company’s website have now changed, and the underside shows POSCAPs replaced by groups of MLCC capacitors in all positions.

Both companies changed the pictures quietly. However, Asus said in a statement that it changed the design before mass production began, which would mean that the earlier configuration shown in the older renders never went on sale. We also have an official announcement from MSI that the design of its card has not been changed, so here too the swapped photos only reflect changes made before the start of production. The original renders on the web were probably based on an unfinished working version, and the capacitors were changed afterwards. MSI confirmed that only the new versions of the cards went to stores.

Production MSI RTX 3080 Gaming X TRIO card. According to MSI, only these modified versions, with five POSCAPs and several MLCCs in the sixth capacitor position, have been shipped to retail stores and customers

Evga confirmed the problem with POSCAPs

Evga, however, officially acknowledged the capacitor problem. It said that the RTX 3080 FTW3 card, which was originally configured with six POSCAPs, now carries four POSCAPs plus 20 MLCCs that replace the other two. This change was also allegedly made before sales began. Reviews still show cards with the old configuration in their photos, because those are pre-production samples, but only the corrected versions should reach stores. The RTX 3080 XC3 card is a different case: it has five POSCAPs and 10 MLCCs, which the manufacturer says is enough (although I believe some problem reports have appeared for it as well, so we will see whether further revisions are needed).
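The Evga figures can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming that each of the six positions behind the GPU holds either one POSCAP or one group of 10 MLCCs; the group size is inferred from the 4 + 20 and 5 + 10 figures quoted here, not stated by Evga, and the helper name `positions_filled` is purely illustrative.

```python
POSITIONS = 6          # capacitor positions behind the GPU, as described in the article
MLCCS_PER_GROUP = 10   # assumption: inferred from Evga's 4 + 20 and 5 + 10 figures


def positions_filled(poscaps, mlccs):
    """Count the positions occupied by single POSCAPs and by groups of MLCCs."""
    return poscaps + mlccs // MLCCS_PER_GROUP


# (POSCAP count, MLCC count) for the Evga configurations quoted in the article
configs = {
    "Evga RTX 3080 FTW3 (pre-production)": (6, 0),
    "Evga RTX 3080 FTW3 (production)": (4, 20),
    "Evga RTX 3080 XC3": (5, 10),
}

for name, (poscaps, mlccs) in configs.items():
    filled = positions_filled(poscaps, mlccs)
    print(f"{name}: {poscaps} POSCAP + {mlccs} MLCC -> {filled}/{POSITIONS} positions")
```

Under that assumption, all three configurations fill exactly six positions, which is consistent with the six-position layout described for the PG132 board.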

Nvidia releases a driver update as a solution

Nvidia did not release a statement until this week, and it was a very brief one. It says that the company has released a new driver version (456.55) that improves stability, but it does not mention the reported game crashes. Nvidia states that partners can adjust the exact capacitor configuration, and that a particular combination of POSCAPs and MLCCs does not necessarily reveal whether the power filtering is better or worse.

So for now, at least, Nvidia does not confirm that hardware changes to the PCBs of the affected cards are needed. That may well be true for cards running at reference frequencies, while the affected cards typically have some factory overclock, for which a weaker (less tolerant) combination of capacitors may no longer be sufficient. From Nvidia’s point of view, however, that would be an overclocking problem, and overclocking is not necessarily something it supports.

NVIDIA posted a driver this morning that improves stability. Regarding partner board designs, our partners regularly customize their designs and we work closely with them in the process. The appropriate number of POSCAP vs. MLCC groupings can vary depending on the design and is not necessarily indicative of quality.

Lowering frequencies (and most probably performance, too)

The driver itself nevertheless seems to target this problem, because the crashes disappear for at least some users (we will see whether that holds for everyone). It looks like Nvidia has changed the behavior of Boost, and clock speeds are lower with the 456.55 driver. Brad Chacos of PC World states that with the original 456.38 driver, his card equipped with six POSCAPs (a pre-production sample of the Evga RTX 3080 FTW3; the company later changed the capacitors for production cards) always crashed in the game Horizon Zero Dawn. The game’s benchmark worked as long as the card ran at around 2010 MHz. In one scene, however, the GPU clock rose to 2025 MHz and the GPU failed, probably because the filtering could not maintain a stable voltage, and the game crashed. This happened repeatedly, on every run.

The 456.55 driver seems to have changed the behavior of Boost. On the reviewer’s card, the frequency for most of the benchmark is now only 1980 MHz, and at the critical point that used to crash the game it rises only to 1995 MHz. On the main menu screen the frequency still rose to the original 2010 MHz, but the load there is probably lower.

Therefore, at least for this card, the patch will probably reduce performance. It is worth keeping in mind that 1980 MHz is still far more than what the specifications state (the official GPU Boost is only 1.71 GHz), so Nvidia had its back covered in advance. Still, the reduction of about 30 MHz (roughly 1.5%) may well be measurable in the achieved FPS. After this adjustment, the card will perform slightly worse than in the launch-day reviews, which were measured on the 456.38 driver. The difference will be small, but in principle it is an error worth keeping in mind; ideally, all reviews would be retested and the earlier results disregarded.
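For orientation, the size of the clock change can be worked out from the figures quoted for the PC World benchmark run. This is only a sketch of the arithmetic: the function name `clock_delta` is illustrative, and the clock percentage is at best a rough upper bound on the FPS change, since frame rates rarely scale linearly with GPU clock.

```python
OFFICIAL_BOOST_MHZ = 1710  # official GPU Boost from the specifications (1.71 GHz)


def clock_delta(before_mhz, after_mhz):
    """Return (absolute drop in MHz, relative drop in percent of the old clock)."""
    drop = before_mhz - after_mhz
    return drop, 100.0 * drop / before_mhz


# Sustained benchmark clock: driver 456.38 vs. driver 456.55
sustained_drop, sustained_pct = clock_delta(2010, 1980)
# Clock in the scene that used to crash the game
peak_drop, peak_pct = clock_delta(2025, 1995)
# How far the new sustained clock still sits above the official boost
headroom_pct = 100.0 * (1980 - OFFICIAL_BOOST_MHZ) / OFFICIAL_BOOST_MHZ

print(f"Sustained clock: -{sustained_drop} MHz ({sustained_pct:.2f}%)")
print(f"Critical scene:  -{peak_drop} MHz ({peak_pct:.2f}%)")
print(f"New sustained clock is {headroom_pct:.1f}% above the official boost")
```

With these inputs the drop is 30 MHz in both cases, about 1.5% of the previous clock, while the new sustained clock still sits roughly 16% above the official boost.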

It is not yet entirely clear whether this frequency reduction will ensure stability on all cards where problems have occurred, or whether manufacturers will have to withdraw some already sold models from the market (or accept warranty claims). This will probably become clear in the coming days and weeks.

It is possible that for some cards the fix will consist partly of changing the capacitors while also requiring the lower clock speeds of this driver, whereas for others the driver alone will be sufficient for stability. Current statements that cards are fine, meet specifications, or are not affected by capacitor problems may already assume that you run them with this patch. For example, MSI officially reports that its card design is fine and that the only fix needed is a driver update. The changes that Evga, Asus and MSI made to the production cards now on sale may not rule out the need for this new Boost-modifying driver. In any case, gamers generally need to keep their drivers up to date, so there will be no choice: this Boost change will be part of all subsequent updates.

MSI became aware of reports from customers, reviewers, and system integrators that there may be instability when
GeForce RTX 30 Series graphics cards core clocks exceeded a certain amount. The latest GeForce driver (456.55) includes
fixes for the issue.

Either way, once the issues are resolved, the impact of all this on Ampere graphics cards should not be too negative. Any reduction in performance will probably be tiny, and there is no need to change the verdict on the cards. Nvidia may just have tarnished its reputation a bit, because this problem shows that the launch was somewhat rushed and that the company apparently made a mistake in not leaving more time for testing.

English translation and edit by Lukáš Terényi

