Nvidia's GeForce RTX 3090 is one of the best graphics cards that money can buy. However, multiple GeForce RTX 3090 graphics cards bit the dust while playing Amazon's New World game. EVGA has launched an investigation into the dead graphics cards and has shared the results with PCWorld.
GeForce RTX 3090 owners went into panic mode when a plethora of user reports claimed that New World was killing graphics cards left and right. Apparently, Amazon Games didn't implement a frame rate limiter in the main menu, which reportedly caused graphics cards to malfunction or die prematurely due to the high frame rates. There were also alleged reports of Radeon cards and other Ampere-based SKUs suffering from the same problem, which would mean it isn't exclusive to the GeForce RTX 3090.
Amazon Games has since added a frame rate limiter to New World, and there haven't been any new reports of premature deaths. EVGA shipped out replacements to the affected users and collected the bricked graphics cards for X-ray analysis.
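For readers unfamiliar with the term, a frame rate limiter simply makes each frame last at least a fixed amount of time, so a lightweight scene such as a menu cannot run at hundreds or thousands of frames per second. The Python sketch below illustrates the general idea only; it is not Amazon's implementation, and the names `sleep_needed`, `run_menu_loop`, and `render_frame` are hypothetical.

```python
import time


def sleep_needed(frame_start: float, now: float, target_fps: float) -> float:
    """Return how many seconds to wait so the frame lasts at least 1/target_fps."""
    frame_budget = 1.0 / target_fps   # minimum duration of one frame
    elapsed = now - frame_start       # time already spent rendering
    return max(0.0, frame_budget - elapsed)


def run_menu_loop(render_frame, target_fps: float = 60.0, frames: int = 3) -> None:
    """Call render_frame() a fixed number of times, capped at target_fps."""
    for _ in range(frames):
        start = time.perf_counter()
        render_frame()  # hypothetical callback that draws one frame
        time.sleep(sleep_needed(start, time.perf_counter(), target_fps))
```

Without the `time.sleep` call, the loop would render as fast as the hardware allows, which is the situation the uncapped main menu was reported to create.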
Initially, many speculated that the graphics card's fan controller was the culprit for the premature failures. However, an EVGA spokesman has dispelled that theory. According to EVGA, the micro-controller may appear not to be working correctly because of noise on the I2C bus. This can cause third-party software, including HWiNFO and GPU-Z, to erroneously report that the fan controller isn't working properly. EVGA's in-house Precision X1 software didn't have this problem. Nevertheless, EVGA has released a micro-controller update that will show the fan controller's correct operation on updated versions of the aforementioned third-party tools.
After analyzing the 24 deceased GeForce RTX 3090 graphics cards, the company discovered that the real issue was due to "poor workmanship." Apparently, the soldering around the graphics card's MOSFET circuits leaves much to be desired.
EVGA claimed that the soldering problem only affects a handful of GeForce RTX 3090 graphics cards that were part of the early production run in 2020. Although EVGA didn't reveal concrete numbers, the company affirmed that the affected batch is less than 1% of all the graphics cards that it has sold.
Will be interesting to see if anyone saved their dead 3090, or was a preorderer and has one of the same early batch, and can back up or debunk EVGA's explanation.
Theoretically couldn't the same circumstances be replicated using, say, Aquamark? That is, a program which isn't a power virus to trigger limiters, has no frame rate cap, and is graphically weak enough in 2021 for a card of the power of the 3090 to generate hundreds or thousands of FPS?
Come on, now. People expect these things to just work... with no caveats.
They already had the ACX/ICX(?) from older models that would burn up.
There's the fuse popping, followed by ded gpu from their recent FTW3s - might not have been exclusive to FTW3, I haven't really been keeping track.
I expect to read/see comments from the other partners that they too, had soldering issues with their 30 series gpus...
"Although EVGA didn't reveal concrete numbers, the company affirmed that the affected batch is less than 1% of all the graphics cards that it has sold."
They sell more gpus than the other partners by far...
I don't see the need for Amazon to do that. Properly designed and manufactured hardware should not fail even under full load. Furthermore, all the GPUs have proper monitoring sensors which ensure that parameters like temperature, power, etc. do not go over specified limits. Unless you are telling me the game overrides these parameters, there is no way software can cause hardware to perform beyond its design limits (I wish it could, since that would mean we could get a 3070 to perform like a 3080).
Although EVGA did not specifically say what's wrong with the soldering, it's easy to guess. Repeated expansion/contraction (due to thermal cycling) causes fatigue in the solder, so it cracks after some time.
We had a problem recently where our suppliers moved manufacturing to another facility. Months later one of the parts began failing. Analysis of the failing component indicated that the failure was due to beryllium contamination in the water used to clean the boards after soldering, not the component.
It probably wasn't a result of thermal cycling though, seeing as these cards should all be fairly new, and the failures tended to happen within minutes of first launching the game. There were suggestions that the cards might have been pulling power beyond their intended limits, and the weak solder joints may simply not have been able to cope, rather than the failures resulting from long-term repeated temperature changes. And if the cards were actually drawing more power than they should have been, the solder itself might not even have been the root cause of the issue, but rather just the point of failure resulting from some other design issue.