EVGA posted a YouTube video several days ago advertising its new ICX cooling for RTX 30-series graphics cards. Near the end of the video, EVGA showcases a new update to its Precision X1 software, and in one of the slides you can see a heavy core and memory overclock on its RTX 3090 FTW3, an extreme edition of the GeForce RTX 3090. How heavy? The GPU core appears to be running at 2105 MHz, and the GDDR6X memory may be clocked at 22 Gbps.
Could this overclock be real or is it just marketing hype? For now we need to treat this news with a little skepticism and await confirmation. We can see earlier in the video (at the 1:28 mark) that the base GPU clock appears to be 1695 MHz, with the GDDR6X memory at 9750 MHz (double data rate, that's 19.5 Gbps), so those should be 'reference' clocks.
The next segment shows the overclocked core and memory speeds, except there are some oddities. The core shows 2105 MHz, but the memory shows 5520 MHz. That would mean either the memory was significantly underclocked (to 11 Gbps), or the multiplier on the RAM changed to 4x. Neither one makes a lot of sense, which makes us wonder if it's a typo.
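As a quick sanity check on those two readings, here's the arithmetic in a short Python sketch. The multiplier values are assumptions about how monitoring software reports GDDR6/GDDR6X clocks (double data rate versus a 4x transfer multiplier), not anything confirmed in the video.

```python
# Rough sketch: how a reported memory clock in MHz maps to an
# effective per-pin data rate, for an assumed transfer multiplier.
def effective_gbps(reported_mhz: float, multiplier: int) -> float:
    """Effective data rate in Gbps for a reported clock in MHz."""
    return reported_mhz * multiplier / 1000

# Reference reading from the video: 9750 MHz at double data rate.
print(effective_gbps(9750, 2))   # 19.5 Gbps, matching the stock spec

# The odd 5520 MHz reading: either a heavy underclock at 2x...
print(effective_gbps(5520, 2))   # 11.04 Gbps
# ...or roughly a 22 Gbps overclock if the software uses a 4x multiplier.
print(effective_gbps(5520, 4))   # 22.08 Gbps
```

Either interpretation is possible from the numbers alone, which is exactly why the slide is ambiguous.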
A 410 MHz core overclock for a modern GPU is very big ... but we don't actually know what the typical boost clock is for the RTX 3090. In the past, Nvidia has been quite conservative with boost clocks. The RTX 2080 Ti Founders Edition, for example, has a boost clock of 1635 MHz, but routinely runs at 1800 MHz or more in games without any overclocking. In other words, a static clock of 2105 MHz might only be 100-200 MHz higher than the GPU normally runs.
It could be that EVGA underclocked the VRAM to give the GPU core more headroom. It could be a typo. It could be a lot of things. We haven't been able to test any RTX 30-series GPUs yet, so we'll have to wait until September 17 to see how the 3080 performs, and then another week after that for the RTX 3090 numbers.
Ampere's performance gains could be spectacular for overclockers. Or they might end up being similar to what we've seen with Turing and Pascal. If EVGA is truly hitting a 410 MHz offset, that would be incredible and could mean 15-20% more performance than stock, assuming the memory bandwidth doesn't end up limiting performance.
The RTX 3090 already looks like it will deliver incredible performance. Nvidia says it's 50% faster than the outgoing RTX 2080 Ti, though the theoretical TFLOPS is actually 165% higher. But memory bandwidth is only 52% higher and may be a limiting factor. Regardless, it's going to be interesting to see how far the average RTX 3090 will overclock once reviewers get their hands on these new GPUs. Stay tuned for our review of the RTX 3090 coming soon.
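The quoted uplifts can be sanity-checked with back-of-the-envelope math. The shader counts, reference boost clocks, memory speeds, and bus widths below are pulled from public spec sheets, not from the article itself, so treat them as assumptions:

```python
# Theoretical compute and bandwidth from listed specs (assumed values).
def tflops(shaders: int, boost_mhz: float) -> float:
    return shaders * 2 * boost_mhz / 1e6  # FMA = 2 FLOPs per clock

def bandwidth_gbs(gbps: float, bus_bits: int) -> float:
    return gbps * bus_bits / 8

tf_2080ti = tflops(4352, 1545)            # RTX 2080 Ti, reference boost
tf_3090   = tflops(10496, 1695)           # RTX 3090
bw_2080ti = bandwidth_gbs(14, 352)        # 14 Gbps GDDR6, 352-bit bus
bw_3090   = bandwidth_gbs(19.5, 384)      # 19.5 Gbps GDDR6X, 384-bit bus

print(round((tf_3090 / tf_2080ti - 1) * 100))  # ~165% more TFLOPS
print(round((bw_3090 / bw_2080ti - 1) * 100))  # ~52% more bandwidth
```

The mismatch between those two numbers is the whole bandwidth-bottleneck concern in a nutshell.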
No typo here, it is a GDDR6X mode.
I live in Canada. I can see I'll be leaving the window open through the winter with these things.
Well, first off it's a new unknown node, so no one knows what to expect.
Second, you're looking at a 3090 with 28 billion transistors, whereas the 2080S had 13.6 billion transistors.
Third, the 2080S has a 250W TDP, while the 3090 is 350W.
Fourth, the 3090 has 24GB of VRAM.
So basically you have 2.05x the transistor count, 3x the memory (faster/hotter, at that), with 40% more power usage/heat generation despite the node shrink. And on air, still hitting 2105MHz? If it's even true...it would be an incredible card.
Remember...the stock boost clock of the 3090 is just 1700MHz, while the 2080S is 1815MHz. So this would be a higher percentage increase. For your card, you said you got 2100MHz; that's just a 15.7% increase over the stock boost clock. With this card, it would be a 23.8% increase.
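Those two percentages check out if you run the numbers (using the rounded stock boost clocks from the comment):

```python
# Overclock headroom as a percentage over the stock boost clock.
def oc_percent(oc_mhz: float, stock_mhz: float) -> float:
    return (oc_mhz / stock_mhz - 1) * 100

print(round(oc_percent(2100, 1815), 1))  # 15.7 -> RTX 2080S at 2100 MHz
print(round(oc_percent(2105, 1700), 1))  # 23.8 -> RTX 3090 at 2105 MHz
```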
I should note...this type of boost clock will NOT be sustainable on the RTX 3090 Founders Edition or reference board design. Maximum power draw for those cards would be 375W (150W per 8-pin connector, x2 = 300W, plus 75W from the PCIe slot). So you have very limited room for overclocking.

If what we see regarding the 2100MHz clock is correct, it would be from one of the 3x 8-pin connector card designs. The Asus STRIX, for example, is rated for a 400W TDP. And actually, taking a look at my chart, the entire FTW3 line (FTW3, FTW3 Ultra, FTW3 Hybrid, FTW3 Waterblock) are 3x 8-pin connector designs.

We know the Asus ROG Strix line has 22 power stages, and the EVGA Kingpin card has 23 power stages, while the Founders Edition and reference boards (any of the 2x 8-pin designs) have only 18 power stages. So the FTW3 line likely also has a 22-23 power stage design, which would allow these clocks.
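The power-budget arithmetic above is simple enough to sketch, using the connector wattages as stated in the comment:

```python
# Board power ceiling implied by auxiliary connector count:
# 150W per 8-pin connector plus 75W from the PCIe slot.
def board_power_limit(num_8pin: int) -> int:
    return num_8pin * 150 + 75

print(board_power_limit(2))  # 375W -> Founders Edition / reference designs
print(board_power_limit(3))  # 525W -> FTW3 / Strix-style 3x 8-pin boards
```

A 3x 8-pin board has 150W more ceiling than the reference design, which is where the extra overclocking room comes from.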
Believe me...this kind of clock on a 3090 isn't easy, because the Nvidia reference/founders design has severely hampered its available power. While previous cards like the 1080ti or 2080ti had 20% more power available to them past the TDP, the 3090 only has 7.1% more power available to it. That means lower clocks, and even lower sustained clocks when the card heats up.
So again...if these clocks are on air, it will be absolutely amazing.
P.S. Don't take a shortcut to thinking, commit a false-dichotomy logical fallacy, and assume any criticism of Nvidia automatically makes me an AMD fan... as if there are only two choices of thought.
Reference/Founders Edition cards can't surpass 375W. But I'm curious...how are you getting a lack of efficiency? The cards are literally performing that much more. At the same performance level as an RTX 2080, it wins out at 1.9x more performance per watt. So based on their graph, where the RTX 2080 got to 60fps with 240W, the RTX 3080 got to that same 60fps with 126W. Of course that's not entirely fair because you're comparing a card running at max clocks (least efficient) vs one with limited load (most efficient). But continue that curve and you end up with the 3080 hitting 105fps at 320W.
So in that example, compared to the 2080, it put out 75% more frames per second, at a 33% increase in power. Or in other words...an increase in performance per watt of 31%. Let me break it down a bit based on that chart:
RTX 2080 used 4 Watts for each frame per second (240W/60FPS)
RTX 3080 used about 3 Watts for each frame per second (320W/105FPS)
RTX 3080, just matching the 60fps of the RTX 2080, used just 2.1 Watts for each frame per second (126/60)
So at the same level of performance, it uses 2.1W instead of 4W. At maxed out performance, it uses 3W instead of 4W. You're bringing up total power usage which has nothing to do with efficiency. If they had doubled the number of cores and vram, you'd be looking at a 700W TDP. That wouldn't make it inefficient.
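Here's the watts-per-frame comparison from that chart reading in one place (the 126W figure is the 3080 capped to match the 2080's 60fps, as described above):

```python
# Watts per frame at each operating point read off Nvidia's chart.
cards = {
    "RTX 2080 @ 60fps":        (240, 60),
    "RTX 3080 @ 105fps":       (320, 105),
    "RTX 3080 capped @ 60fps": (126, 60),
}
watts_per_frame = {name: round(w / fps, 2) for name, (w, fps) in cards.items()}
for name, wpf in watts_per_frame.items():
    print(f"{name}: {wpf} W per frame")
```

Lower watts per frame at both matched and maxed performance is precisely the efficiency-versus-total-draw distinction being argued here.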
I'll give you some more details here. This is made on an 8nm node from Samsung, which is just a refined 10nm. Samsung is also far worse than TSMC. The RTX 3080, which has a TDP of 320W, is a roughly 627mm2 die with 28 billion transistors. The Radeon VII is made on TSMC's superior 7nm node. It's also using HBM2 memory, which uses less power than GDDR6X. The die on that card is just 331mm2, and has a total of 13.2 billion transistors. Yet it has a TDP of 300W. And performance? Even less than a 240W RTX 2080 on average across all games.
I'm also seriously doubting AMD's ability to compete against these cards, because for the first time in a long time, Nvidia actually put out a massive GPU die on a brand new node. Normally it likes to milk people. AMD wouldn't be able to outperform Nvidia directly, so it'd have to go the way of the chiplet. The problem there is...even if you do get a good solution similar to how it operates on their CPUs, the system which lets the chiplets perform as one giant chip uses up a lot of power too, as is the case on the CPU front right now.
Nvidia went from 13.6B Transistors to 28B (about 1.70x more when you account for disabled cores)
Nvidia went from 60fps to 105fps (about 1.75x performance)
Nvidia went from 240W to 320W (about 1.33x more power use)
So the card is not inefficient. It is more efficient. It just happens to be very performant. And in fact, it is more efficient and performant than an AMD card on a better/more efficient node, with more efficient memory. And it is ahead in that category by a mile.
Is Samsung worse than TSMC in practice though? From what I've read, Samsung 8LPP has more relaxed design rules than TSMC 7N. That might allow you to pack in more transistors than a denser (on paper) but more restrictive process. If your die size for the 3080 is correct, it has around 45 million transistors per mm². Navi meanwhile, only has 41 million transistors per mm².
Well, I know when Apple had Samsung make some of their chips alongside TSMC, the Samsung-made ones had higher power usage and throttled more, so Apple put them in the cheaper iPhone versions. The situation might be different with high-performance chips, but that seems unlikely. Samsung is a cheaper option than TSMC, so if it performed as well as or better than TSMC, why would the Ampere A100 cards be made by TSMC?
Samsung’s fab has a lot of issues. Nvidia used them only because they made a gamble by signing with them instead of TSMC and then TSMC being booked solid. Samsung wasn’t chosen for quality or performance. It’s what they were unfortunately stuck with.
The A100 chip has 54.2 billion transistors on TSMC 7nm at 826mm2. The Samsung-made GA102 has 28 billion transistors in 627mm2.
Those numbers alone say it all.
So just as a comparison:
Samsung 8nm (Ampere RTX 3090) = 44.6 million transistors per mm2
TSMC 7nm (Ampere A100 HPC) = 65.6 million transistors per mm2.
So TSMC would have 47% more transistors per mm2. Or in other words, instead of a 627mm2 die on Samsung 8nm, it could have been on a 426mm2 die instead. Now I know there’s not a linear power reduction from node shrinks at these levels but it would still have been quite significant.
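Running those density figures through the math confirms the claim, give or take rounding:

```python
# Transistor density in millions of transistors per mm2,
# from billions of transistors and die area in mm2.
def density_mtx_per_mm2(transistors_b: float, die_mm2: float) -> float:
    return transistors_b * 1000 / die_mm2

d_samsung = density_mtx_per_mm2(28, 627)     # GA102 on Samsung 8nm
d_tsmc    = density_mtx_per_mm2(54.2, 826)   # GA100 on TSMC 7nm

print(round(d_samsung, 1))                    # ~44.7 MTr/mm2
print(round(d_tsmc, 1))                       # ~65.6 MTr/mm2
print(round((d_tsmc / d_samsung - 1) * 100))  # ~47% denser
print(round(28_000 / d_tsmc))                 # ~427 mm2 at TSMC density
```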
The A100 has a massive amount of cache, and as we know, memory cells pack more tightly than logic. The two chips aren't directly comparable.
And of course, transistors are no good when unused. Nvidia has to disable 15% of the GA100's cores to reach acceptable yield. Now we don't know how many 3080s Nvidia has to make before they get one 3090. The fact that the two cards are launched concurrently does hint at excellent yield.
Remember, the 3090 isn't a full-fat die either; it has SMs disabled. I'd expect a similar situation as with the Pascal Titan X and Titan Xp: an eventual refresh with a few percent more cores and the full stack of GDDR6X, bumping it up to 48GB @ 21Gbps.
As for the other part, you’d know more about cache transistor density than I would so I’ll yield to you on the specifics of the comparison. But with your knowledge, would you not say the TSMC 7nm appears to be significantly better than the Samsung 8nm even if not to the degree I stated?