AMD Threadripper 3990X: How I set 10 Overclocking Records

(Image credit: Tom's Hardware)

AMD's 64-core, 128-thread Threadripper 3990X represents the pinnacle of threaded horsepower, but if you've read my Threadripper 3970X article, you can really skip this one because AMD has simply doubled the performance: The 3970X multiplied by two equals the Threadripper 3990X, an absolute corker of a processor. Eight 8-core chiplets sit frosted in solder and tucked under a rectangular copper blanket, idling quietly as they wait for their chance to pounce on any threaded load within grasp. Are 128 threads of red power necessary for everyone? You decide.

While you start brainstorming ways to get a home equity loan, or which body parts you don't need, I’ll show you how I broke some world records and made it all the way to 5.749 GHz from its stock boost speed of 4.3 GHz.

(Image credit: Tom's Hardware)

Let’s get down to brass tacks. All the CPU frequency and threads in the world will not get you anywhere without an operating system that plays along nicely. I only use Windows operating systems because my goal is validated world records, and we’re limited to using Windows to do so. 

I've spent weeks sorting through different versions of Windows 10, Server, Enterprise, Pro, Build 19035, 1909, 1809, 911... all in search of the best operating system to handle the titanic power that is the 3990X. Different benchmarks like different things, but Windows Server 2019 (version 1809) and Server 2012 delivered such unparalleled performance that I beat liquid nitrogen (LN2) cooling scores with simple water cooling. To give you a glimpse of the process, picture me with bad posture, chin in my hand, testing countless different installs, registry settings, available memory, and prefetcher values; this is a far cry from the glamorous smoke-laden RGB light beam pictures you see in my articles. This is where “anyone could do this” talk comes to die. It’s a grind, but at the end of the day, the records make it all worth it.

(Image credit: Tom's Hardware)

As you can see, all the testing has been worth it. The world record for the HWBot 4K x265 benchmark is in range and should be easy to hit on LN2. In this benchmark, an overclocked 3990X on chilled water is almost as fast as $15,000 worth of AMD’s Epyc CPUs that have twice the threads and cores.

(Image credit: Tom's Hardware)

For extreme overclocking, fabric temperatures are really the limiting factor for the Threadripper chips. They really hate to POST (complete the power-on self-test during boot-up) at cold temperatures.

For the x265 4K benchmark record, I wanted to see how far the CPU could go under an extreme load with water cooling at 0°C. Even at 0°C with the fabric maxed out at 1867MHz, I had trouble getting the system to POST because the temps were too low. I had to turn off the pump to POST and enter the operating system. That prevents the waterblock from receiving any flow, which keeps the chip warmer. Once in the OS, the CPU responds to the chilly temps just as you would expect: It loves it. Running the chip at 4.5GHz on all 64 cores at 1.4V vCore pulled a peak of 1,280 watts at 0°C.

When you’re on simple ambient cooling, the fabric can run at 1867MHz all day without a single issue.

(Image credit: Tom's Hardware)

One last boring factoid about testing before we play with liquid nitrogen: These processors are very happy with simultaneous multi-threading (SMT) off. We can clock higher, use less power, and the benchmark programs are happier in general without the need for “two-node” compatibility. wPrime 32M, wPrime 1024M, and Cinebench R11.5 all run about 30% faster with SMT off, and clocks are well into the 4.5GHz range on normal custom water cooling.
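One plausible reading of the “two-node” remark is that Windows splits more than 64 logical processors into processor groups of at most 64 each, so older benchmarks that are unaware of groups only see part of the chip. The sketch below just does that arithmetic; the helper name is hypothetical and the link to the benchmarks' behavior is an assumption, not something measured here.

```python
import math

# Windows places logical processors into processor groups of at most 64.
MAX_LOGICAL_PER_GROUP = 64


def processor_groups(cores: int, smt_enabled: bool) -> int:
    """Minimum number of processor groups needed (hypothetical helper)."""
    logical = cores * (2 if smt_enabled else 1)
    return math.ceil(logical / MAX_LOGICAL_PER_GROUP)


smt_on = processor_groups(64, smt_enabled=True)    # 128 logical processors
smt_off = processor_groups(64, smt_enabled=False)  # 64 logical processors
print(f"SMT on: {smt_on} group(s), SMT off: {smt_off} group(s)")
```

With SMT off, all 64 cores fit in a single group, which would explain why the older benchmarks behave better.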

It’s also interesting to compare the 3970X with SMT on to the 3990X with SMT off, with 64 threads on each. On wPrime 1024M, with both processors at 4.5GHz and 64 threads active, the 3970X clocks in at 17.5 seconds while the 3990X takes only 12.5 seconds. This goes to show you that thread count is not the only thing to consider: physical cores are unquestionably much faster than SMT threads.
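To put those two times in perspective, here is the arithmetic as a quick sketch. The variable names are mine; the numbers are the ones quoted above.

```python
# wPrime 1024M times quoted above, both chips at 4.5GHz with 64 threads active.
time_3970x_s = 17.5  # 32 cores, SMT on
time_3990x_s = 12.5  # 64 cores, SMT off

# How many times faster the 64 physical cores finish the same work.
speedup = time_3970x_s / time_3990x_s

# The same gap expressed as run time saved.
time_saved_pct = (1 - time_3990x_s / time_3970x_s) * 100

print(f"speedup: {speedup:.2f}x, time saved: {time_saved_pct:.1f}%")
```

That works out to a 1.4x speedup at the same clock speed and thread count.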

This leads to something I haven't touched on yet: memory overclocking on the 3990X is stunted compared to the 3970X. More dies apparently make it more difficult. The three or four 3970X processors I have tested could all easily run 4600-4800 MHz memory clocks at tight latencies of 14-14-14-14-1T, but while I tried everything in my power to make this 3990X run over 4400 MHz, I failed. Is this a problem? Not really, because most of the time, we try to keep our memory frequency, fabric clock, and memory controller clock (UCLK) in line at 1:1:1, but I think it is noteworthy nonetheless.
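For readers unfamiliar with the 1:1:1 rule, a minimal sketch of the check follows. It assumes the memory speeds in this article are DDR data rates (so the real memory clock is half the quoted number), and the function names and the 1MHz tolerance are hypothetical choices for illustration.

```python
def memclk_mhz(ddr_rate: float) -> float:
    """DDR memory transfers data twice per clock, so the clock is half the rated speed."""
    return ddr_rate / 2


def is_one_to_one(ddr_rate: float, fclk_mhz: float, uclk_mhz: float,
                  tol_mhz: float = 1.0) -> bool:
    """True when memory clock, fabric clock, and UCLK line up (hypothetical helper)."""
    memclk = memclk_mhz(ddr_rate)
    return abs(memclk - fclk_mhz) <= tol_mhz and abs(memclk - uclk_mhz) <= tol_mhz


# A 3733-rated memory setting lines up with the 1867MHz fabric used in this article.
synced = is_one_to_one(3733, fclk_mhz=1867, uclk_mhz=1867)

# A 4400-rated setting would need a 2200MHz fabric to stay 1:1:1.
unsynced = is_one_to_one(4400, fclk_mhz=1867, uclk_mhz=1867)

print(memclk_mhz(3733), synced, unsynced)
```

This is why a ceiling of 4400 MHz on the memory matters little in practice: the fabric tops out well below the matching 2200MHz.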

I would put overclocking potential on water cooling for an average 3990X in the comfortable range of 4.3-4.4 GHz depending on your luck in the silicon lottery. High-end air cooling really tops out around 4.0-4.1 GHz. If you spend $4k on this chip, spend a little more and cool it properly.

(Image credit: Tom's Hardware)

LN2 Rig:

CPU: AMD Threadripper 3990X

Motherboard: ASRock TRX40 Taichi

Memory: G.Skill Neo 4x8GB 3800C14

Power Supply: 2x Enermax MaxTytan 1250W linked with 24-pin jumper

CPU Container: 8ECC TR LN2 Pot

Thermal Paste: Thermal Grizzly Kryonaut 

Rubber Coating: Plasti-Dip Yellow (insulation against moisture)

(Image credit: Tom's Hardware)

I used the ASRock TRX40 Taichi for overclocking, which is the same motherboard that I used for the 3970X. ASRock designed the board with the 64-core Threadripper in mind, so it features 16x 90 Amp phases. How much power can the VRM handle? All of it. That brick of a VRM cooler is no joke. 

In fact, I used two power supplies to feed the CPU all it can eat: a pair of single-rail Enermax MaxTytan 1250W PSUs that each push out 104A on the +12V rail. I linked them together with a pretty basic dual PSU adapter that just signals power-on to the secondary supply.

Things get interesting with the 8-pin adapters, though. Each one joins an 8-pin CPU cable and a 6-pin PCIe cable into a single 8-pin CPU connector. The best load balancing tactic is to take a 6-pin PCIe cable from PSU one and the CPU 8-pin cable from PSU two and connect them to the motherboard's 8-pin connector number one.

Then I do the opposite for motherboard 8-pin connector number two. Because the motherboard isn't designed to run two PSUs, the load is not a 50/50 split, but after some testing of the various combinations, this setup got me the closest at around a 35/65 power distribution between the two. Knowing full well from the 3970X that one of these supplies alone can peak around 1500W without a problem, we should have plenty of breathing room with two of them, regardless of the imbalance.
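As a rough sanity check on that breathing room, here is a sketch of the split arithmetic. It assumes the whole system draw divides between the supplies in the same 35/65 ratio, which is a simplification, and it uses the 2,000W peak draw reported elsewhere in this article. The function name is hypothetical.

```python
RAIL_VOLTS = 12.0
RAIL_AMPS = 104.0  # +12V rating of each supply, as quoted above


def psu_loads(total_watts: float, split_pct=(35, 65)) -> tuple:
    """Watts carried by each supply for a given percentage split (hypothetical helper)."""
    return tuple(total_watts * pct / 100 for pct in split_pct)


# Rated +12V capacity of one supply.
rated_watts = RAIL_VOLTS * RAIL_AMPS

# Assumed worst case: the 2,000W peak system draw reported in this article.
light, heavy = psu_loads(2000)

print(f"rated per PSU: {rated_watts:.0f}W")
print(f"loads at 2000W: {light:.0f}W / {heavy:.0f}W")
print(f"heavier PSU vs. rating: {heavy / rated_watts:.0%}")
```

Under these assumptions the busier supply runs slightly above its +12V rating at the very peak, but still under the roughly 1500W it has already handled.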

(Image credit: Tom's Hardware)

G.Skill designed the Neo line of memory specifically for AMD systems, so it supports tight latencies at lower frequencies with stellar results. I can run either 3200 MHz at 11-11-11-11-1T timings or 4400 MHz at 14-14-14-14-1T timings, both at very low voltages for B-die memory kits. I have gotten plenty of binned B-die memory samples before, but this kit has handily beaten every previous one.

Geekbench 3 was my first target for liquid nitrogen benching. It is a really painful bench on 64 cores and takes nearly two minutes to finish. Of course, the hardest sub-test (Ray Trace) is one of the last, at around the 1:45 mark. On the bleeding edge, only about 10% of runs pass. Can you imagine balancing an 1,800W load at -160°C for nearly two minutes? Each run uses about 1.5 liters of LN2, which costs roughly $3. Is that not crazy?
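Those numbers add up quickly. A back-of-the-envelope sketch, assuming every attempt is independent and burns the full 1.5 liters whether it passes or not (a failed run that crashes early would use less):

```python
# Figures quoted above for Geekbench 3 on the bleeding edge.
pass_rate = 0.10        # roughly one run in ten completes
liters_per_run = 1.5    # LN2 consumed per attempt
cost_per_run_usd = 3.0  # approximate LN2 cost per attempt

# With independent attempts, the average number of tries until the first
# pass is 1 / pass_rate (the mean of a geometric distribution).
expected_attempts = 1 / pass_rate
expected_liters = expected_attempts * liters_per_run
expected_cost_usd = expected_attempts * cost_per_run_usd

print(f"{expected_attempts:.0f} attempts, {expected_liters:.0f} L, "
      f"${expected_cost_usd:.0f} per passing run")
```

So under these assumptions a single valid score costs about ten attempts' worth of LN2 on average.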

(Image credit: Tom's Hardware)

After taking the world record in Geekbench 3, I swapped over to Cinebench R20, which is peanuts by comparison. The test is over in roughly five seconds, and while the load is pretty heavy, it isn’t overwhelming. The score fluctuates quite a bit, which is normal for such a setup.

After a few minutes of running benches, the record is done. At this point, it's time to be done for the day. The memory in the inside slots is frozen solid, and everything is working fine, but I have found it better to quit when everything is still alive.

(Image credit: Tom's Hardware)

On the second day, I tackled the benchmarks that I had previously found to like SMT off. wPrime 1024M was the first to go. I set the clocks the same as for Cinebench R20, which should easily pass with half the threads enabled. To my surprise, it ran all the way up to 5.4GHz with the fabric at 1867MHz.

Fabric plays a monumental role in the 3990X’s performance, but it is the bane of my overclocking existence. We need it to run as fast as possible, but, as I've stated countless times, it really hates being cold. The processor hits what I want to call a performance cap. This was very apparent in Cinebench R20 when I fought with one of AMD’s engineers for the world record. When you use a higher fabric frequency, you will hit a lower CPU frequency. It's that easy.

(Image credit: Tom's Hardware)

In a perfect world, you want the highest fabric frequency possible (1867MHz) for this chip. At 1867MHz, 5.1GHz is the max stable core frequency and yields a score of 39,000 in Cinebench R20. If you lower the fabric to 1600MHz, 5.27GHz will pass, and you’ll still score 39,000 in the same benchmark. If you split the difference and run a 1733MHz fabric, 5.2GHz will pass, but you’ll still score 39,000 in R20. This is why I call it a performance cap: You can run any combination of fabric and core speeds at their limits, but you’ll achieve the same score.
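The trade-off is easier to see side by side. This sketch simply tabulates the three data points quoted above and works out how much core clock each step down in fabric buys; it does not model why the cap exists.

```python
# (fabric MHz, max stable core MHz, Cinebench R20 score) as quoted above.
runs = [
    (1867, 5100, 39000),
    (1733, 5200, 39000),
    (1600, 5270, 39000),
]

base_fabric, base_core, _ = runs[0]

# For each lower fabric setting: (fabric MHz given up, core MHz gained).
trade_offs = [(base_fabric - fabric, core - base_core)
              for fabric, core, _ in runs[1:]]

# The "performance cap": every combination lands on the same score.
capped = len({score for _, _, score in runs}) == 1

for fabric_drop, core_gain in trade_offs:
    print(f"-{fabric_drop}MHz fabric -> +{core_gain}MHz core")
print("same score everywhere:", capped)
```

Giving up 267MHz of fabric buys 170MHz of core clock, and the score does not move.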

You can totally avoid this performance cap by disabling SMT. You can run a 1867MHz fabric and max out the CPU at the same time, as seen in the wPrime world record at 5.4GHz. Lowering the fabric speed is not necessary. Why is this? I’m not sure, but now that we know for a fact that it happens, hopefully someone can look into it.

(Image credit: Tom's Hardware)

TLDR

  • World Record wPrime 1024M
  • World Record GPUPI 1B
  • World Record Geekbench 3
  • World Record Cinebench R20
  • World Record x265 1080p
  • World Record Cinebench R15
  • World Record Cinebench R11.5
  • 32x Core Record x265 4K
  • 32x Core Record wPrime 32M
  • 3990X Frequency Record: 5,749 MHz

On the power front, at 1.65V vCore, the system drew between 1,200W and 2,000W depending on the benchmark. While just idling in the OS, the system used 375W, which is nearly the entire power budget of a Core i9-9900K under load. As a result, you have to pour liquid nitrogen constantly on the 3990X rig from the moment you turn it on until the moment you power it down.

(Image credit: Tom's Hardware)

All in all, is the world ready for this CPU? Outside of the professional market, this is more of a showpiece of AMD's capabilities. Why do I feel this way? Well, no one I know is willing to purchase one for $4,000, for starters. And Windows just doesn't feel ready for 128 threads on a daily-use computer. 

Don't get me wrong: Just look at the scores. We've never seen anything that is even in the realm of what these chips can do. This is a supercar, not a daily grocery-getter, and I'm quite okay with that. It also does a great job of making the Threadripper 3970X look like a bargain at $1,999: You can tell yourself how frugal you were by only splurging on the 32-core.

Allen 'Splave' Golibersuch

 A world-champion competitive overclocker who frequently tops the charts at HWBot, a site which tracks speed records, Allen will do just about anything to push a CPU to its limits. He shares his insights into the latest processors with Tom’s Hardware readers from a hardcore, push-it-to-the-limit overclocker’s perspective. 

  • SampsonJackson
    Awesome article!
  • drtweak
    " Build 19035, 1909, 1809, 911"

    Never heard of build 19035 or 911 XD
  • slash3
    Great write-up. I used to play with cascade phase change cooling during the P3/P4 and Core 1/2 days, and while I never personally took the dive into LN2 I still keep up with the XOC community. It's an entirely different game from standard air or water based overclocking.

    Rather than trying to minimize thermals you begin to run into actual hardware limitations, as well as material property challenges. From power to humidity, cold bugs, broken multipliers, frequency holes and dozens of other new and interesting problems.

    Multi core processors like this are less of a drag race and more of a tractor pull. I love it.

    It really does become half art, half science. But it's always wholly fun to follow.
  • Newtonius
    Pure legend right here.

  • derekullo
    slash3 said:
    Great write-up. I used to play with cascade phase change cooling during the P3/P4 and Core 1/2 days, and while I never personally took the dive into LN2 I still keep up with the XOC community. It's an entirely different game from standard air or water based overclocking.

    Rather than trying to minimize thermals you begin to run into actual hardware limitations, as well as material property challenges. From power to humidity, cold bugs, broken multipliers, frequency holes and dozens of other new and interesting problems.

    Multi core processors like this are less of a drag race and more of a tractor pull. I love it.

    It really does become half art, half science. But it's always wholly fun to follow.
    I guess this picture sums it up.

  • slash3
    derekullo said:
    I guess this picture sums it up.


    HP: 50
    Torque: Yes. All of it.
  • schmuckley
    Bah! Go shovel some snow!
  • schmuckley
    The server tip is Golden. I bet no other bencher besides me has read this.
    Curse you Allen! Must share ..wait..you're on the team..be sure and tell them about the server OS thing.
    You know where..Please?
    Say hi to Trouff for me. LOL!
  • bit_user
    derekullo said:
    I guess this picture sums it up.

    Uh, that's not a tractor pull.
  • bit_user
    Thanks for the writeup. I'll admit I didn't read the whole thing, but I didn't find any mention of lapping the CPU, in spite of the top photo showing what appears to be a partially-lapped CPU. Did you do it by hand, or use a grinding wheel?