The Core i9-11900K may not be one of the best CPUs anymore, but that doesn't mean the flagship Rocket Lake chip has lost its mojo. Our resident extreme overclocking guru, Allen 'Splave' Golibersuch, has set a new world record with the Core i9-11900K in PyPrime 32B, outperforming the previous record holder, the Core i9-14900KS.
Setting world records isn't just about having the fastest hardware at hand. It helps when chipmakers like AMD or Intel send you trays and trays of processors to find the best sample. However, a lot of preparation goes into an extreme overclocking endeavor, and finding the best platform for the job is a big part of it. Despite Intel and most of the hardware world having moved on from Rocket Lake, Splave has discovered that the Intel 500-series platform is probably one of the last low-latency platforms of its kind.
For those who haven't heard of PyPrime, it's an open-source RAM benchmark based on Python that scales with both processor and memory overclocking. However, the latter matters more, which is how Splave beat the previous record set by Korean overclocker safedisk despite running an older chip at a lower clock speed. Splave's Core i9-11900K was operating at 6,957.82 MHz, compared to safedisk's Core i9-14900KS, which was overclocked to 8,374.91 MHz.
With Rocket Lake, Intel introduced gear ratios, an approach similar to the one AMD had taken with its Ryzen processors. As a quick recap, in Gear 1, the processor's memory controller and the memory run in sync. Meanwhile, Gear 2 forces the processor's memory controller to run at half the memory speed, so there's a performance hit. Officially, Rocket Lake, a DDR4-only platform, supports up to DDR4-3200 in Gear 1, so Splave's Core i9-11900K is a remarkable sample since it can do DDR4-3913.
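The gear arithmetic above can be sketched in a few lines. The helper name and the Gear 2 comparison are illustrative, not from the record run:

```python
# Sketch of how Gear 1 vs. Gear 2 affects the memory controller clock,
# using the DDR4-3913 speed from this story. DDR transfers twice per
# memory clock, and Gear 2 halves the controller clock again.

def imc_clock_mhz(transfer_rate_mts: float, gear: int) -> float:
    """Return the memory controller clock for a given DDR transfer rate."""
    memory_clock = transfer_rate_mts / 2  # DDR: two transfers per clock
    return memory_clock / gear            # Gear 2 runs the IMC at half speed

# Splave's DDR4-3913 in Gear 1: the IMC runs in sync with the memory clock.
print(imc_clock_mhz(3913, gear=1))  # 1956.5 MHz
# The same kit forced into Gear 2 would halve the IMC clock.
print(imc_clock_mhz(3913, gear=2))  # 978.25 MHz
```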
Unlike other memory benchmarks that favor bandwidth, PyPrime just loves latency. Splave was running DDR4-3913 dual-rank, double-sided memory for interleaving gains. He tuned the timings to a very tight 12-11-11-18 1T, thanks to the Samsung B-die ICs inside the G.Skill Trident Z DDR4-3466 C16 memory kit. For comparison, safedisk was using DDR5-9305 with timings configured to 32-47-42-34 2T. If we do the math (CAS latency in nanoseconds is CL divided by the memory clock), Splave's latency works out to 6.133 ns, about 11% lower than safedisk's 6.878 ns.
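The latency figures quoted above can be verified with a few lines of arithmetic; `cas_latency_ns` is an illustrative helper, not a tool either overclocker used:

```python
def cas_latency_ns(cl: int, transfer_rate_mts: float) -> float:
    """Absolute CAS latency in ns: CL cycles divided by the memory clock.

    The memory clock is half the DDR transfer rate, since DDR moves data
    on both edges of the clock.
    """
    memory_clock_mhz = transfer_rate_mts / 2
    return cl / memory_clock_mhz * 1000  # cycles / MHz -> microseconds -> ns

splave = cas_latency_ns(12, 3913)    # DDR4-3913 at CL12 -> ~6.133 ns
safedisk = cas_latency_ns(32, 9305)  # DDR5-9305 at CL32 -> ~6.878 ns
print(round(splave, 3), round(safedisk, 3))
```

This is also the point bit_user makes in the comments below: raw CL numbers can't be compared across DDR4 and DDR5 without dividing by the clock first.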
For extra performance gains, Splave set the affinity to the core closest to the processor's integrated memory controller (IMC) and chose Windows 7, which is lightweight and not hindered by mitigations and security patches that newer operating systems and processors require to be safe. The result is Splave dethroning safedisk in PyPrime 32B by 285 ms. It may not seem like a considerable margin, but every last millisecond counts in competitive overclocking.
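The affinity trick can be sketched like so. Which core sits closest to the IMC varies per die, so the core number here is a placeholder, and `os.sched_setaffinity` is Linux-only; on Windows 7, Splave would use Task Manager or `start /affinity` instead:

```python
import os

# Hypothetical sketch: pin the current process to a single core so all
# memory traffic originates from one stop on the ring bus. Core 0 is an
# assumption for illustration, not the core Splave actually chose.
NEAREST_IMC_CORE = 0

os.sched_setaffinity(0, {NEAREST_IMC_CORE})  # pid 0 = the current process
print(sorted(os.sched_getaffinity(0)))       # the process now runs on one core
```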
It's always thrilling to see old hardware beating the latest and greatest. Splave's PyPrime 32B world record pays homage to Rocket Lake and, more importantly, ASRock's Z590 OC Formula, the brand's last unrestrained overclocking motherboard. And while ASRock has tested the waters with the Aqua OC, it's not the same. Fear not, overclocking enthusiasts; a little bird has whispered to us that the Formula is returning for the Z890 and Core Ultra 200 (Arrow Lake) series, which will hit the retail market before the end of the year.
Zhiye Liu is a news editor and memory reviewer at Tom’s Hardware. Although he loves everything that’s hardware, he has a soft spot for CPUs, GPUs, and RAM.
bit_user
Sadly, latency is a tradeoff that DDR5 had to make, in order to continue scaling density and bandwidth. If you compare the lowest latency DDR5 (in terms of ns, not CL), it's nearly as good as fast DDR4 memory. At least, that's what I saw in examples I looked at.
Remember, kids: you need to divide CL by the memory speed, in order to compare them.
The article said: "Splave was running DDR4-3913 dual-rank, double-sided memory for interleaving gains"
That's somewhat redundant. The key detail is "dual-rank". It just so happens that most dual-rank DIMMs are also double-sided and vice versa. However, I think it should be possible to have chip-stacked DRAM that's dual-rank without being double-sided.
jkflipflop98
Back in Ye Olde days when the company would just hand us free processors on request, I used to be into this scene. I'd go down to the SEM labs, fill a dewar with a bunch of LN2, and head home for nerdy fun on the weekends.
Gets expensive when you have to buy your own CPUs, though.
btmedic04
Rocket Lake never had DDR5 support, so that whole part about Gear 2 for DDR5 on Rocket Lake is erroneous. Do better.
TheHerald
bit_user said: "Sadly, latency is a tradeoff that DDR5 had to make, in order to continue scaling density and bandwidth. If you compare the lowest latency DDR5 (in terms of ns, not CL), it's nearly as good as fast DDR4 memory. At least, that's what I saw in examples I looked at."
It's also the size of the cache; as it grows, latency grows with it. I never tried Rocket Lake, but going from CFL to CFL+ to CML (8700 --> 9900 --> 10900), latency progressively increased. An 8700K with tuned DDR4-4000 can drop to 32-33 ns, while with a 10900K, even at DDR4-4400 C16, you'd hit a wall at around 36 ns. You can get lower than that, but with unsavory amounts of voltage.
bit_user
TheHerald said: "It's also the size of the cache; as it grows, latency grows with it. I never tried Rocket Lake, but going from CFL to CFL+ to CML (8700 --> 9900 --> 10900), latency progressively increased"
Huh? All of those CPUs used the formula of 32/256/2048 kB for L1d/L2/L3 cache per core. The only thing that changed in each case was the number of cores. That meant more stops on the ring bus, but you also got more L3 cache as a side effect.
Another thing about them is that they all used basically the same process node. If you're using a denser process node, then you can sometimes increase cache size without increasing the latency (in clock cycles). The other thing you can do to reduce the effect of cache on memory latency is simply to clock higher while keeping cache latency at the same number of clock cycles, thereby shrinking the contribution, in ns, that cache tag RAM lookups make to memory latency.
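bit_user's last point can be sketched numerically. The 40-cycle figure below is a hypothetical cache lookup latency chosen for illustration, not a measured value for any of these chips:

```python
def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a fixed cycle-count latency into nanoseconds."""
    return cycles / clock_ghz

# The same hypothetical 40-cycle cache lookup at two clock speeds:
# clocking higher shrinks the lookup's ns contribution to memory latency
# even though the latency in cycles is unchanged.
print(latency_ns(40, 4.0))  # 10.0 ns at 4.0 GHz
print(latency_ns(40, 5.0))  # 8.0 ns at 5.0 GHz
```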