Extreme VRM Cooling

Here we can see the VRM cooling solution that Intel uses for its HEDT reference validation platform (RVP). After the issues we found with VRM cooling on Skylake-X motherboards, this certainly struck a chord.

Intel also ran into VRM cooling issues while overclocking on its HEDT RVP (though the company didn't specify the exact generation of the chips under test), so the lab worked with its thermal engineering group to design a new heatsink, for internal use only, to address the problem. The lab pitched the project to its thermomechanical team, which includes a half-dozen PhDs, who then attacked the design with gusto. We're told that an amazing amount of design work and simulation went into the final heatsink design shown above. Unfortunately, only a few of these "insanely-overbuilt" heatsinks exist, and they are designed specifically for Intel's RVP boards.

On the topic of VRM cooling, Intel defines specs for chip power delivery but doesn't make a specific cooling recommendation for those subsystems. Instead, it's up to the motherboard vendors to ensure that their cooling solutions keep the various VRM components within their ratings so that power delivery holds up, particularly under high load during overclocking. While Intel doesn't set requirements for VRM cooling, it does test it as part of its normal flow with retail motherboards and gives advice and feedback to the vendors.

How to Swap Your PCH

Intel's RVP boards have an interesting feature: sockets for platform controller hubs (PCH). For motherboards purchased at retail, these chips are soldered onto the board to provide the necessary I/O functions, but Intel's lab team has to test and validate multiple generations and new steppings of the various PCH chips during development. This socket allows for fast and flexible swapping of new PCH revisions.

These elastomer test sockets are fairly simple. The techs assemble the black retention mechanism around the bare BGA mounts on the motherboard, then drop in an interposer. These interposers are only rated for 15 insertions, though we're told they typically last much longer. The tech then drops the BGA PCH chip/substrate into the socket and tightens down the retention mechanism, which ensures proper mating with the interposer and the underlying BGA pads on the motherboard. The housing provides enough thermal dissipation for the PCH, but Intel can also attach other cooling solutions to the top of the mounting mechanism if needed.

Our demos of Intel's internal testing tools were very informative, but the company is very guarded with details of some of its test tools, like the ITP-XDP box above, so we can't share screenshots, or even descriptions, of some of the interfaces. This is definitely among the most secret of Intel's tech in the lab, so there was quite a bit of trepidation from the lab team about exactly what they could show us, and what we could show you. After several clarifying conversations between the lab crew and the PR team assigned to our tour, and some negotiation on our part, the team allowed us to get at least a broad outline of this box and its capabilities.

Intel's RVP platforms have an XDP socket (pictured above) that gives the company unprecedented insight into, and control over, its chips in real time.

The company also provides ITP-XDP units to its ecosystem partners for their own testing and debug use, and motherboard vendors add XDP ports to their test boards. During its overclocking workshops, Intel also lets the motherboard vendors run Intel-supplied scripts on their own hardware. However, the Intel-proprietary box has multiple layers of security that enforce tiered access, with only Intel having full access to the features. The final layer of security is so strictly controlled that Intel's OC lab technicians have to log each and every use of the unrestricted feature layer to a central database.
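To make the tiered-access idea concrete, here is a minimal sketch of how a debug tool could gate features by access level and record every use of its top tier. This is purely illustrative and is not Intel's implementation: the tier names, the `DebugSession` class, and the feature strings are all hypothetical, and the in-memory list merely stands in for the central database the lab described.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class AccessTier(IntEnum):
    # Hypothetical tier names; the lab only told us that access is tiered.
    PARTNER = 1
    INTEL_LAB = 2
    UNRESTRICTED = 3


@dataclass
class DebugSession:
    user: str
    tier: AccessTier
    # Stand-in for a central database of unrestricted-layer usage.
    audit_log: list = field(default_factory=list)

    def run(self, feature: str, required: AccessTier) -> bool:
        """Return True only if this session's tier is high enough."""
        allowed = self.tier >= required
        # Design choice for this sketch: record every attempt at an
        # unrestricted feature, whether or not it was allowed.
        if required == AccessTier.UNRESTRICTED:
            stamp = datetime.now(timezone.utc).isoformat()
            self.audit_log.append((stamp, self.user, feature, allowed))
        return allowed
```

In this sketch, a session opened at the `PARTNER` tier would be refused a feature that requires `UNRESTRICTED`, and the refusal would still leave an entry in `audit_log`.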

At the unrestricted access level, in the lab's own words, the ITP-XDP enables a connection to the chip that is "like having a direct connection to your brain." The ITP-XDP connects to a host system, which in turn connects to the target (the system being observed/tested), and allows Intel to monitor and change internal parameters, MSRs, and literally every configurable option inside a processor, in real time. It doesn't just monitor the CPU, either: the interface also monitors every component connected to the chip.

This tool allows the team to identify overclocking bottlenecks and issues, and then change settings on the fly to circumvent those limitations. The lab then relays that information back to other relevant teams inside of Intel to optimize the processor design for overclocking.

The real-time changes, paired with exclusive hooks, unlock possibilities that Intel will never expose to normal users. For instance, theoretically, you could change cache timings and internal fabric settings in real time after the operating system is running, among many other possibilities. This allows the chip to operate in ways that wouldn't make it past boot-up. The lab engineers can overclock and test all the different parameters of the chip in ways that we won't ever have access to, which is probably part of the reason why the techs aren't allowed to submit HWBot world records. We're told world records fall easily with the capabilities enabled by the system.

Just for kicks, we requested a sample unit. The odds of that request being approved are somewhere south of zero.

Intel's overclocking team played an integral part in the development of the publicly available XTU software, and also works on the ongoing updates. The team designed the software to be useful, and it eats its own dog food: the team often uses the tool to test overclocking with the same settings that normal customers have access to.

Intel also has other restricted-use software utilities, like the power/thermal utility (PTU) that we've tested with a few times, and the thermal analysis tool (TAT), which is used to monitor and diagnose conditions that impact boost activity. Because overclocking really boils down to running in a heightened boost state, the TAT also proves useful for debugging OC issues, such as identifying which internal power limits are restricting higher clock speeds. Intel uses these utilities heavily, but also provides them to motherboard vendors for qualification work.
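For readers curious what "internal power limits" look like at the register level, the sketch below decodes a package power-limit value into its PL1 and PL2 fields. It is not how the TAT works, which we weren't shown. It assumes the RAPL bit layout as we understand it from Intel's public Software Developer's Manual (power units in bits 3:0 of the unit register; PL1 in bits 14:0 and PL2 in bits 46:32 of the package power-limit register), so verify against the manual for your chip. The function names and the sample raw values are made up for illustration.

```python
def decode_power_units(rapl_unit_raw: int) -> float:
    """Watts per count: 1 / 2**(bits 3:0) of the RAPL unit register."""
    return 1.0 / (1 << (rapl_unit_raw & 0xF))


def decode_pkg_power_limit(raw: int, watts_per_count: float) -> dict:
    """Split a 64-bit package power-limit value into PL1/PL2 fields.

    Assumed layout: PL1 in bits 14:0 with its enable in bit 15,
    PL2 in bits 46:32 with its enable in bit 47, lock in bit 63.
    """
    return {
        "pl1_watts": (raw & 0x7FFF) * watts_per_count,
        "pl1_enabled": bool((raw >> 15) & 1),
        "pl2_watts": ((raw >> 32) & 0x7FFF) * watts_per_count,
        "pl2_enabled": bool((raw >> 47) & 1),
        "locked": bool((raw >> 63) & 1),
    }


# Made-up sample values: 0.125 W units, PL1 = 165 W, PL2 = 200 W, both enabled.
sample_units = decode_power_units(0x3)
sample_limits = decode_pkg_power_limit((0x8640 << 32) | 0x8528, sample_units)
```

If a chip sits at its PL1 value under sustained load, that limit, rather than temperature or voltage, is a likely candidate for what is holding clocks back.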

