HighPoint's adapter enables GPUDirect Storage: up to 64 GB/s from drive to GPU, bypassing the CPU
Meet the Rocket 7638D switch card

HighPoint on Thursday introduced the Rocket 7638D, a PCIe 5.0 switch card designed to enable Nvidia's GPUDirect connectivity between AI GPUs and NVMe storage devices. The card is meant to speed up AI training and inference workloads when used with software that fully supports GPUDirect.
Nvidia's latest GPUs (starting with the A100) support GPUDirect technologies that enable direct data transfers between GPUs and other devices, such as SSDs or network interfaces, bypassing the CPU and system memory to increase performance and free CPU resources for other workloads. However, GPUDirect requires support from both the GPU and a PCIe switch with peer-to-peer (P2P) DMA capability, and not all PCIe Gen5 switches offer it, which is where switch cards come into play.
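On the software side, applications typically reach this direct path through Nvidia's cuFile API, part of GPUDirect Storage. The sketch below is a minimal illustration rather than vendor sample code: it assumes a GDS-enabled driver stack (nvidia-fs), a supported filesystem, and an illustrative file path, with error handling omitted for brevity.

```c
// Minimal GPUDirect Storage read sketch using Nvidia's cuFile API.
// Assumes a GDS-capable driver stack and linking against libcufile;
// the path and transfer size are illustrative.
#define _GNU_SOURCE
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    const size_t size = 1 << 20;             // 1 MiB illustrative transfer
    void *dev_buf = NULL;

    cuFileDriverOpen();                       // initialize the GDS driver
    cudaMalloc(&dev_buf, size);               // destination lives in GPU memory
    cuFileBufRegister(dev_buf, size, 0);      // register the buffer for DMA

    int fd = open("/mnt/nvme/dataset.bin", O_RDONLY | O_DIRECT);
    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    // Data moves NVMe -> PCIe switch -> GPU, without bouncing through host RAM.
    ssize_t n = cuFileRead(handle, dev_buf, size, /*file_offset=*/0, /*dev_offset=*/0);
    printf("read %zd bytes directly into GPU memory\n", n);

    cuFileHandleDeregister(handle);
    cuFileBufDeregister(dev_buf);
    cudaFree(dev_buf);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```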
HighPoint's Rocket 7638D switch card packs Broadcom's PEX 89048 switch, enabling system integrators to build systems with GPUDirect capability. The adapter features 48 PCIe 5.0 lanes: 16 lanes connect to the host, 16 lanes connect to an external GPU box via a CDFP CopprLink connector, and 16 lanes are dedicated to internal NVMe storage devices via MCIO 8i connectors. The MCIO ports support up to 16 NVMe drives, enabling configurations with up to 2PB of high-performance storage (i.e., 16 drives of the 128TB class).
The Rocket 7638D enables GPUDirect Storage workflows that avoid the host CPU and RAM entirely and provide predictable bandwidth (up to 64 GB/s) and latency when paired with compatible software, including the operating system, GPU drivers, and filesystem. The device (or rather the systems it enables) is particularly useful in scenarios involving large-scale training datasets that consume plenty of storage.
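That 64 GB/s figure lines up with the raw ceiling of the card's x16 PCIe 5.0 host link; a quick back-of-the-envelope check (real-world throughput will be somewhat lower after protocol overhead):

```c
// Back-of-the-envelope check of the 64 GB/s figure for a x16 PCIe 5.0 link.
#include <stdio.h>

int main(void) {
    const double gt_per_lane = 32.0;          // PCIe 5.0: 32 GT/s per lane
    const double encoding    = 128.0 / 130.0; // 128b/130b line encoding
    const int    lanes       = 16;
    double gbytes_per_s = gt_per_lane * encoding * lanes / 8.0; // bits -> bytes
    printf("x16 Gen5 ceiling: ~%.1f GB/s per direction\n", gbytes_per_s); // ~63.0
    return 0;
}
```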
Since the Broadcom PEX 89048 switch chip contains an Arm-based CPU, it is completely independent and self-managed, making it compatible with both Arm and x86 platforms. The adapter works out of the box with all major operating systems, with no need for special drivers or additional software.
The Rocket 7638D includes field-service features such as VPD tracking for hardware and firmware matching, as well as a utility that monitors health status and PCIe link performance. These tools simplify troubleshooting and replacement, especially in multi-node or hyperscale installations where hardware tracking matters.
HighPoint did not disclose pricing of its Rocket 7638D switch card, which will probably depend on volumes and other factors.

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
abufrejoval: To my knowledge, any GPU can bypass the CPU for storage or network data transfers if it talks PCIe and asks the OS nicely to set up address mappings between both sides: that administrative part requires some CPU/OS support, but after that they can just fire away, while bus arbitration protects all parties from monopolising the bandwidth.
And again, for all I know, CUDA has supported these facilities pretty much since it became popular for HPC, because nobody there wants to bother with CPU overheads, while other GPU software stacks might (or should) do the same for the same reason.
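For reference, the long-standing CUDA plumbing the comment alludes to is exposed through the runtime's peer-access calls; a minimal sketch, with illustrative device IDs:

```c
// Minimal CUDA peer-to-peer access check between two GPUs; device IDs are
// illustrative. Whether P2P traffic actually flows over PCIe still depends
// on the topology (e.g. a switch that forwards peer transactions), which is
// what the card in the article provides.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, /*device=*/0, /*peerDevice=*/1);
    if (can_access) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(/*peerDevice=*/1, 0); // flags must be 0
        printf("GPU 0 can now map GPU 1's memory directly\n");
    } else {
        printf("no P2P path between GPU 0 and GPU 1 on this topology\n");
    }
    return 0;
}
```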
Reaching back into the older crevasses of my mind, I believe the IBM XGA adapter should have been able to do the same, since it supported bus master operations. And once GPU and network/storage data are both memory mapped, it only takes software to make things happen.
And that software support, which is really just about delegating some of the (CPU-based) OS authority over the PCI(e) bus to a GPU (or any other xPU device that might want it), isn't inherently hardware-dependent (the article mentions the A100) but a matter of driver and OS support to negotiate and set up that delegation. AFAIK this is rather old, but it mostly covered network/fabric support for HPC workloads, targeting MPI transfers over InfiniBand; local storage wasn't popular in HPC for a long time, because it was typically more bother than help until NV-DIMMs or really fast flash storage came along.
In short, this isn't a HighPoint feature or a result of their work, even if HighPoint might want to create that impression. These are just basic PCIe, OS, and GPU/CUDA facilities that HighPoint supports just like any other PCIe storage vendor would. It's like a window vendor claiming to also support "extra clean air".