$200 'socketed' Nvidia AI GPU for servers hacked into a PCIe card with custom PCB and 3D-printed cooling: modded Tesla V100 SXM data center GPU runs AI LLMs and is more efficient than many modern midrange offerings in AI inference
Another hidden gem is about to be price-hiked now.
Running LLMs locally on your GPU requires a lot of VRAM, which can drive a rig's cost up sharply these days. Amidst the ongoing AI boom, the best value lies in older, often forgotten silicon that's still capable, which is exactly what YouTuber Hardware Haven found. He took an Nvidia V100 server GPU with an SXM interface, which works much like a socketed processor, and converted it into a standard PCIe card that plugs into a consumer motherboard. It ended up performing quite well for its age (and cost), even against more modern SKUs.
The contraption begins with an Nvidia Tesla V100 AI GPU that uses the SXM2 socket and is designed for rack-scale deployments. SXM is a mezzanine connector that mounts the GPU flat against a specialized baseboard, similar to a CPU socket, after which the GPU is screwed down to the baseboard. The host was able to acquire this GPU for just $100, and the accompanying SXM2-to-PCIe x16 adapter was also around $100, bringing the total cost of the setup to $200. The V100 comes with either 16GB or 32GB of HBM2 (we're working with 16GB here, sporting 900 GB/s of bandwidth), and it's based on the Volta architecture.
The PCIe adapter card didn't come with any cooling of its own, and since the SXM2 V100 is essentially just a heatsink on a PCB with no fan, the YouTuber designed and 3D-printed a duct for it. He attached an 80mm Noctua fan on the end to draw fresh air in toward the heatsink. The adapter also has two 8-pin PCIe power connectors, along with three 4-pin PWM headers. It does not feature a secondary SXM2 socket for NVLink; adapters that do are much more expensive.
Once the GPU was ready and slotted into a standard Ryzen system, it was time to test just how artificially intelligent a 2017 card is. Keep in mind that the V100 has no display output, so you need integrated graphics in your CPU to actually use your computer. In Ollama, using gpt-oss-20b, the V100 was able to crank out 130 tokens per second, while the Radeon RX 7800 XT in the YouTuber's daily driver system only achieved about 90 tokens per second.
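For readers who want to reproduce a tokens-per-second figure like the ones above, here is a minimal Python sketch. It assumes Ollama is running locally on its default port and relies on the `eval_count` and `eval_duration` fields that Ollama's `/api/generate` endpoint returns for a non-streaming request. The video's exact prompts and measurement method are not stated, so the model tag and prompt in the usage comment are placeholders rather than the YouTuber's setup.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from the timing fields of an Ollama response.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, reported in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def run_prompt(model: str, prompt: str) -> dict:
    """Send one non-streaming generation request and return the parsed JSON."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Example usage (placeholder model tag and prompt, needs a running server):
#   result = run_prompt("gpt-oss:20b", "Explain PCIe lanes in one paragraph.")
#   print(f"{tokens_per_second(result):.1f} tokens/s")
```

Averaging several runs with the same prompt gives a steadier number than a single request, since the first call also includes model load time.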
Both cards have 16 GB of VRAM, and the RX 7800 XT is newer, with supposedly more efficient silicon, but then again, Nvidia is the gold standard for software support in these benchmarks. So, for a fairer comparison against the V100, the host switched to an RTX 3060 12 GB, the best Nvidia GPU he had on hand and one built on the newer Ampere architecture.
Running Google's gemma4:e4b, the V100 topped out at 108 tokens per second, while the 3060 12 GB only managed about 76 tokens per second, though it did so while consuming less power: 293W on the V100 versus 235W on the RTX 3060. If we calculate tokens per second per watt, that comes out to around 0.37 for the V100, slightly more efficient than the 0.33 on the 3060.
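The efficiency figure here is simply throughput divided by power draw. The short Python sketch below redoes that arithmetic from the rounded numbers quoted above; the function name is made up for illustration. Because the 3060's speed is given as "about 76", the recomputed value can differ from the quoted one in the last digit.

```python
def tokens_per_second_per_watt(tokens_per_second: float, watts: float) -> float:
    """Efficiency as throughput divided by power draw."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return tokens_per_second / watts


# Stock results quoted in the article: (tokens/s, watts).
stock_results = {
    "V100": (108, 293),
    "RTX 3060 12 GB": (76, 235),
}

for card, (speed, power) in stock_results.items():
    efficiency = tokens_per_second_per_watt(speed, power)
    print(f"{card}: {efficiency:.2f} tokens/s per watt")
```

The same function works for any other pair of speed and power figures, as long as both cards were measured the same way.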
Power-limiting the V100 to 100W (its default limit is 300W) dropped the power draw to 170W in the same test, while still producing 95 tok/s. To make the comparison fair, the YouTuber also limited the 3060 to 100W; it ended up consuming 171W and producing just 68 tokens per second. With these new results, the V100 achieves an efficiency score of 0.55 tokens/s per watt, while the 3060 12 GB was stuck at 0.39 tokens/s per watt.
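The article does not say which tool was used to set the power limit. One common way to do it on Nvidia cards is `nvidia-smi` with its `-pl` (power limit) option, and the Python sketch below wraps that call. The helper names are hypothetical, the command typically needs administrator privileges, and the driver rejects values outside the range the card supports.

```python
import subprocess


def power_limit_command(gpu_index: int, watts: int) -> list[str]:
    """Build the nvidia-smi invocation that sets a GPU's power limit."""
    if watts <= 0:
        raise ValueError("power limit must be a positive number of watts")
    # -i selects the GPU by index, -pl sets the power limit in watts.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(gpu_index: int, watts: int) -> None:
    """Run the command; check=True raises if nvidia-smi reports an error."""
    subprocess.run(power_limit_command(gpu_index, watts), check=True)


# Example usage (needs an Nvidia driver and sufficient privileges):
#   apply_power_limit(0, 100)
```

Note that this limit applies to the GPU itself, and the article does not state how the 170W and 171W figures were measured.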
Even though the V100 proved much more efficient under load, despite being several generations old, its idle power draw is the real catch. It draws 45W while sitting idle, compared to 35W on the RTX 3060. Finally, the YouTuber also tested Frigate NVR, which performed really well on the V100, better than on the RTX 3060, though it consumed more power, as you'd expect.
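To put the idle figures in perspective, the Python sketch below converts a constant power draw into energy per year. It assumes, purely for illustration, that the card sits idle around the clock for a full 365-day year, which overstates idle time for a machine that is actually used; the function name is made up for this example.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: powered on all year, non-leap year


def annual_energy_kwh(watts: float, hours: float = HOURS_PER_YEAR) -> float:
    """Energy in kilowatt-hours for a constant draw over the given hours."""
    return watts * hours / 1000


# Idle draws quoted in the article.
v100_idle_kwh = annual_energy_kwh(45)
rtx3060_idle_kwh = annual_energy_kwh(35)

print(f"V100 idle: {v100_idle_kwh:.1f} kWh per year")
print(f"RTX 3060 idle: {rtx3060_idle_kwh:.1f} kWh per year")
print(f"Difference: {v100_idle_kwh - rtx3060_idle_kwh:.1f} kWh per year")
```

Multiplying the difference by a local electricity price gives the yearly cost of the V100's higher idle draw.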
The host's previous setup for Frigate was an Intel N100-based mini PC that struggled to detect his dog at all with mobilenetv2, but the V100 was able to identify it instantly. Monitoring just two cameras made the V100 pull over 100W, though; the RTX 3060 was similar in this regard, while the older N100 consumed only 26W when handling six different cameras. That marks the end of the benchmarking.
This V100 experiment turned out to be a success overall, but the virality of the original video and the fact that we're writing this article mean these bad boys are about to go up in price. So, if you're interested, make sure to grab one before it's too late; the YouTuber found his for just $100 on eBay, and the PCIe adapters for early SXM sockets are cheap enough as well. The 32GB variant of the V100 goes for $500, however.

Hassam Nasir is a die-hard hardware enthusiast with years of experience as a tech editor and writer, focusing on detailed CPU comparisons and general hardware news. When he’s not working, you’ll find him bending tubes for his ever-evolving custom water-loop gaming rig or benchmarking the latest CPUs and GPUs just for fun.
bit_user
The article said: modded Tesla V100 SMX data center GPU runs AI LLMs and is more efficient than many modern midrange offerings in AI inference
"Modern"? The RTX 3060 launched more than 5 years ago. The RDNA 3 cards launched in 2022, although the RX 7800 XT got delayed for a year, while AMD waited for inventories of RDNA 2 cards to draw down.
The real value of V100 is actually in their fp64 performance, which you still cannot match with consumer hardware. IMO, it's a "waste" to use them for AI, which they're not particularly good at and can easily be surpassed by something just a little newer or higher-end than what the subject tried.
BTW, the V100's driver is now in legacy support mode, which means that it won't support newer versions of CUDA that newer versions of popular deep learning frameworks will eventually start requiring. Anyone buying a V100 should go in with eyes open that your window for using one with up-to-date software is somewhat limited.
Fox2Fox2
Admin said: Turns out, Nvidia's older Turing-era V100 AI GPU is still pretty capable today, even with just 16GB of VRAM. A YouTuber got his hands on the SMX variant for just $100, converted it to a PCIe x16 interface for another $100 with an adapter, and got some pretty impressive results across AI inference and NVR benchmarks.
$200 'socketed' Nvidia AI GPU for servers hacked into a PCIe card with custom PCB and 3D-printed cooling — modded Tesla V100 SMX data center GPU ru... : Read more
Minor correction to both you and article, it's an SXM2 Mezzanine connector. SXM stands for Server PCi Express Module, 2nd generation. Usually these plug in and sit on the main board (or a carrier) almost like a CPU or a RAM stick.
The actual Volta is pretty old at this point. It launched, iirc, right around the eve of the Transformer paper (Attention is all you need), like 2017/2018. It quickly became apparent that the market needed GPGPUs with significantly more VRAM than 32GB max per card and better matmul capabilities, causing Nvidia to rush to market multiple options in the coming years.
So you had significant buy-in for these cards from data centers only to replace them as soon as they bought them. That flooded the market with these and cratered the price to sub 1000 for the 32GB SXM2 model.
In the following few years you saw Ampere get rushed out, iirc the actual release of the A series was fairly staggered and there was an intergenerational increase in VRAM until Hopper replaced them.
Last bit, another technical nitpick. Turing refers to the gaming and professional cards that came out in 2018. Volta is technically its own thing.
gaspoweredcat
This isn't news at all, the v100 has always been there and seemed good value as the hbm made it crazy fast, I spent ages experimenting with the cmp 100-210 which is the mining version of the v100, I got a load at like £90 per card, fat with one card but start increasing and the cracks show.
Not only that but trying to use vllm or sglang is going to land you untold headaches as they lean fairly heavily on ampere based stuff
It may be useful for some but it's not a game changer for local LLM I'm afraid
bit_user
gaspoweredcat said: fat with one card but start increasing and the cracks show.
What do you mean by this?
gaspoweredcat
typo, was supposed to be "fast" and the meaning is that with the CMPs (due to their restricted pcie bandwidth) and pretty quickly with the v100s youll hit the ceiling of the transfer speed between cards and it gets SLOW, 1x cmp100-210 is crazy fast, 2 is sort of acceptable-ish, 3 and things start crawling, realistically for larger multi card setups you need nvlink etc really, its passable with the v100s granted you dont mind being stuck on llama.cpp and having no ampere features like FAv2 etc
bit_user
gaspoweredcat said: typo, was supposed to be "fast" and the meaning is that with the CMPs (due to their restricted pcie bandwidth) and pretty quickly with the v100s youll hit the ceiling of the transfer speed between cards and it gets SLOW, 1x cmp100-210 is crazy fast, 2 is sort of acceptable-ish, 3 and things start crawling, realistically for larger multi card setups you need nvlink etc really, its passable with the v100s granted you dont mind being stuck on llama.cpp and having no ampere features like FAv2 etc
Cool. Thanks!
Did you try any fp64 stuff?
bit_user
BTW, anyone who wants a video output and integrated fan can get a GV100. That's the workstation version, which currently seems to run about $1200 and up, for 32 GB. When they were new, I think they cost about $10k.
And, there's the Titan V, which has only 12 GB and just 3/4ths of the V100's memory bandwidth, but at least the compute units on the chip aren't nerfed.