External Graphics Over PCIe 3.0? Netstor's NA255A, Reviewed
1. Netstor TurboBox NA255A: Space For Up To Four GPUs, Externally

Although we're starting to see more mainstream-oriented applications optimized for OpenCL, allowing graphics processors to help speed up performance, general-purpose GPU-accelerated software is still most pointedly aimed at the server and workstation space. Much of that has to do with optimizations for CUDA, which only Nvidia's GPUs support. But OpenCL is gaining traction for video editing, compression, image manipulation, and even bitcoin mining.

When all-out compute power is your priority, connecting multiple GPUs is a great way to push more performance in those apps. Going so far as getting four graphics processors working together in CrossFire or SLI can really help boost the tasks able to utilize them. And we're not talking about gaming, either. We've seen enough examples of three-card scaling tapering off, and of a fourth card not improving frame rates at all. No, three- and four-way setups are often the domain of power users in need of massive floating-point math.

If you're in the distinguished group of folks able to use four dual-slot graphics cards cooperatively, then you face a handful of configuration issues to overcome. What motherboard do you use? Which case do you pick? Is there a power supply with enough auxiliary connectors for the cards you're using? And how on earth do you plan to keep all of that hardware cool? Even if you get everything set up just the way you want, don't expect to have room left over for any other PCI Express-based upgrade.

But what if you could externalize all of the hardware involved, and maintain a relatively quiet host workstation? What would it take to house all of that hardware? Well, you'd need a large-enough enclosure, able to accommodate eight slots' worth of back-panel I/O. You'd need a motherboard with at least four PCI Express x16 slots for the graphics cards. Power delivery would be of the utmost importance, of course. And cooling would be a lynchpin, not only for keeping four stacked boards running stably, but also for keeping the configuration quiet enough to use in the same room.

Meet Netstor's solution, called the TurboBox NA255A. It looks a lot like a mid-tower PC, but its application is far more specific. Designed to function as a PCI Express 3.0-connected expansion box, the NA255A comes with its own 1,000 W power supply and cooling fans. Its sole purpose in life is to add a quartet of PCIe x16 slots (electrically wired to run at x8 transfer rates) and two x4 slots to your list of expansion options. The whole thing is fed by a single 16-lane expansion card that drops into your host workstation, attaching externally.

One third-gen PCI Express x16 slot is able to push up to 16 GB/s per direction. Although there is overhead involved, you still get a ton of bidirectional throughput. Put differently, you can get close to the same transfer rate from one 16-lane PCIe 3.0 slot as four of the eight-lane slots commonly used to create multi-GPU arrays on Sandy Bridge-based platforms.
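The arithmetic behind that comparison can be sketched in a few lines of Python. This is a rough estimate rather than a measurement: it accounts only for line-encoding overhead (8b/10b for first- and second-gen links, 128b/130b for third-gen), ignores packet and protocol overhead, and assumes the eight-lane Sandy Bridge slots run at second-gen rates. The helper name `pcie_gbps` is our own, introduced purely for illustration.

```python
# Per-direction PCI Express throughput, accounting for line-encoding overhead.
# The helper name pcie_gbps is hypothetical; protocol overhead is ignored.

# generation -> (transfer rate in GT/s per lane, payload bits, encoded bits)
PCIE_GENERATIONS = {
    1: (2.5, 8, 10),     # 8b/10b encoding
    2: (5.0, 8, 10),     # 8b/10b encoding
    3: (8.0, 128, 130),  # 128b/130b encoding
}


def pcie_gbps(generation: int, lanes: int) -> float:
    """Return usable throughput in GB/s, per direction, for one link."""
    rate_gt, payload_bits, encoded_bits = PCIE_GENERATIONS[generation]
    usable_gbit = rate_gt * lanes * payload_bits / encoded_bits
    return usable_gbit / 8  # bits -> bytes


if __name__ == "__main__":
    one_gen3_x16 = pcie_gbps(3, 16)
    # Assumption: the eight-lane multi-GPU slots are second-gen links.
    four_gen2_x8 = 4 * pcie_gbps(2, 8)
    print(f"One PCIe 3.0 x16 link:  {one_gen3_x16:.2f} GB/s per direction")
    print(f"Four PCIe 2.0 x8 links: {four_gen2_x8:.2f} GB/s per direction")
```

Under those assumptions, the single third-gen x16 link lands just shy of the combined figure for four second-gen x8 links, which is where the "close to the same transfer rate" claim comes from.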

In theory, that's enough to game on, though this enclosure isn't really designed for the gaming market. Rather, it's intended to serve the folks looking to sling several graphics cards together for massive general-purpose GPU computing. We're interested in both possible use cases, so we're running benchmarks that apply to both segments.

The enclosure itself measures 18" x 14" x 7". It's very solid, finished in brushed aluminum with grating in the front to encourage airflow. Netstor is going for that Apple aesthetic, it appears. And for what it's worth, the TurboBox is compatible with both PCs and Macs. Because modern Mac Pros only give you one second-gen 16-lane slot and two four-lane slots, with a combined output of 300 W, it's hardly a surprise that Netstor is looking to extend compatibility to that platform as well.

One card goes into your host machine and another goes into the TurboBox itself. Cables between the two interface boards facilitate the external connectivity.

But it's what's inside the TurboBox that matters most. You'll find a Surestar TC-1000PL 1,000 W power supply, two 120 mm hot-swappable fans, a 9.5" x 11.7" PCB (NP952A-GPU), and a 6" x 4" PCIe interface card (NP970A).

2. Setup And Overcoming Issues

In theory, populating the NA255A should be as easy as dropping in graphics cards, connecting their power leads, and hooking the external enclosure up to the host PC's PCI Express card. The TurboBox is designed to extend standardized interfaces, so no software driver should be necessary. In the real world, though, setup isn't quite that easy.

We encountered a couple of snags along the way. First, I initially didn't realize that the PCIe-based interface cards have specific I/Os. If you look closely, one port on each card is etched with a x16 and the other is etched with a x8. I accidentally hooked the x16 up to the x8 and vice versa. The mistake was easy to reverse, but it doesn't appear to be mentioned anywhere in Netstor's documentation.

The second hang-up was a little more worrisome. Mainly, I couldn't get the TurboBox working at PCI Express 3.0 signaling rates. First- and second-gen PCI Express worked fine. But when the jumper was set to PCIe 3.0, the enclosure stopped recognizing the graphics cards I was plugging in. Netstor helped us work through the issue, which involved reconfiguring switches on the interface cards. This solved our issue.

Our third issue wasn't the TurboBox's fault at all. During our first round of benchmarks, we saw odd performance drops with three Radeon HD 7970s installed. Much troubleshooting revealed that some of our Tahiti-based boards weren't working together the way they should have. It turned out that boards from different vendors shipped with incompatible firmware, which hampered multi-card configurations (even though this should have been fine). Mixing and matching products, even those from the same family, is asking for trouble. Fortunately, we worked around the problem with a different card combo.

Finally, we weren't able to test four Radeon HD 7970s at the same time. Again, this wasn't Netstor's fault. The TurboBox is absolutely able to accommodate a quartet of dual-slot boards. But because some of the 7970s in our lab are a little larger, they don't fit into the strict space limitations of two expansion slots. As a result, we're testing with three Radeon HD 7970s. It all works out, though: the ASRock X79 Extreme9 motherboard I'm using only has room for three 7970s anyway, so that's our hard limit for comparing native on-board connectivity to the performance of Netstor's device.

3. Test System And Benchmarks

Our test system is built around Intel's X79 Express chipset, with 8 GT/s transfer rates to each 16-lane PCI Express graphics slot. We're going to measure the performance of native connectivity and the TurboBox using one, two, and three GPUs in each solution. This should tell us whether there's any penalty for externalizing graphics, or for interfacing with the enclosure over a single third-gen PCI Express x16 slot. Part of our testing also involves comparisons between PCIe 2.0 and 3.0, quantifying the benefits of modern technology versus what came before.

As mentioned, we're using three Radeon HD 7970 cards for testing, all of which are set to AMD's reference core and memory clock rates.

A number of games should help us flesh out 3D performance, while LuxMark and GUIMiner stand in as OpenCL-accelerated benchmarks. Although we know that the TurboBox isn't a gaming-oriented product, a few tests at 1920x1080 and 5760x1080 should shed some light on its performance potential.

Test System
CPU: Intel Core i7-3960X (Sandy Bridge-E), 3.3 GHz @ 4.25 GHz, Six Cores, LGA 2011, 15 MB Shared L3 Cache, Hyper-Threading enabled
Motherboard: ASRock X79 Extreme9 (LGA 2011), Chipset: Intel X79 Express
Networking: On-Board Gigabit LAN controller
Memory: Corsair Vengeance LP PC3-16000, 4 x 4 GB, 1600 MT/s, CL 8-8-8-24-2T
Graphics: 3 x Radeon HD 7970
Hard Drive: Samsung 470-series 256 GB (SSD)
Power: ePower EP-1200E10-T2 1200 W, ATX12V, EPS12V

Software and Drivers
Operating System: Microsoft Windows 8
DirectX: DirectX 11.1
Graphics Drivers: Nvidia 310.70 beta

4. Results: General-Purpose GPU

Netstor's TurboBox NA255A is intended for multi-GPU workstations able to leverage the compute power of graphics hardware. So, we start our evaluation using LuxMark.

As you can see, there is no difference in performance between a motherboard operating at PCI Express 2.0 signaling rates, a motherboard at PCI Express 3.0, and the TurboBox. This workload fully utilizes each graphics card's compute resources, but it doesn't tax PCI Express bandwidth. Consequently, scaling is pretty much amazing.

In bitcoin mining with GUIMiner, we again see nearly identical results between the motherboard-based cards and Netstor's TurboBox. This means that we don't have any trouble going outside of the box or stepping down to PCI Express 2.0, at least in compute-bound workloads.

On a related note, we ran the same bitcoin mining test on a Socket FM1 motherboard using a PCI Express slot limited to four lanes of connectivity. The result was identical to our 16-lane tests, around 550 Mhash/second. In other words, we're not worried about one third-gen PCIe x16 slot serving up enough throughput for a fourth card in Netstor's NA255A TurboBox. We hypothesize that there's still headroom available.
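To put a rough number on that headroom, here is a minimal sketch of how much of the uplink each card would get. It rests on two assumptions of ours: the enclosure's x16 uplink is divided evenly between the installed cards, and the four-lane slot on the Socket FM1 board runs at second-gen rates (as described in our conclusion). Only line-encoding overhead is counted, and `per_card_share` is a hypothetical helper name.

```python
# Rough per-card share of the TurboBox's uplink versus a four-lane slot.
# Assumptions: the uplink is split evenly between cards, and the four-lane
# comparison slot is second-gen (5 GT/s per lane, 8b/10b encoding).

GEN3_X16_GBPS = 8.0 * 16 * (128 / 130) / 8  # about 15.75 GB/s per direction
GEN2_X4_GBPS = 5.0 * 4 * (8 / 10) / 8       # about 2 GB/s per direction


def per_card_share(uplink_gbps: float, cards: int) -> float:
    """Evenly divide the uplink's per-direction throughput between cards."""
    if cards < 1:
        raise ValueError("cards must be at least 1")
    return uplink_gbps / cards


if __name__ == "__main__":
    for cards in (1, 2, 3, 4):
        share = per_card_share(GEN3_X16_GBPS, cards)
        print(f"{cards} card(s): {share:.2f} GB/s each, "
              f"{share / GEN2_X4_GBPS:.1f}x the four-lane slot")
```

Even with four cards sharing the link, each one would still have roughly twice the throughput of the four-lane slot that matched our 16-lane mining result, which is consistent with the hypothesis above.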

5. Results: Medal Of Honor Warfighter

With those two compute-oriented tests out of the way, let's have a look at an entirely different type of use case: gaming. Again, this isn't what the TurboBox was intended for, but the application's demands are going to tax the TurboBox in a different way; perhaps we'll be able to see where available throughput affects scaling, and how that compares to a native motherboard-based solution.

Interestingly, the fastest results are achieved by two graphics cards plugged into our X79-based motherboard. Adding a third card adversely affects performance, either due to a platform limitation or an unoptimized three-way CrossFire profile, both of which could be indicated by lower minimum frame rates. The TurboBox performs well, but does fall behind the motherboard-attached cards.

With a single card installed, all three configurations perform identically.

Again, the TurboBox performs almost identically to the motherboard-installed cards. At this resolution, we see three cards slightly outperforming two, although they suffer lower minimum frame rates, too. In all cases, scaling is sub-par, to be sure.
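One simple way to quantify how good or bad scaling is would be to divide the achieved speed-up by the ideal one. The sketch below shows that calculation; the function name `scaling_efficiency` is our own, and the frame rates in the example are made-up placeholders for illustration, not numbers taken from our charts.

```python
# Multi-GPU scaling efficiency: the share of the ideal N-times speed-up
# that a configuration actually delivers (1.0 means perfect scaling).
# All frame rates below are hypothetical placeholders, not measured results.


def scaling_efficiency(single_fps: float, multi_fps: float, cards: int) -> float:
    """Return achieved speed-up divided by the ideal speed-up for `cards`."""
    if single_fps <= 0 or cards < 1:
        raise ValueError("single_fps must be positive and cards at least 1")
    return (multi_fps / single_fps) / cards


if __name__ == "__main__":
    single = 60.0  # hypothetical single-card average, in FPS
    for cards, fps in ((2, 102.0), (3, 108.0)):  # hypothetical averages
        efficiency = scaling_efficiency(single, fps, cards)
        print(f"{cards} cards: {efficiency:.0%} of ideal scaling")
```

A result that barely moves between two and three cards shows up here as a sharp drop in efficiency, even when the average frame rate is technically higher.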

6. Results: Crysis 2

We ran these numbers prior to the launch of Crysis 3. So, we take a look back at its predecessor, Crysis 2.

At just 1920x1080, we're not using a high-enough resolution to make three Radeon HD 7970s sweat. Thus, a platform-oriented bottleneck keeps the dual- and triple-card configurations generating similar average frame rates. There's one outlier: two Radeon HD 7970s plugged into the TurboBox. Because three cards in the same enclosure perform as we'd expect, that could have been a configuration problem (we went over several of those on page two).

The dual-card issue rights itself at 5760x1080. However, the TurboBox doesn't scale as aggressively when we step it from two to three Radeon HD 7970s. Fortunately, we have a couple of other demanding apps coming up that can either contest or corroborate these findings.

7. Results: DiRT Showdown

A persistent platform bottleneck keeps the move from two to three Radeon HD 7970s flat, devoid of performance gain at 1920x1080.

With two cards installed on our X79-based motherboard, the minimum frame rate takes a hit. But the average is even with the other two configurations.

It's common sense, but if you're shopping for more than two high-end graphics cards, make sure you're gaming at 2560x1600 or 5760x1080. These numbers show us that scaling resumes once you start using a demanding-enough resolution.

There's really very little practical difference between our three tested configurations, suggesting that a single 16-lane slot going out to three Radeon HD 7970s is ample.

8. Results: Metro 2033

Metro's benchmark tool reports that scaling in this demanding title is identical across all three configurations.

The same holds true at 5760x1080. Netstor will be happy to know that, although its TurboBox wasn't designed as a gaming enclosure, it can stand in as one if asked to.

9. Power And Heat

The following chart is a little difficult to follow, so I'll break it down like this: the blue bar represents our PC on its own, the red bar is the TurboBox on its own, and the black bar is our PC plus the TurboBox attached.

Naturally, in half of the configurations, there's a missing red bar. This is because the TurboBox isn't being used when we plug our graphics cards into ASRock's X79-based motherboard. In the other half of the charts, the PC bar is incredibly short. That is the result of applying a heavy bitcoin mining load to graphics cards in the TurboBox, which doesn't affect the PC all that much.

As you can see, the TurboBox/PC combo uses about 100 to 150 W more than the PC alone. There's some overhead to accommodate the extra hardware, which we'd expect.

Although using the TurboBox increases power consumption, the cards themselves run cooler as a result of the NA255A's optimized airflow. Moreover, the PC's other components don't end up being subjected to additional heat from a multi-GPU configuration on the motherboard.

10. Our Benchmarks Prove Its Efficacy, But At What Cost?

Real data speaks volumes. Before we have it, though, proper conclusions are impossible to formulate, even when the math suggests we're on the right track. For me, Netstor's TurboBox NA255A turned into an example of that confounding predicament. Before I even started testing, I knew the device's single 16-lane PCI Express 3.0 interface should have given me a wide-enough pipe for multiple GPUs working in parallel without imposing a bottleneck.

But I just had to test a card in a PCI Express slot limited to four lanes of second-gen PCIe in order to validate the TurboBox's results. And the numbers speak for themselves, confirming that this unit successfully externalizes graphics cards for GPU-accelerated compute tasks (or games, though this thing is in no way economical for such a usage model). Using three Radeon HD 7970s, we weren't able to perceive any slow-down compared to cards dropped right onto an X79-based motherboard. Additional testing suggests a fourth card wouldn't have fared any worse, at least in bitcoin mining.

Now, how about this product's value? Netstor is asking about $2,200 for its NA255A. So, right off the bat, ouch. You could build a killer workstation including three Radeon HD 7970s for that much money. Granted, you'd still need to find the right case, the right power supply, a compatible motherboard, and then cool it all. But we're Tom's Hardware; that's what we do. For that reason, we find it hard to imagine where the TurboBox makes sense for a PC builder.

But what about someone working on a Mac Pro? Apple's more limited ecosystem means there is no such thing as a three- or four-way graphics array. This could be one of the only options for enabling multiple GPUs. If massive compute potential is important, you might need to swallow hard and consider Netstor's solution the cost of doing business in Apple's world.