Hawaii Goes Professional
For the first time since 2007, AMD has a FirePro-branded card based on a really big GPU. At 6.2 billion transistors, the Hawaii processor boasts 44-percent more logic than the FirePro W9000’s Tahiti chip. They're both manufactured at 28 nm, though.
How else are the previous flagship and AMD's more recent introduction similar? Glad you asked!
First, let's compare the technical specs of two Nvidia Quadro cards to AMD's FirePro W9100. Hopefully that'll give us some basis for a performance expectation. While the AMD flagship's $4000 suggested retail price is higher than the Quadro K5000's, it's still shy of the Quadro K6000's. So, we'll put the FirePro in the middle of the following chart.
| | Nvidia Quadro K6000 | AMD FirePro W9100 | Nvidia Quadro K5000 |
|---|---|---|---|
| Shaders | 2880 CUDA cores | 2816 Stream processors | 1536 CUDA cores |
| FP32 Performance (SP) | 5.2 TFLOPS | 5.24 TFLOPS | 2.2 TFLOPS |
| FP64 Performance (DP) | 1.73 TFLOPS | 2.62 TFLOPS | 0.09 TFLOPS |
| Memory Size | 12 GB | 16 GB | 4 GB |
| Memory Bandwidth | 288 GB/s | 320 GB/s | 173 GB/s |
| PCI Express Bandwidth | 32 GB/s | 32 GB/s | 16 GB/s |
| 4K2K Displays @ 30 Hz | 2 | 6 | 2 |
| 4K2K Displays @ 60 Hz | 2 | 3 | 2 |
| Power Consumption (measured) | 187 W (3D load), 202 W (GPGPU) | 245 W (3D load), 260 W (GPGPU) | 126 W (3D load), 145 W (GPGPU) |
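To turn those raw numbers into a rough expectation, a quick sketch like the following derives two figures from the table: the FP64-to-FP32 throughput ratio and double-precision GFLOPS per watt. The dictionary keys and the `derived_metrics` helper are our own naming for illustration. Note that this divides theoretical peak throughput by measured GPGPU power draw, so treat the efficiency figure as a ballpark only, not a benchmark result.

```python
# Figures copied from the spec table above; watts are the measured GPGPU load.
cards = {
    "Quadro K6000": {"fp32_tflops": 5.2, "fp64_tflops": 1.73, "gpgpu_watts": 202},
    "FirePro W9100": {"fp32_tflops": 5.24, "fp64_tflops": 2.62, "gpgpu_watts": 260},
    "Quadro K5000": {"fp32_tflops": 2.2, "fp64_tflops": 0.09, "gpgpu_watts": 145},
}


def derived_metrics(spec):
    """Return (FP64:FP32 ratio, peak FP64 GFLOPS per measured watt)."""
    ratio = spec["fp64_tflops"] / spec["fp32_tflops"]
    gflops_per_watt = spec["fp64_tflops"] * 1000 / spec["gpgpu_watts"]
    return ratio, gflops_per_watt


for name, spec in cards.items():
    ratio, efficiency = derived_metrics(spec)
    print(f"{name}: FP64/FP32 = {ratio:.2f}, FP64 = {efficiency:.1f} GFLOPS/W")
```

On paper, the W9100 runs double precision at half its single-precision rate, the K6000 at roughly a third, and the K5000 at only a small fraction.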
When you have performance to offer, new opportunities present themselves. AMD identifies CAD and engineering, media and entertainment, medicine, and finance as some of its more traditional strong points. But with its big Hawaii GPU and the GCN architecture's alacrity in compute-intensive tasks, the company wants to lock down its share of the virtualization, cloud gaming, and signage segments as well.
The ambition makes sense. Workstation-oriented apps benefit more and more from the performance of modern GPUs, after all. Nowadays you can even run multiple CAD and CAE workflows at the same time. Cranking along on the next version of a drawing while rendering the previous one isn't a pipe dream. This stuff is actually doable. And the sky's the limit with a design equally adept in 3D- and general-purpose tasks.
AMD is already a seasoned vet when it comes to 3D. Now GPGPU is where it's trying to lead development. In order to better facilitate that initiative, the company is throwing its support behind the OpenCL standard as an alternative to Stream and CUDA. As we've seen in several different applications already, when there's a computationally difficult job that can be parallelized, the potential performance gains are well worth optimizing for.
There's also a notable trend toward the adoption of 4K (3840x2160) in the workplace. Those higher resolutions give engineers and artists a lot more room to work with. And while more detail obviously benefits 3D applications, even 2D tasks like programming are greatly enhanced by the extra screen space and pixel density of a 4K display.
Similarly, professional media-oriented titles see a lot of benefit as it becomes possible to edit high-res video in real time at full resolution. A workstation board like the W9100 should speed up the processing of video and photo filters, along with accelerating encoding/decoding.
The workstation graphics card market is clearly changing, and the lines between various segments are getting blurrier, even as the workloads and data sets are more specific than ever. CAD, CAE, M&E, oil and gas...the FirePro W9100 is AMD’s most recent effort to grab a larger share of all of them. But enough background. Let's put this card through its paces.
Right... that's why in real-world functions (rather than the "perfect" functions used in benchmarks) the Nvidia cards are on par with or even better than the AMD ones... What the author fails to understand is that AMD is the one with a sub-par implementation of OpenCL, since half the language is missing in its drivers (which is why groups like Blender and LuxRender have to drop support for most things to get the kernel to compile properly). Sure, the half of the language that is there is fast, but it's like driving a three-wheeled Ferrari!
If AMD wants to take more market share from NVIDIA, it needs to lower the pricing to appeal to a larger audience, and when the IT team is convincing purchasing, 1k isn't much in the long run. They need to drop their price so it's hard to pass up.
I only think AMD really needs to beef up that cooler. A triple slot perhaps? (make the blower two slots). That thermal ceiling is holding a lot back.
perform when using its native CUDA for accelerating relevant tasks vs. the
FirePro using its OpenCL, eg. After Effects. Testing everything using OpenCL
is bound to show the FirePro in a more positive light. Indeed, based on the
raw specs, the W9100 ought to be a lot quicker than it is for some of the tests
(Igor, ask Chris about the AE CUDA test a friend of mine is preparing).
Having said that, the large VRAM should make quite a difference for medical/GIS
and defense imaging, but then we come back to driver reliability which is a huge
issue for such markets (sha7bot is spot on in that regard).
For an English irregular verb "to draw" the perfect tense is "drawn" (and the past is "drew").
For an organization claiming to be professional enough to do a review of a professional grade GPU, simple things like that can take away a lot of credibility.
Then put a box with 8 K6000s (8 is the total number of cards that the "Nvidia maximum" allows) against 4 W9100s (4 is the total number of cards that AMD says should go in one system).
Do you think it is fair? From the point of view of a renderfarm owner, perhaps, because he doesn't look at a card but at a solution. Also, don't forget that he has to deal with the price (8 x $5K = $40,000 against 4 x $4K = $16,000); maybe he finds that the cheaper solution isn't the faster one, but it may be fast enough.
But here they put a card against a card. And for me the only way is OpenCL because it is open. You can't benchmark in a proprietary manner. You must use a tool that both contenders can read.
And yes, NVidia doesn't give a shit about OpenCL, and I understand why, but I don't think it's wise. Time will tell.
> Then put a box with 8 K6000s (8 is the total number of cards that the "Nvidia maximum" allows) ...
You'd need to use a PCIe splitter to do that. Some people do this for sure, eg. the guy
at the top of the Arion table is using seven Titans, but PCIe splitters are expensive, though
they do offer excellent scalability, in theory up to as many as 56 GPUs per system using
8-way splitters on a 7-slot mbd such as an Asrock X79 Extreme11 or relevant server board.
> Do you think it is fair? ...
Different people would have varying opinions. Some might say the comparison should be based on a fixed
cost basis, others on power consumption or TCO, others on the number of cards, others might say 1 vs. 1
of the best from each vendor. Since uses vary, an array of comparisons can be useful. I value all data points.
Your phrasing suggests I would like to see a test that artificially makes the NVIDIA card look better, which is
nonsense. Rather, atm, there is a glaring lack of real data about how well the same NVIDIA card can run a
particular app which supports both OpenCL and CUDA; if the CUDA performance from such a card is not
sufficiently better than the OpenCL performance for running the same task, then cost/power differences
or other issues vs. AMD cards could mean an AMD solution is more favourable, but without the data one
cannot know for sure. Your preferred scope is narrow to the point of useless in making a proper
> But here they put a card against a card. And for me the only way is OpenCL because it is open. ...
That's ludicrous. Nobody with an NVIDIA card running After Effects would use OpenCL for GPU acceleration.
> ... You must use a tool that both contenders can read.
Wrong. What matters are the apps people are running. Some of them only use OpenCL, in which case
sure, run OpenCL tests on both cards, I have no problem with that. But where an NVIDIA card can offer
CUDA to a user for an application then that comparison should be included as well. To not do so is highly
Otherwise, what you're saying is that if you were running AE with a bunch of NVIDIA cards then
you'd try to force them to employ OpenCL, a notion I don't believe for a microsecond.
Now for the apps covered here, I don't know which of them (if any) can make use of CUDA
(my research has been mainly with AE so far), but if any of them can, then CUDA-based
results for the relevant NVIDIA cards should be included, otherwise the results are not a
true picture of available performance to the user.
Atm I'm running my own tests with a K5000, two 6000s, 4000, 2000 and various gamer cards,
exploring CPU/RAM bottlenecks.
Btw, renderfarms are still generally CPU-based, because GPUs have a long way to go before they can
cope with the memory demands of complex scene renders for motion pictures. A friend at SPI told me one
frame can involve as much as 500GB of data, which is fed across their renderfarm via a 10GB/sec SAN. In
this regard, GPU acceleration of rendering is more applicable to small scale work with lesser data/RAM
demands, not for large productions (latency in GPU clusters is a major issue for rendering). The one
exception to this might be to use a shared memory system such as an SGI UV 2 in which latency is no
longer a factor even with a lot of GPUs installed, and at the same time one gains from high CPU/RAM
availability, assuming the available OS platform is suitable (though such systems are expensive).
You are saying that the point of view must be based on the software people use. Of course I'll base my decision to buy or not buy a card on the software I use. I totally agree with you on that (if that's what you mean), but a benchmark is another, completely different thing.
"You must use a tool that both contenders can read." isn't a wrong statement. My thing is rendering, so I'll stick to that: I-Ray is software that renders on the GPU but uses only CUDA (so it can't be used for this benchmark). VRay-RT is another program that can render on CUDA and on OpenCL (still unusable for this benchmark unless you use OpenCL only).
If you are going to benchmark not the cards but these two programs, OK: you can use an Nvidia card and benchmark both programs on CUDA. Even though the card can read both CUDA and OpenCL, you must not use OpenCL, because one of the contenders (I-Ray) cannot read OpenCL.
On the other hand, if you decide to use VRay-RT, you can use an Nvidia card and benchmark with CUDA and OpenCL to see which is better, but you can't use an AMD card for that.
Outside of the benchmark world, of course, I can use an Nvidia card, an AMD card, I-Ray, VRay-RT, whatever I want. But in this review they benchmark to compare two cards, for god's sake.
A benchmark means software common to all contenders, used to judge those contenders.
I hope you understand the meaning of my post this time.
By the way: I understood your point of view and I agree with it, except on benchmarking.