
SPECviewperf 12: Showcase, Siemens NX, And SolidWorks

AMD FirePro W9100 Review: Hawaii Puts On Its Suit And Tie

Showcase 2013

The FirePro W9100 places a close second after the Quadro K6000. Both cards dominate the rest of the field. Interestingly, Showcase 2013 is one of the very few professional apps completely based on DirectX.

Siemens NX 8.0

Again, Nvidia's flagship and AMD's FirePro W9100 dominate with massive leads over the rest of the field (though the Quadro K6000 retains a slight edge over the W9100).

SolidWorks 2013 SP1

The Quadro K6000 offers massive performance at a similarly hefty price. AMD's FirePro W9100 barely manages to surpass its older W9000, but fails to beat Nvidia's Quadro K5000. As you can imagine, then, the performance difference between the FirePro W9100 and Quadro K6000 is humbling, yielding another situation where AMD's driver team has its work cut out for it.

  • 4 Hide
    CodeMatias , May 19, 2014 6:48 AM
    "Nvidia’s sub-par OpenCL implementation"

    Right... that's why in real-world functions (rather than the "perfect" functions used in benchmarks) the Nvidia cards are on par with, or even better than, the AMD ones... What the author fails to understand is that AMD is the one with the sub-par implementation of OpenCL, since half the language is missing from their drivers (which is why groups like Blender and LuxRender have to drop support for most features to get the kernel to compile properly). Sure, the half of the language that is there is fast, but it's like driving a three-wheeled Ferrari!
  • -2 Hide
    Kekoh , May 19, 2014 3:31 PM
    I'll be honest, I don't know anything about workstation graphics. I read this purely for knowledge. That being said, I can't help but pick up on the AMD bias in this article.
  • 0 Hide
    sha7bot , May 19, 2014 6:28 PM
    Amazing card, but I disagree with your thoughts on the price. Anyone in this segment will drop another 1k for NVIDIA's consistent reliability.

    If AMD wants to take more market share from NVIDIA, it needs to lower its pricing to appeal to a larger audience; when the IT team is convincing purchasing, $1K isn't much in the long run. They need to drop their price so it's hard to pass up.
  • 1 Hide
    Shankovich , May 19, 2014 7:23 PM
    A great card, to be honest. I had one sent to me by AMD and I've been tinkering with it today running CFD software, along with some CFD code of my own. It really sped things up a lot! The drivers need work, however.

    I only think AMD really needs to beef up that cooler. A triple-slot design, perhaps (making the blower two slots)? That thermal ceiling is holding a lot back.
  • 1 Hide
    Jarmen Kell , May 19, 2014 7:58 PM
    With this performance, the W9100 really is great value; some tests feel like driving a fast, four-wheeled, fully OpenCL-accelerated McLaren F1. Nice review.
  • 1 Hide
    mapesdhs , May 19, 2014 8:12 PM

    The picture is incomplete though without comparing to how the Quadro would
    perform when using its native CUDA for accelerating relevant tasks vs. the
    FirePro using its OpenCL, eg. After Effects. Testing everything using OpenCL
    is bound to show the FirePro in a more positive light. Indeed, based on the
    raw specs, the W9100 ought to be a lot quicker than it is for some of the tests
    (Igor, ask Chris about the AE CUDA test a friend of mine is preparing).

    Having said that, the large VRAM should make quite a difference for medical/GIS
    and defense imaging, but then we come back to driver reliability which is a huge
    issue for such markets (sha7bot is spot on in that regard).

    Ian.



  • 1 Hide
    tourist , May 20, 2014 11:09 AM
    How about tracking down some FirePro APUs?
  • 0 Hide
    wekilledkenny , May 22, 2014 5:06 PM
    WTH is "Drawed Objects"? Even a rudimentary spell-check can catch this.

    For the English irregular verb "to draw", the past participle is "drawn" (and the past tense is "drew").
    For an organization claiming to be professional enough to do a review of a professional grade GPU, simple things like that can take away a lot of credibility.
  • 0 Hide
    Marcelo Viana , May 25, 2014 10:06 PM
    Quote:

    The picture is incomplete though without comparing to how the Quadro would
    perform when using its native CUDA for accelerating relevant tasks vs. the
    FirePro using its OpenCL, eg. After Effects. Testing everything using OpenCL
    is bound to show the FirePro in a more positive light. Indeed, based on the
    raw specs, the W9100 ought to be a lot quicker than it is for some of the tests
    (Igor, ask Chris about the AE CUDA test a friend of mine is preparing).

    Having said that, the large VRAM should make quite a difference for medical/GIS
    and defense imaging, but then we come back to driver reliability which is a huge
    issue for such markets (sha7bot is spot on in that regard).

    Ian.





    Then put a box with 8 K6000s (eight being the total number of cards that the "Nvidia maximum" allows) against 4 W9100s (four being the number of cards AMD says you should put in one system).

    Do you think that is fair? From the point of view of a render-farm owner, perhaps, because he doesn't look at a card but at a solution. Also don't forget that he has to deal with the price (8 × $5K = $40,000 against 4 × $4K = $16,000); maybe he'll find that the cheaper solution isn't the fastest one, but is fast enough.

    But here they put a card against a card. And for me the only way is OpenCL, because it is open. You can't benchmark in a proprietary manner; you must use a tool that both contenders can read.
    And yes, Nvidia doesn't give a damn about OpenCL, and I understand why, but I don't think it's wise. Time will tell.
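The price arithmetic in the comment above can be turned into a quick cost and perf-per-dollar sketch. This is a minimal illustration only: the prices follow the comment, and any throughput figure plugged in is an invented placeholder, not a measurement.

```python
# Hypothetical perf-per-dollar sketch for the two multi-GPU configurations
# discussed above. Prices come from the comment; any per-card "throughput"
# number is an invented placeholder, not a benchmark result.

def config_cost(cards: int, price_per_card: float) -> float:
    """Total purchase cost of a multi-GPU configuration."""
    return cards * price_per_card

def perf_per_dollar(cards: int, price_per_card: float,
                    throughput_per_card: float) -> float:
    """Aggregate throughput per dollar, assuming linear multi-GPU scaling
    (real scaling is usually sub-linear)."""
    return (cards * throughput_per_card) / config_cost(cards, price_per_card)

# 8x K6000 at ~$5K each vs. 4x W9100 at ~$4K each, per the comment.
k6000_total = config_cost(8, 5000)   # 40000
w9100_total = config_cost(4, 4000)   # 16000
```

As the comment argues, a render-farm buyer would also feed power draw and expected sub-linear scaling into a calculation like this before deciding whether the cheaper box is "fast enough".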
  • 2 Hide
    mapesdhs , May 26, 2014 4:04 PM

    Marcelo Viana writes:
    > Then put a box with 8 K6000s (eight being the total number of cards that the "Nvidia maximum" allows) ...

    You'd need to use a PCIe splitter to do that. Some people do this for sure, eg. the guy
    at the top of the Arion table is using seven Titans, but PCIe splitters are expensive, though
    they do offer excellent scalability, in theory up to as many as 56 GPUs per system using
    8-way splitters on a 7-slot mbd such as an Asrock X79 Extreme11 or relevant server board.


    > Do you think it is fair? ...

    Different people would have varying opinions. Some might say the comparison should be based on a fixed
    cost basis, others on power consumption or TCO, others on the number of cards, others might say 1 vs. 1
    of the best from each vendor. Since uses vary, an array of comparisons can be useful. I value all data points.
    Your phrasing suggests I would like to see a test that artificially makes the NVIDIA card look better, which is
    nonsense. Rather, atm, there is a glaring lack of real data about how well the same NVIDIA card can run a
    particular app which supports both OpenCL and CUDA; if the CUDA performance from such a card is not
    sufficiently better than the OpenCL performance for running the same task, then cost/power differences
    or other issues vs. AMD cards could mean an AMD solution is more favourable, but without the data one
    cannot know for sure. Your preferred scope is narrow to the point of useless in making a proper
    purchasing decision.


    > But here they put a card against a card. And for me the only way is openCL because it is open. ...

    That's ludicrous. Nobody with an NVIDIA card running After Effects would use OpenCL for GPU acceleration.


    > ... You must use a tool that both contenders can read.

    Wrong. What matters are the apps people are running. Some of them only use OpenCL, in which case
    sure, run OpenCL tests on both cards, I have no problem with that. But where an NVIDIA card can offer
    CUDA to a user for an application, then that comparison should be included as well. To not do so is highly
    misleading.

    Otherwise, what you're saying is that if you were running AE with a bunch of NVIDIA cards then
    you'd try to force them to employ OpenCL, a notion I don't believe for a microsecond.

    Now for the apps covered here, I don't know which of them (if any) can make use of CUDA
    (my research has been mainly with AE so far), but if any of them can, then CUDA-based
    results for the relevant NVIDIA cards should be included, otherwise the results are not a
    true picture of available performance to the user.

    Atm I'm running my own tests with a K5000, two 6000s, 4000, 2000 and various gamer cards,
    exploring CPU/RAM bottlenecks.


    Btw, renderfarms are still generally CPU-based, because GPUs have a long way to go before they can
    cope with the memory demands of complex scene renders for motion pictures. A friend at SPI told me one
    frame can involve as much as 500GB of data, which is fed across their renderfarm via a 10GB/sec SAN. In
    this regard, GPU acceleration of rendering is more applicable to small scale work with lesser data/RAM
    demands, not for large productions (latency in GPU clusters is a major issue for rendering). The one
    exception to this might be to use a shared memory system such as an SGI UV 2 in which latency is no
    longer a factor even with a lot of GPUs installed, and at the same time one gains from high CPU/RAM
    availability, assuming the available OS platform is suitable (though such systems are expensive).

    Ian.

  • 0 Hide
    Marcelo Viana , May 27, 2014 12:16 PM
    Good answer, mapesdhs, and I agree with almost everything you posted, but I still think you didn't get what I meant in my reply.
    You say the point of view must be based on the software people use. Of course I'll base my decision on whether to buy a card on the software I use; I totally agree with you on that (if that's what you mean). But benchmarking is another, completely different thing.

    "You must use a tool that both contenders can read" isn't a wrong statement. My thing is rendering, so I'll stick with that: I-Ray is software that renders on the GPU but uses only CUDA (so it can't be used for this benchmark). V-Ray RT is another renderer that can run on CUDA and on OpenCL (still unusable for this benchmark unless you use OpenCL only).
    If you are benchmarking not the cards but these two programs, fine: you can use an Nvidia card and benchmark both on CUDA, and even though the card can run both CUDA and OpenCL, you must not use OpenCL, because one of the contenders (I-Ray) cannot use it.
    On the other hand, if you decide to use V-Ray RT, you can use an Nvidia card and benchmark CUDA against OpenCL to see which is better, but you can't use an AMD card for that.

    Of course, outside the benchmark world I can use an Nvidia card, an AMD card, I-Ray, V-Ray RT, whatever I want. But in this review they run benchmarks to compare two cards, for god's sake.
    A benchmark means: software common to the contenders, used to judge those contenders.

    I hope you understand the meaning of my post this time.
    For the record: I understood your point of view and I agree with it, except when it comes to benchmarks.
  • 1 Hide
    mapesdhs , May 27, 2014 3:49 PM
    Marcelo Viana writes:
    > You say the point of view must be based on the software people use. ...

    From the point of view of making a purchasing decision, yes, but I understand
    the appeal of general benchmarking for its own sake, I do a lot of that myself.
    Every data point helps. I don't agree with restricting the scope of a test
    though just because not all contenders support a particular function or feature.
    It's been common practice for years for sites to present benchmark results which
    are only relevant to one particular type of product, be it a GPU type, CPU or
    mbd issue, etc. Otherwise it'd be like saying that a mbd review shouldn't include
    any USB3 results if even just one mbd in the lineup didn't have USB3 functionality;
    people would still want to know how it fares on the boards which do though, and
    that's what I'm getting at: I'd like to know how NVIDIA cards perform where it's
    possible to use CUDA instead of OpenCL for those tasks which can use both.
    Perfectly reasonable expectation IMO. Recent reviews looking at Mantle are a good
    example; nobody would suggest that comparisons to NVIDIA cards shouldn't be done
    merely because it's something NVIDIA cards don't support.

    Not including CUDA results just because AMD cards don't support it is madness.


    > If you are benchmarking not the cards but these two programs, fine: you can use

    My point is simply this: if a card supports both APIs, then results for both
    should be given, otherwise any conclusion is at best misleading or at worst
    may be just plain wrong.


    > On the other hand, if you decide to use V-Ray RT, you can use an Nvidia
    > card and benchmark CUDA against OpenCL to see which is better, but you can't
    > use an AMD card for that.

    Of course, it depends on the task. That's what I meant about it being application
    specific. However, this article allows one to infer conclusions about the cards
    being tested which may be completely wrong for some other task. Without even one
    example to which one can compare, how can one know? I still don't know how any
    particular NVIDIA card performs for the same task when using OpenCL vs. CUDA,
    because sites don't test it, which is annoying. All this article allows me to
    infer with any certainty is that, on purely performance grounds, an NVIDIA card
    is generally not the best option for OpenCL, but then that's been known for
    years now, it's not new information. Dozens of existing reviews show this again
    and again, but it's not really all that useful for someone who has an NVIDIA
    card and is using it for a task that can use CUDA, such as AE. Indeed, this is
    the perfect example: if someone is running AE on a system with a couple of
    780Tis (CUDA-based RayTrace3D function), would the rendering be faster with an
    OpenCL-based W9100? Nothing in this article helps one answer this question.

    For the AMD cards, I can only come away with the same opinion I have for most
    previous releases, namely that they're not as fast as on-paper specs would suggest.


    NOTE: Viewperf 12 is bottlenecked by CPU power, in which case the true potential
    of all the cards might be unrealised with just a stock speed 4930K (hard to
    know if the subtests would use more than 6 cores if available). It would be wise
    to run the tests again with a proper dual-socket XEON system, see what happens,
    or at least compare to the 4930K running at 4.8. Maybe the tests are clock-limited,
    but atm we're running blind. I've been testing with a 5GHz 2700K, will test soon
    with a 3930K (stock vs. oc'd) and other configs (dual-XEON X5570 Dell T7500). See:

    http://www.sgidepot.co.uk/misc/viewperf.txt
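One crude way to test the clock-limited hypothesis in the NOTE above is to run the same subtest at two CPU frequencies and see how much of the clock gain shows up in the score. A minimal sketch; the example scores and clocks are illustrative placeholders, not measured results.

```python
def clock_scaling_efficiency(score_lo: float, score_hi: float,
                             clock_lo_ghz: float, clock_hi_ghz: float) -> float:
    """Ratio of observed score gain to CPU clock gain between two runs.
    A value near 1.0 suggests the test is CPU-clock-bound; a value near
    0.0 suggests the GPU (or something else) is the limiter."""
    score_gain = score_hi / score_lo - 1.0
    clock_gain = clock_hi_ghz / clock_lo_ghz - 1.0
    return score_gain / clock_gain

# Illustrative numbers only: a 4930K at stock 3.4 GHz vs. overclocked 4.8 GHz.
eff = clock_scaling_efficiency(score_lo=40.0, score_hi=54.0,
                               clock_lo_ghz=3.4, clock_hi_ghz=4.8)
# eff ≈ 0.85, i.e. most of the clock gain appears in the score,
# which would point toward a CPU bottleneck in this hypothetical case.
```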


    To the author: there's a typo in both SiSoft diagrams on page 5, it says Sabdra
    instead of Sandra.

    Ian.

  • 0 Hide
    Marcelo Viana , May 27, 2014 9:21 PM
    Well, mapesdhs, one can do what you propose. For example: using V-Ray RT, see how long it takes to render a scene with CUDA (K6000) and how long it takes to do the same with OpenCL (W9100). It won't be a benchmark, but it will give some numbers (results). And those numbers could be very dangerous for ordinary readers.

    Rendering is my thing; I have a lot of knowledge of this specific task. That means I can read those numbers in a mature way that someone without that knowledge cannot.
    If an Nvidia card shows a time of 1 minute where an AMD card shows 10 minutes for the same task, even then I can't claim Nvidia is faster. My knowledge leads me to think that perhaps the software is tuned to use all the resources of CUDA but isn't mature enough to use all the resources of OpenCL, for example. That should not happen in a benchmark, because the code is the same for both.
    See how dangerous a test like this is?

    I'm still figuring out why most of the sites I read declare the W9100 a clear winner when here it is not.
    As an example, on those sites the K6000 leads in Catia, but by a small margin; here it's the same result, but by a large margin.
    On other sites almost everyone has the W9100 leading by a large margin in Maya and SolidWorks, but here the Maya lead is by a narrow margin and SolidWorks is led by the K6000.
    It isn't easy to read a benchmark; imagine without one?
    But of course it could be done, to help those who are able to read the test in a professional way.
    Cheers.
  • 0 Hide
    mapesdhs , May 29, 2014 12:25 PM
    Marcelo Viana writes:
    > Well, mapesdhs, one can do what you propose. For example: using V-Ray RT, see how long it takes to render
    > a scene with CUDA (K6000) and how long it takes to do the same with OpenCL (W9100). ...

    That's not really what I want to know. :D  I'd want to know how it compares for CUDA vs. OpenCL on just the K6000.
    That will show whether a CUDA implementation of the same problem, at least for that particular application, is more
    effective, and that helps make better use of knowing how the same OpenCL test runs on an AMD card.

    Btw, how a card processes OpenCL may not be the same at all between different brands, plus of course
    OpenCL isn't fully implemented on consumer cards.

    "Dangerous" isn't relevant. I seek information I'm not being given, without which one cannot form a full picture of
    what is going on.

    Note that I'm perfectly familiar with rendering concepts as well; check my SGI site. ;)  I've been doing benchmark
    research of all kinds for more than 20 years, eg. here's some of my old work from way back:

    http://www.sgidepot.co.uk/r10kcomp.html


    As for CATIA, see my page ref, many of these tests are CPU limited, so it could be that a stock 4930K
    is holding both cards back, hard to say without further tests. A 5GHz 2700K definitely bottlenecks
    many of the tests, especially Energy & Medical.

    Ian.

  • 0 Hide
    Marcelo Viana , May 29, 2014 4:19 PM
    mapesdhs writes:
    “"Dangerous" isn't relevant.”
    It's relevant for common readers, and this site is full of them. But of course you clearly are not one.

    Very nice site mapesdhs, lots of information, thank you for sharing.

    As for Catia, I totally agree with you: something must be holding it back. Wish I could have the cards to test it myself.
    But I have to confess I don't yet know how to run the tests in a way that avoids the limitations others hit, like the ones you pointed out.
    mapesdhs writes:
    “That's not really what I want to know. I'd want to know how it compares for CUDA vs. OpenCL on just the K6000.
    That will show whether a CUDA implementation of the same problem, at least for that particular application, is more
    effective, and that helps make better use of knowing how the same OpenCL test runs on an AMD card.”

    Very interesting point, and at least a different way to go. "Dig, dig, dig is always the way".

    For me, with all the information I have so far (nothing definitive, just an opinion), the W9100 is the best card: more memory, more flops, etc. But in the real world I'd go for the K6000. Not for the card itself, but for CUDA. AMD must release a driver that really supports OpenCL on its cards, on Windows and (in my case) Linux.
    On the other hand, if OpenCL 2 were already here (fully working, not buggy drivers), I'd have no doubt about going for the W9100: best card, more memory, better price.
    As I said, just an opinion.
  • 0 Hide
    mapesdhs , May 30, 2014 8:10 AM

    Marcelo Viana writes:
    > It's relevant for common readers, and this site is full of them. But of course you clearly are not one.

    Careful, if my head gets any bigger it'll need its own post code... :}


    > Very nice site mapesdhs, lots of information, thank you for sharing.

    Thanks! Most welcome.


    > As for Catia, I totally agree with you: something must be holding it back. Wish I could have the cards to test it myself.

    Hopefully I'll be able to work out something about this when I've tested
    using my other systems, though not an ideal comparison - for that I'd need
    a newer 2-socket XEON system using two CPUs with 6+ cores each (can't see
    that happening any time soon, too expensive; it was painful enough
    building up the Dell T7500 from scratch).


    > Very interesting point, and at least a different way to go. "Dig, dig, dig is always the way".

    Thing is though, even if I did find out what I mentioned above, the
    conclusion may only be valid for that particular application and that
    particular GPU. Let's say just as an example that the CUDA version of a
    task is 30% quicker than an OpenCL implementation of the same task; can
    one infer from this that a CUDA version of any task will be 30% faster?
    Certainly not. Alas, though it would be great to have a range of data
    points on this, atm there isn't even one example one can examine to gain
    some insight into any API efficiency differences. There's an assumption
    (and maybe it's valid) that given the option, CUDA is the better choice,
    but where's the data?
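The point that one measured CUDA-vs-OpenCL ratio can't be generalised to other tasks can be made with a toy calculation. The per-application speedups below are invented purely to illustrate the spread; none of them are real measurements.

```python
from statistics import geometric_mean  # available since Python 3.8

# Invented CUDA-over-OpenCL speedups for four hypothetical applications
# running on the same card; the spread between tasks is the point,
# not the specific values.
speedups = {"app_a": 1.30, "app_b": 0.95, "app_c": 1.80, "app_d": 1.05}

mean_speedup = geometric_mean(speedups.values())          # one headline number...
spread = max(speedups.values()) / min(speedups.values())  # ...hiding the spread

# A single measured ratio (say, "+30% with CUDA") cannot predict another
# workload when per-task ratios vary by nearly 2x, which is exactly why a
# range of data points is needed rather than one.
```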

    I'm a firm believer in the ethos espoused by David Kirkaldy, a man who
    administered a 15m-long, 115-tonne testing machine in London in the late
    1800s (eg. he was asked by the govt. to investigate structural problems
    with parts recovered from the Tay Bridge disaster in 1879). An engraved
    stone tablet above his works' entrance read, "Facts Not Opinions", an
    attitude which annoyed many of his peers.


    > ... The W9100 is the best card, more memory, more flops etc...

    The memory angle presents a problem, something I was moaning about to a
    friend this week.

    Tasks such as volumetric imaging (medical), GIS, defense imaging, etc.
    need a lot of main RAM and clearly will be much more effective if the GPU
    has lots of RAM too, but the two Viewperf12 examples show that the real
    apps used for such tasks put considerable demands on the main CPU(s)
    as well, ie. in this case the single 4930K appears to be a bottleneck. If
    so, then the potential performance of a card like the W9100 is being
    wasted because the host system doesn't have the compute power to feed it
    properly (same concept as there being no point putting a more powerful GPU
    in the Red Devil budget gaming build presented on toms this week, because
    the main CPU couldn't exploit it). SGI did a lot of work on these issues
    20 years ago - it's why some Onyx setups needed 8+ CPUs even if only one gfx
    pipe was present, because the application needed a lot of preprocessing
    power, eg. the real-time visualisation of an oil rig database is a good
    example I recall (this image dates from the early/mid-1990s):

    http://www.sgidepot.co.uk/misc/oilrig.jpg

    The proprietary oil rig data was converted to IRIS Performer for every
    frame using various culling methods, giving a 10Hz update rate on Onyx
    RE2, 60Hz on Onyx2 IR. The system was creating an entirely new scene graph
    for every frame.


    Or to put it another way, there's not much point in a card having as much
    RAM as the W9100 if the system doesn't have the CPU power to drive it
    properly. The 'problem' is the slow-as-mud improvements Intel has been
    making with its CPUs in recent years. Although they've added more cores to
    the XEON lines, some tasks need higher single-core performance, not more
    cores, especially those which aren't coded to use more than 6 cores (this
    has been especially painful for those still using ProE). IBM sorted this
    out years ago (some very high clocks present in their Power CPUs), so why
    hasn't Intel? With the same higher thermal limits followed, they ought to
    be able to offer CPUs with 4 to 8 cores by now at much higher clock speeds
    than are currently available. Instead, they've gone crazy with many-cores
    options, but the clock rates are too low. I'm certain that many pro users
    would love the option of having 4.5GHz+ CPUs with only 4 to 8 cores max.
    Such a config would speed up the typical single-thread GUI used by most
    pro apps as well.
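The feeding-the-GPU argument above is essentially Amdahl's law: if a fixed fraction of each frame is single-threaded CPU preprocessing, no GPU upgrade can shrink that part. A toy illustration with invented fractions:

```python
def frame_speedup(cpu_fraction: float, gpu_speedup: float) -> float:
    """Overall speedup when only the GPU portion of each frame gets faster
    (Amdahl's law, with the CPU-side share as the serial fraction)."""
    return 1.0 / (cpu_fraction + (1.0 - cpu_fraction) / gpu_speedup)

# If, hypothetically, 40% of each frame is CPU-side scene preparation,
# even an effectively infinite GPU speedup caps the overall gain at
# 1 / 0.4 = 2.5x, so a bigger card's potential is wasted.
cap = frame_speedup(0.4, 1e9)
```

This is the same reason a W9100's large memory and flops can go unused behind a stock 4930K: the serial CPU share, not the GPU, sets the ceiling.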


    > but in the real world I'd go for the K6000. Not for the card itself, but for CUDA. AMD must release
    > a driver that really supports OpenCL on its cards, on Windows and (in my case) Linux.

    Me too, though also for the driver reliability. Not that NVIDIA is immune
    to driver screwups, but I've had far fewer problems with NVIDIA drivers in
    general. I agree with an earlier poster who said the W9100 needs to be a
    lot cheaper than the K6000 to draw away those who might otherwise buy the
    latter despite the moderate price difference. In many pro markets, it's
    worth spending disproportionately more in order to achieve significantly
    greater reliability. I talked to someone yesterday who told me a full
    render of a movie they're working on is going to take about a week on
    their GPU cluster, so it's obviously important that during all that time
    the GPU drivers don't do anything dumb, otherwise it's a heck of a lot of
    wasted power, delay, etc.


    > On the other hand, if OpenCL 2 were already here (fully working, not buggy drivers), I'd have no doubt
    > about going for the W9100: best card, more memory, better price.

    In a way it reminds me a bit of what used to happen with pro cards 10+
    years ago, ie. vendors never developed the drivers to get the best out of
    a product before they moved on to the next product (SGI's approach was
    very different, but costly). Someone in a position to know told me at the
    time that few cards end up offering more than about a third of what they
    could really do before optimisation work on the card is halted to allow
    vendor staff to move on to the next product release. And it doesn't help
    that sometimes driver updates can completely ruin the performance of an
    older card, eg. way back in the early 200-series NV drivers, DRV-09
    performance (Viewperf 7.1.1) was really good (check the numbers on my site
    for a Quadro 600 with a simple oc'd i3 550), but then at some point an
    update made it collapse completely. See my Viewperf page for details (last
    table on the page).


    In short then, more data please! 8)

    Ian.
