Nvidia Betting its CUDA GPU Future With 'Fermi'

11:31 AM - October 1, 2009 by Marcus Yam

This chip is going to be huge for the supercomputing market, if Nvidia has its way.

The video card has evolved to the point where it is now termed the GPU, thanks to the growing capability of the hardware. Now the GPU is about to take its next big leap: becoming a specialized GPGPU (of course, we realize that the terms "specialized" and "general purpose" are somewhat contradictory).

Nvidia is betting heavily on GPGPUs becoming a large need in the computing market. We'll still need our GPUs to push pixels in our 3D games, but Nvidia is looking beyond that: it has just revealed its next-generation CUDA architecture, codenamed "Fermi."

Nvidia bills Fermi as an entirely new, ground-up design that will finally realize the potential of GPU computing. Nvidia made big steps with its G80 and later the GT200, but the graphics maker says Fermi is a much more pleasant and useful tool for programmers.

“The first two generations of the CUDA GPU architecture enabled Nvidia to make real in-roads into the scientific computing space, delivering dramatic performance increases across a broad spectrum of applications,” said Bill Dally, chief scientist at Nvidia.

“It is completely clear that GPUs are now general purpose parallel computing processors with amazing graphics, and not just graphics chips anymore,” said Jen-Hsun Huang, co-founder and CEO of Nvidia. “The Fermi architecture, the integrated tools, libraries and engines are the direct results of the insights we have gained from working with thousands of CUDA developers around the world. We will look back in the coming years and see that Fermi started the new GPU industry.”

At the unveiling event, Nvidia did not give anything away in terms of clock speeds or any of the other specifications that hardcore 3D gamers focus on. Instead, it talked about technical features that lend themselves specifically to GPU computing. Such technologies include:

  • C++, complementing existing support for C, Fortran, Java, Python, OpenCL and DirectCompute.
  • ECC, a critical requirement for datacenters and supercomputing centers deploying GPUs on a large scale
  • 512 CUDA Cores featuring the new IEEE 754-2008 floating-point standard, surpassing even the most advanced CPUs
  • 8x the peak double-precision arithmetic performance of Nvidia’s last-generation GPU. Double precision is critical for high-performance computing (HPC) applications such as linear algebra, numerical simulation, and quantum chemistry
  • Nvidia Parallel DataCache - the world’s first true cache hierarchy in a GPU that speeds up algorithms such as physics solvers, raytracing, and sparse matrix multiplication where data addresses are not known beforehand
  • Nvidia GigaThread Engine with support for concurrent kernel execution, where different kernels of the same application context can execute on the GPU at the same time (e.g., PhysX fluid and rigid body solvers)
  • Nexus – the world’s first fully integrated heterogeneous computing application development environment within Microsoft Visual Studio
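The double-precision bullet is easy to illustrate. The sketch below is our own example and has nothing to do with Nvidia's hardware. It emulates single precision in Python by rounding every intermediate sum to IEEE 754 binary32 with the standard `struct` module, then adds 0.1 a million times in each precision. The helper names `to_f32` and `accumulate` are hypothetical.

```python
import struct

def to_f32(v: float) -> float:
    """Round a Python float (IEEE 754 binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total n times.

    With single=True, every intermediate sum is rounded to binary32. For the
    magnitudes used here, the binary64 sum of two binary32 values is exact,
    so this rounding reproduces true float32 addition.
    """
    s = to_f32(step) if single else step
    total = 0.0
    for _ in range(n):
        total += s
        if single:
            total = to_f32(total)
    return total

N = 1_000_000
single_total = accumulate(0.1, N, single=True)
double_total = accumulate(0.1, N, single=False)
print(f"single: {single_total:.4f}  error {single_total - 100_000:+.4f}")
print(f"double: {double_total:.10f}  error {double_total - 100_000:+.2e}")
```

The single-precision total drifts hundreds away from 100,000, while the double-precision total stays within a tiny fraction of it. Long accumulations like this are routine in the linear algebra and simulation codes the bullet mentions.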
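The Parallel DataCache bullet names sparse matrix multiplication as a workload where "data addresses are not known beforehand." A minimal Python sketch of a sparse matrix-vector product in compressed sparse row (CSR) form shows what that means. The column indices are themselves data, so the program only learns which elements of `x` to read at run time. The CSR layout and the function name `csr_matvec` are our own illustrative choices and do not describe how Nvidia's cache or libraries work.

```python
from typing import List

def csr_matvec(values: List[float], col_idx: List[int],
               row_ptr: List[int], x: List[float]) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values[k]  -- k-th nonzero of A, stored row by row
    col_idx[k] -- column index of values[k]
    row_ptr[i] -- position in `values` where row i starts (row_ptr[-1] == nnz)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect, data-dependent load: the address of x[col_idx[k]]
            # is only known after col_idx[k] has been read, so it can't be
            # staged in advance. Any reuse has to be caught by a cache.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
```

On a GPU, a common approach is to give each row (or group of rows) its own thread. The scattered reads of `x` are the part a hardware cache hierarchy is meant to speed up.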

Oak Ridge National Laboratory (ORNL) has already announced plans for a new supercomputer that will use Fermi for research in areas such as energy and climate change. ORNL’s supercomputer is expected to be 10 times more powerful than today’s fastest supercomputer.

“This would be the first co-processing architecture that Oak Ridge has deployed for open science, and we are extremely excited about the opportunities it creates to solve huge scientific challenges,” Jeff Nichols, ORNL associate lab director for Computing and Computational Sciences, said. “With the help of Nvidia technology, Oak Ridge proposes to create a computing platform that will deliver exascale computing within ten years.”

Nvidia did reveal that its upcoming Fermi GPU will pack 3 billion transistors, making it one mammoth chip, bigger than anything from ATI. Of course, Nvidia's aspirations in the GPU space are far more ambitious than those of AMD. It'll be interesting to see if and how the two head-to-head rivals shift their focus from 3D gaming technologies toward GPGPU applications.

Source : Tom's Hardware US

Talkback
Lucuis 10/01/2009 5:55 PM

Wow, now that is sweet.

magicandy 10/01/2009 6:03 PM

If you're going to put a logo on your chart, common sense states you shouldn't cover up what's on the chart...

Anonymous 10/01/2009 6:04 PM

What does that say under the TomsHardware logo in the picture..?

crisisavatar 10/01/2009 6:09 PM

The computing capability is great and all but I am personally more interested in affordable GPUs. Let's see if NVIDIA can deliver here.

mlopinto2k1 10/01/2009 6:15 PM

Hi, I would like a programmable CPU/GPU/GPGPU unit that allowed Virtual Instruments and Effects to be processed on it. Otherwise, this is just more of the same CRAP!

megamanx00 10/01/2009 6:24 PM

That's nice and all, but when are they gonna start selling the darn thing? Besides, even though an evolution of CUDA is nice and everything, proprietary APIs like that are kind of a hard sell. I think it's cool that it will get some C++ support (we'll see how that one goes), but as OpenCL and DirectCompute are more open, it will matter more how this chip compares to AMD's in the performance of those rather than CUDA.

jonpaul37 10/01/2009 6:32 PM

If the performance/price fits the same shoes as ATI's latest release(s), I will be sold and Nvidia will again be an option in my future. Not to say it isn't now; I'm just saying, ATI has some nice stuff for a low-ish price.

nforce4max 10/01/2009 6:33 PM

Cool how much will it cost? Most likely have to work for a month just to get one at $10 US an hour.

Anonymous 10/01/2009 6:50 PM

The logo'd out part of the chart reads:

L1 Cache: Configurable 16K or 48K
L2 Cache: 768K
ECC: Yes
Concurrent Kernels: Up to 16

...from another source. Gotta love automated processes like logo stamping :)

nakecat 10/01/2009 6:58 PM

Quote: Nvidia did reveal that its upcoming Fermi GPU will pack 3 billion transistors, making it one mammoth chip – bigger than anything from ATI.


Not until the card is out, and it's not coming out until the first quarter of 2010. Besides, with the 5870x2 around the corner and the 5770, 5850xx... ATI should still hold the best price/performance value.


http://www.hardwarecanucks.com/new [...] s-surface/

Jenoin 10/01/2009 7:00 PM

The Quadro and Tesla product lines have always been based on the GeForce line. Is this a turnaround? Are they going to design for the Tesla line and then remove features for the Quadro and GeForce? I hope the pricing of these isn't going to reflect all the capabilities this chip has that will be completely unused by the majority of GeForce owners (other than for Folding@home).

VioMeTriX 10/01/2009 7:00 PM

wow the money in my pocket is getting really hot

dreamer77dd 10/01/2009 7:03 PM

If it does not bottleneck itself and can manage data flow, it could do well. "Just one DVI output - keep in mind this is NOT a gaming card, but the Tesla model for super computing." I would like to put in a card that takes care of everything else running in the background of my computer that bogs down the CPU. If it helps programming languages like C, Java, Python, OpenCL and DirectCompute perform better, why not, but it would have to be more than a 10% increase for me to be interested.

dreamer77dd 10/01/2009 7:06 PM

I used to see tests with 4 GPUs; what happened to those days? I would still love to rip through games and have no game bring me to my knees. I am not sure if motherboards have enough bus bandwidth to take advantage of this. Hmm?

njkid3 10/01/2009 7:18 PM

Well, nice-looking GPU, but with its delayed entry, its focus on computing rather than gaming, and the high possibility that it will be pretty high on the cost scale, I would have to say the odds that they will one-up ATI are slim. ATI's chips are already out, they are priced reasonably, and they have already been shown to haul serious ass in gaming. With ATI's soon-to-be full line of DX11 products covering all price levels of the market, I would be surprised if Nvidia can pull this one out of its hat.

omnimodis78 10/01/2009 7:32 PM
Hide
--1+

I think it's safe to assume that if it has the stated capabilities, then it really won't have any issues at all playing games, even the next-gen stuff. nVidia would be insane to sell a card in the consumer market without making it a kick-ass gaming beast, or the reviews would tear it apart and within a month all gamers would be buying ATI, and we know they would capitalize on that shift so much that it would force nVidia into crisis mode! No need to worry, these cards will be premium gaming cards, with the added benefit of an expanded potential. I HOPE!

yang 10/01/2009 7:39 PM

...will this run crysis? :)

wildwell 10/01/2009 7:40 PM

Ahh... as technology marches on. It is odd that Tom's put their logo on top of not just the chart, it's over the part of the chart showing info on the new GPU!

Gin Fushicho 10/01/2009 8:13 PM

Whoa... That's insane. Wish I programmed now.

joeman42 10/01/2009 8:27 PM

Yang :
...will this run crysis?


No, because it uses Havok instead of PhysX. For the same reason Nvidia sabotaged ATI on physics, it will be unable to process the effects in CryEngine. The silicon that could have been used to accelerate it (like ATI did) will be wasted idling. Larrabee, on the other hand, being a collection of x86 engines, can be reconfigured on the fly to use all of its muscle on each game. Nvidia is going out on a proprietary limb, which may very well be an evolutionary dead end.

hannibal 10/01/2009 8:31 PM

Quite a monster of a GPU... Great for Folding@home and PhysX, I suppose.
And don't worry, with that kind of compute power it will be OK in gaming too ;-)
The other question entirely is whether this is more sensible than the ATI 58XX series for games. With 3.0 billion transistors, this is not going to be a "cheap" alternative to the 5870. Nvidia is going to take the fastest-card title at any cost. But even the GPU genre needs to have its Ferrari. It may not be sensible for normal use, but it can be fun if you have the money for it.
This seems to be reasonably modular, so there is some hope for new-generation Nvidia mid-range cards... ATI is doing a great job in that respect, and we need some competition there too. If Nvidia (again) does a facelift of the G80 for the low and middle range, graphics development could stagnate badly.

hannibal 10/01/2009 8:54 PM

One nice article from Anandtech:
http://www.anandtech.com/video/showdoc.aspx?i=3651

hannibal 10/01/2009 9:14 PM

From Anandtech article

Quote: Correct answer isn't to target a lower price point first, but rather build big chips efficiently. And build them so that you can scale to different sizes/configurations without having to redo a bunch of stuff.


So this seems to be scalable, so we really could see competition in the not-so-high-end segments too. But how soon?

Also interesting is that ATI uses smaller chips and uses more of them for more power, while Nvidia makes one big chip and cuts SM units, or something like that, to make smaller chips for the cheaper price range.
All in all, it's good that Nvidia has a plan and there is going to be some real competition next year!
Maybe cheaper 58xx cards?

__-_-_-__ 10/01/2009 9:25 PM

ATI stream ftw.

ravewulf 10/01/2009 9:54 PM

Yes that's really nice for scientific purposes and stuff. Now where's your DX11 GPU and new motherboard chipsets?

I want to see if they can come up with something interesting enough to make them my next choice, otherwise I'll definitely be moving over to AMD/ATI.

Antilycus 10/01/2009 10:34 PM
Hide
--1+

If NVDA can provide the floating point and processing power required to render in REAL TIME (which is a feat in itself, since one realistic image (1 frame, of 60 in 1 second) takes 23 hours on a quad-core CPU), they could take a chunk from Intel and AMD. I'd be all over supporting NVDA, but I have been since their first 3D accelerator (I believe they were Diamond at the time, or bought Diamond at the time).

shadow703793 10/01/2009 10:42 PM

Antilycus :
If NVDA can provide the floating point and processing power required to render in REAL TIME (which is a feat in itself, since one realistic image (1 frame, of 60 in 1 second) takes 23 hours on a quad-core CPU), they could take a chunk from Intel and AMD. I'd be all over supporting NVDA, but I have been since their first 3D accelerator (I believe they were Diamond at the time, or bought Diamond at the time).


Hell, if this happened nVidia would have a MASSIVE advantage. I'm sure many pro 3D designers will be willing to shell out $5k+ for a card like this considering the time saved in the long run.

Then again, this probably won't happen very soon. Also I'm assuming it will be a true ray traced rendering.

saint19 10/01/2009 11:17 PM

This sounds like the new line of nVidia GPU to compete with the new HD 58xx....

aungee 10/02/2009 12:29 PM

I think this is a logical step by Nvidia to head in the GPGPU direction if it's to survive. Hats off to them if they do release the product.

But rest assured, AMD/ATI will come out with a GPU in Q2/Q3 2010 with similar features. You never know, the GT300 might not make it to market by then :-)



Related articles

  • The two worlds remained separate for a long time. We used the CPU (or several CPUs) for office and Internet applications, and GPUs were good only for drawing pretty pictures faster. But a single event would change all that: the appearance of programmability in GPUs. At first, CPUs had nothing to fear. The first so-called programmable GPUs (the NV20 and R200) were far from being a threat. The number of instructions in a program remained limited to around 10, and they worked on exotic data types like nine- or 12-bit fixed-point numbers. But Moore’s Law rears its head once again. Not only does the increase in the number of transistors make it possible to increase the number of calculating units, it also increases their flexibility. So the appearance of the NV30 was significant for several reasons. While gamers may not induct the NV30 into their hall of fame, it did usher in two factors that were important in changing the mindset that sees GPUs as nothing more than graphics accelerators: support for single-precision floating-point calculations (even if it didn’t comply with the IEEE 754 standard), and support for programs of more than a thousand instructions. At this point, all the conditions were in place to attract a few curious researchers on the lookout for ways to wring out more processing power.

  • Nvidia introduced CUDA with the release of the GeForce 8800. At that time the promises they were making were extremely seductive, but we kept our enthusiasm in check. After all, wasn’t this likely to be just a way of staking out the territory and surfing the GPGPU wave? Without an SDK available, you can’t blame us for thinking it was all just a marketing operation and that nothing really concrete would come of it. It wouldn’t be the first time a good initiative had been announced too early and never really seen the light of day due to a lack of resources, especially in such a competitive sector. Now, a year and a half after the announcement, we can say that Nvidia has kept its word. Not only was the SDK available quickly in a beta version, in early 2007, but it’s also been updated frequently, proving the importance of this project for Nvidia. Today CUDA has developed nicely. The SDK is available in a beta 2.0 version for the major operating systems (Windows XP, Vista and Linux, with version 1.1 for Mac OS X), and Nvidia is devoting an entire section of its site to developers. On a more personal level, the impression we got from our first steps with CUDA was extremely positive. Even if you’re familiar with the GPU’s architecture, it’s natural to be apprehensive about programming it. And while the API looks clear at first glance, you can’t help thinking it won’t be easy to get convincing results with the architecture. Won’t the gain in processing time be siphoned off by the multiple CPU-GPU transfers? And how do you make good use of those thousands of threads with almost no synchronization primitives? We started our experimentation with all these uncertainties in mind. But they soon evaporated when the first version of our algorithm, trivial as it was, already proved to be significantly faster than the CPU implementation. So CUDA is not a gimmick intended for researchers who want to cajole their university into buying them a GeForce.
CUDA is genuinely usable by any programmer who knows C, provided he or she is ready to make a small investment of time and effort to adapt to this new programming paradigm. That effort won’t be wasted, provided your algorithms lend themselves to parallelization. We should also tip our hat to Nvidia for providing ample, quality documentation to answer all the questions of beginning programmers.

  • However, we did decide to measure the processing time to see if there was any advantage to using CUDA even with our crude implementation, or if, on the other hand, it was going to take long, exhaustive practice to get any real control over the use of the GPU. The test machine was our development box, a laptop computer with a Core 2 Duo T5450 and a GeForce 8600M GT, running Vista. It’s far from being a supercomputer, but the results are interesting since our test is not all that favorable to the GPU. It’s fine for Nvidia to show us huge accelerations on systems equipped with monster GPUs and enormous bandwidth, but in practice many of the 70 million CUDA GPUs existing on current PCs are much less powerful, and so our test is quite germane. The results we got for processing a 2048x2048 image are as follows:

CPU, 1 thread: 1419 ms
CPU, 2 threads: 749 ms
CPU, 4 threads: 593 ms
GPU (8600M GT), blocks of 256 pixels: 109 ms
GPU (8600M GT), blocks of 128 pixels: 94 ms
GPU (8800 GTX), blocks of 128 pixels / 256 pixels: 31 ms

Several observations can be made about these results. First of all, you’ll notice that despite our crack about programmers’ laziness, we did modify the initial CPU implementation by threading it. As we said, the code is ideal for this situation: all you do is break down the initial image into as many zones as there are threads. Note that we got an almost linear acceleration going from one to two threads on our dual-core CPU (1419 ms down to 749 ms, a factor of about 1.9), which shows the strongly parallel nature of our test program. Fairly unexpectedly, the four-thread version proved faster still. We had expected to see no difference at all on our processor, or even (more logically) a slight loss of efficiency due to the cost of creating the additional threads. What explains that result? It’s hard to say, but it may be that the Windows thread scheduler has something to do with it; in any case, the result was reproducible.
With a smaller texture (512x512), the gain achieved by threading was a lot less marked (approximately 35% as opposed to 100%), and the behavior of the four-thread version was more logical, showing no gain over the two-thread version. The GPU was still faster, but less markedly so (the 8600M GT was three times faster than the two-thread version). The second notable observation is that even the slowest GPU implementation was more than five times faster than the best-performing CPU version (109 ms versus 593 ms). For a first program and a trivial version of the algorithm, that’s very encouraging. Notice also that we got significantly better results using smaller blocks, whereas intuitively you might think that the reverse would be true. The explanation is simple. Our program uses 14 registers per thread, so with 256-thread blocks it needs 3,584 registers per block, and to saturate a multiprocessor it takes 768 threads, as we saw. In our case, that’s three blocks, or 10,752 registers. But a multiprocessor has only 8,192 registers, so it can only keep two blocks active. Conversely, with blocks of 128 pixels, we need 1,792 registers per block; 8,192 divided by 1,792, rounded down to a whole number, works out to four blocks being processed. In practice, the number of threads is the same (512 per multiprocessor, whereas theoretically it takes 768 to saturate it), but having more blocks gives the GPU additional flexibility with memory access. When an operation with a long latency is executed, it can launch execution of the instructions on another block while waiting for the results to be available. Four blocks would certainly mask the latency better, especially since our program makes several memory accesses.
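The CPU threading described in the benchmark excerpt ("break down the initial image into as many zones as there are threads") can be sketched as below. This is an illustration, not the authors' code. The excerpt doesn't say what the per-pixel operation is, so `pixel_op` is a hypothetical stand-in, and the image is a plain list of rows. In CPython, the global interpreter lock keeps this pure-Python version from actually running faster on two cores. The point is the partitioning: each thread writes only its own band of rows, so it needs no locking.

```python
import threading
from typing import Callable, List

def split_rows(height: int, n_threads: int) -> List[range]:
    """Split rows 0..height-1 into n_threads contiguous bands whose sizes differ by at most one."""
    base, extra = divmod(height, n_threads)
    bands: List[range] = []
    start = 0
    for t in range(n_threads):
        stop = start + base + (1 if t < extra else 0)  # first `extra` bands get one more row
        bands.append(range(start, stop))
        start = stop
    return bands

def process_image(img: List[List[int]], n_threads: int,
                  pixel_op: Callable[[int], int]) -> List[List[int]]:
    """Apply pixel_op to every pixel, with one thread per band of rows."""
    out: List[List[int]] = [[] for _ in img]

    def worker(rows: range) -> None:
        # Each thread writes only the rows in its own band, so no lock is needed.
        for y in rows:
            out[y] = [pixel_op(p) for p in img[y]]

    threads = [threading.Thread(target=worker, args=(band,))
               for band in split_rows(len(img), n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

# Hypothetical usage: invert an 8-bit grayscale image with two threads.
image = [[0, 64, 128], [192, 255, 10]]
print(process_image(image, 2, lambda p: 255 - p))  # [[255, 191, 127], [63, 0, 245]]
```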
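The GPU versions in the benchmark process the image in "blocks of 256 pixels" or "blocks of 128 pixels." The sketch below shows how that typically maps onto CUDA's thread indexing. It assumes one thread per pixel and one-dimensional blocks, since the excerpt doesn't say how its kernel was laid out. It is emulated in Python so it runs without a GPU, and the function names are hypothetical. Inside a real kernel, the same arithmetic uses CUDA's built-in `blockIdx.x`, `blockDim.x` and `threadIdx.x`.

```python
from typing import Optional, Tuple

def grid_size(n_pixels: int, block_dim: int) -> int:
    """Blocks needed so every pixel gets a thread (ceiling division)."""
    return (n_pixels + block_dim - 1) // block_dim

def pixel_for_thread(block_idx: int, thread_idx: int, block_dim: int,
                     width: int, height: int) -> Optional[Tuple[int, int]]:
    """Return the (x, y) pixel a thread handles, or None if the thread falls
    past the end of the image (the usual bounds guard in a kernel)."""
    i = block_idx * block_dim + thread_idx  # blockIdx.x * blockDim.x + threadIdx.x
    if i >= width * height:
        return None
    return i % width, i // width

W = H = 2048
print(grid_size(W * H, 256), grid_size(W * H, 128))  # 16384 32768
```

Halving the block size doubles the number of blocks but leaves the total thread count unchanged. What changes is how many blocks a multiprocessor can hold at once.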
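The register arithmetic at the end of the benchmark excerpt can be checked with a few lines of Python. This is a simplified sketch. It uses the per-multiprocessor limits quoted in the excerpt (8,192 registers, 768 threads), plus the cap of eight resident blocks per multiprocessor that Nvidia documents for these compute capability 1.0/1.1 parts. It also ignores register-allocation granularity, so treat it as an approximation of what Nvidia's occupancy calculator would report. The function name `active_blocks` is hypothetical.

```python
def active_blocks(regs_per_thread: int, threads_per_block: int,
                  regs_per_sm: int = 8192, max_threads_per_sm: int = 768,
                  max_blocks_per_sm: int = 8) -> int:
    """How many blocks one multiprocessor can keep resident at once.

    Defaults are the compute capability 1.0/1.1 limits (GeForce 8800 GTX,
    8600M GT). Register allocation granularity is ignored.
    """
    regs_per_block = regs_per_thread * threads_per_block
    by_registers = regs_per_sm // regs_per_block   # round down: a partial block can't run
    by_threads = max_threads_per_sm // threads_per_block
    return min(by_registers, by_threads, max_blocks_per_sm)

for block in (256, 128):
    blocks = active_blocks(14, block)
    threads = blocks * block
    print(f"{block}-thread blocks: {blocks} resident, {threads} threads, "
          f"occupancy {threads / 768:.0%}")
# 256-thread blocks: 2 resident, 512 threads, occupancy 67%
# 128-thread blocks: 4 resident, 512 threads, occupancy 67%
```

Both configurations leave 512 threads resident, two-thirds of the 768 needed to saturate a multiprocessor, which matches the excerpt. The smaller blocks spread those threads over four blocks instead of two, giving the scheduler more independent work to switch to while memory accesses are pending.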