Nvidia: Moore's Law is Dead, Multi-core Not Future
Nvidia chief scientist says that everyone needs to rethink processors in order for Moore's Law to continue.
Bill Dally, the chief scientist and senior vice president of research at Nvidia, wrote an article for Forbes arguing that Moore's Law, the observation popularly summarized as transistor count and performance doubling every 18 months, is dead.
The problem, according to Dally's Forbes article, is that current CPU architectures are still built as serial processors, while he believes the future lies in parallel processing. He gives the example of reading an essay: a single reader can only read one word at a time, but assigning a group of readers a paragraph each would greatly speed up the process.
"To continue scaling computer performance, it is essential that we build parallel machines using cores optimized for energy efficiency, not serial performance. Building a parallel computer by connecting two to 12 conventional CPUs optimized for serial performance, an approach often called multi-core, will not work," he wrote. "This approach is analogous to trying to build an airplane by putting wings on a train. Conventional serial CPUs are simply too heavy (consume too much energy per instruction) to fly on parallel programs and to continue historic scaling of performance."
"Going forward, the critical need is to build energy-efficient parallel computers, sometimes called throughput computers, in which many processing cores, each optimized for efficiency, not serial speed, work together on the solution of a problem. A fundamental advantage of parallel computers is that they efficiently turn more transistors into more performance," Dally added.
Dally also argued that focusing on parallel computing architectures will help resurrect Moore's Law: "Doubling the number of processors causes many programs to go twice as fast. In contrast, doubling the number of transistors in a serial CPU results in a very modest increase in performance--at a tremendous expense in energy."
One big driver of current processor design is the software written to run on today's chips. Dally said that 40 years of serial programming practice will be hard to change, and that programmers trained in parallel programming are scarce.
"The computing industry must seize this opportunity and avoid stagnation, by focusing software development and training on throughput computers - not on multi-core CPUs," Dally concluded. "Let's enable the future of computing to fly--not rumble along on trains with wings."

I totally agree with this statement here. However, if this were to change, and more people were trained in how to properly program for parallel computing, then the same could be said about the need to train more people in how to properly program for serial computing - which is where we currently are in processor design. I think it's fairer to say the insufficiency lies on both sides.
On another note, am I the only one finding it amusing that the chief scientist of R&D at Nvidia is stating the CPU consumes too much energy??? Did he forget about the monster they just released, or does he still consider it to be within acceptable power requirements or efficient enough?
And really, some programs (algorithms) can never be turned into a parallel app.
"Our tech is the future, everyone else has no idea what they are doing. Please buy our GPGPU crap, even though it is inferior to what our competitors are making right now for everyday use."
Maybe this will open some eyes... but I doubt many for now...
Too many are in a stupor, doing things the way that is the norm and easiest for them instead of the way they should be done... how long will it take for people to wake up to the direction change needs to go?
This. Extremely few programs today even properly use the limited number of cores we have now. Look at all the programs that are still single-threaded and could easily benefit from parallelism (QuickTime and iTunes, for example). There are also other algorithms that simply CAN'T be made parallel (some parts of video encoding that depend on previous results for the next task).
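The dependency point above can be sketched in a few lines of Python. This is only an illustration: `step`, `chained`, and `independent` are hypothetical names, and the update rule is made up rather than taken from any real encoder. In `chained`, each result needs the previous one, so as written the loop cannot simply be split across workers; in `independent`, every item stands alone and can go to any worker.

```python
from concurrent.futures import ThreadPoolExecutor

def step(prev, x):
    # Hypothetical update rule: the new value depends on the previous result.
    return (prev * 31 + x) % 1000

def chained(values, seed=0):
    # Serial as written: iteration i cannot start until iteration i-1 is done.
    out, prev = [], seed
    for x in values:
        prev = step(prev, x)
        out.append(prev)
    return out

def scale(x):
    # Depends only on its own input, never on a neighbouring result.
    return (x * 31) % 1000

def independent(values, workers=4):
    # No item needs another item's result, so the pool can hand them out freely.
    # pool.map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scale, values))
```

Note that threads are used here only to show the structure; in standard CPython they will not speed up CPU-bound work like this.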
then there's parallelized code, which is hell for programmers
and then there's what we could call "Dally's law": your graphics card must be twice as hot every 12 months
But I do agree with what he's saying. We need to put more effort into parallel speed than serial.
His essay-reading analogy is the perfect example of that. People have to read one word at a time. You don't get a bunch of people to each read a few words and call it done, because that would make no sense at all.
That's a bad example. Think of it as video encoding. If you have one core doing all the work, that core has to process every frame line by line before it can move on to the next frame. But if you had 4 cores, you could divide each frame into 4 parts, and each core could work on its part before moving on to the next frame. Obviously there would need to be a controller that kept all the cores in sync and combined the parts of each frame back into a whole so that it would make sense to the end user. However, even with that overhead it would still be much faster.
Human understanding and core thread execution are two different things, and I don't think you can use that analogy when trying to differentiate parallel vs serial processing. Even if it doesn't make sense to each individual core that is only reading a small portion, it will make sense to the end user who ultimately reads the document once the controller has put it back together.
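A rough Python sketch of that split-and-recombine idea follows. It is not how any real encoder works: `encode_strip` is a stand-in that just inverts 8-bit pixel values, the frame is a plain list of rows, and all function names are hypothetical. The pool plays the role of the controller that hands out the strips and puts the frame back together.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_strip(strip):
    # Stand-in for real per-pixel work: invert each 8-bit pixel value.
    return [[255 - px for px in row] for row in strip]

def split_frame(frame, parts):
    # Cut the frame into `parts` horizontal strips of near-equal height.
    rows = len(frame)
    bounds = [rows * i // parts for i in range(parts + 1)]
    return [frame[bounds[i]:bounds[i + 1]] for i in range(parts)]

def encode_frame_parallel(frame, workers=4):
    strips = split_frame(frame, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so recombining
        # the frame is just concatenating the strips.
        encoded = list(pool.map(encode_strip, strips))
    return [row for strip in encoded for row in strip]

def encode_frame_serial(frame):
    # One worker handles the whole frame, row by row.
    return encode_strip(frame)
```

Both paths should produce the same frame; only the way the work is divided differs. Threads are used here to keep the sketch short, and in standard CPython they will not make CPU-bound work faster, so treat this as a picture of the structure rather than a benchmark.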
OK, how about turning the billions of lines of working COBOL code running on the mainframes of huge companies into parallel code? You, Mr. Dally, do you accept this challenge?
Nvidia is becoming a hugely biased company, lobbying everyone for parallel computing. We are still waiting for something real (and useful) to be impressed by.
Serial processors still have their uses, and in applications like this, they're so much faster than one CUDA core, say (which is all I'll be able to use).