Nvidia: Moore's Law is Dead, Multi-core Not Future
Nvidia chief scientist says that everyone needs to rethink processors in order for Moore's Law to continue.
Bill Dally, the chief scientist and senior vice president of research at Nvidia, wrote an article for Forbes arguing that Moore's Law, popularly taken to mean that transistor count and performance double every 18 months, is dead.
The problem, according to Dally's Forbes article, is that current CPU architectures are still serial processors, while he believes that the future is in parallel processing. He gives the example of reading an essay, where a single reader can only read one word at a time, while a group of readers assigned a paragraph each would greatly accelerate the process.
"To continue scaling computer performance, it is essential that we build parallel machines using cores optimized for energy efficiency, not serial performance. Building a parallel computer by connecting two to 12 conventional CPUs optimized for serial performance, an approach often called multi-core, will not work," he wrote. "This approach is analogous to trying to build an airplane by putting wings on a train. Conventional serial CPUs are simply too heavy (consume too much energy per instruction) to fly on parallel programs and to continue historic scaling of performance."
"Going forward, the critical need is to build energy-efficient parallel computers, sometimes called throughput computers, in which many processing cores, each optimized for efficiency, not serial speed, work together on the solution of a problem. A fundamental advantage of parallel computers is that they efficiently turn more transistors into more performance," Dally added.
Dally also argued that focusing on parallel computing architectures will help resurrect Moore's Law: "Doubling the number of processors causes many programs to go twice as fast. In contrast, doubling the number of transistors in a serial CPU results in a very modest increase in performance--at a tremendous expense in energy."
One big driver of current processor design is the software written to run on current chips. Dally said that 40 years of serial programming practice will be hard to change, and that programmers trained in parallel programming are scarce.
"The computing industry must seize this opportunity and avoid stagnation, by focusing software development and training on throughput computers - not on multi-core CPUs," Dally concluded. "Let's enable the future of computing to fly--not rumble along on trains with wings."
Somebody has to tell the dude, DUH!!!
Translation:
"Our tech is the future, everyone else has no idea what they are doing. Please buy our GPGPU crap, even though it is inferior to what our competitors are making right now for everyday use."
That's all well and good, until you need to do one thing BEFORE another, like when rendering a scene. Or maybe he forgot that.
!!
Maybe this will open some eyes... but I doubt many for now...
Too many are in a stupor, doing things the way that is the norm and easiest for them instead of how they should be done. How long will it take for people to wake up to the direction change needs to go?
What are you waiting for then, Bill Dally, go ahead and create that chip... ...ha, that's what I thought, even you can't do it.
Except that without serial optimizations, general apps (not compute apps) will suffer, since the serial optimizations allow for fast comparisons, whereas the compute cores on a GPU are very inefficient at this. Yes, it will help computing, but general apps will suffer.
And really, some programs (algorithms) can never be turned into a parallel app.
I totally agree with this statement here. However, if this were to change, and more were trained in how to properly program for parallel computing, then the same could be said about the need to train more on how to properly program for serial/series computing - which is where we are currently in processor design. I think it's more fair to say the insufficiency lies on both sides.
On another note, am I the only one finding it amusing that the chief scientist of R&D at Nvidia is stating the CPU consumes too much energy??? Did he forget about the monster they just released, or does he still consider it to be within acceptable power requirements or efficient enough?
This. Extremely few programs today even properly use the limited number of cores we have now. Look at all the programs that are still single-threaded that could easily benefit from parallelism (QuickTime and iTunes, for one). There are also other algorithms that simply CAN'T be made parallel (some parts of video encoding that depend on previous results for the next task).
What he says is true. It's the programmers' fault for not using more parallel programming. But unfortunately, there are only so many things that you can parallelize.
His reading an essay analogy is the perfect example of that. People have to read one word at a time. You can't get a bunch of people to each read a few words and be done, because that would make no sense at all.
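The limit described in the comment above is usually quantified with Amdahl's law, which neither the article nor the commenter names; it is brought in here only as an illustration. A minimal sketch, assuming a program whose parallelizable share is known (the 90% figure in the demo is a made-up example, not a measurement):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of a program can run in parallel.

    parallel_fraction: share of the single-core run time that can be spread
    across cores (0.0 to 1.0). The rest must still run serially.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many cores exist;
    # only the parallel part shrinks as cores are added.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical program that is 90% parallelizable.
    for cores in (2, 4, 12, 512):
        print(cores, "cores ->", round(amdahl_speedup(0.90, cores), 2), "x")
```

With 10% of the work stuck in serial code, the bound stays below 10x however many cores are added, which is the "depends on previous results" problem in numeric form.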
while I agree that there are instances where parallel processing works way better than serialized, you can't altogether switch from one to the other
then there's parallelized code, which is hell for programmers
and then there's what we could call "Dally's law": your graphics card must be twice as hot every 12 months
Wait, so then you'd have a bunch of people who only understood one paragraph and nothing else? It's all gotta go back to serial at some point! This is a bad example.
But I do agree with what he's saying. We need to put more effort into parallel speed than serial.
Moore's law is only about the # of transistors. It is irrelevant how many processors are built with these transistors. It does not say that the # of transistors is proportional to performance or MHz speed or anything like that. I find the reference to Moore's law in the NVIDIA paper just a marketing trick to promote their architecture and CUDA. What they are discussing has nothing to do with Moore's law; quite the contrary, it is how to get better performance from the same amount of elements.
Why not take both approaches?
The problem now is the x86 and the userbase that depends on it. What we need is a new mainstream architecture, that emulates x86 and which does parallel stuff really fast.
This touches on something I've been saying for a while, a MASSIVE change is required, but it needs hardware and software developers to work together and change together, one cannot change without the other!
"Wait, so then you'd have a bunch of people who only understood one paragraph and nothing else? It's all gotta go back to serial at some point! This is a bad example. But I do agree with what he's saying. We need to put more effort into parallel speed than serial."
That's a bad example; think of it as video encoding instead. If you have one core doing all the work, then that core has to do every frame line by line before it can move to the next frame. But if you had 4 cores, you could divide each frame into 4 parts and each core could work on its part before moving to the next frame. Obviously there would need to be a controller that kept all the cores in sync and combined each part of the frame back into a whole so that it would make sense to the end user. However, even with that overhead it would still be much faster.
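The split-and-recombine scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any real encoder works: a "frame" is assumed to be a list of pixel rows, `brighten_strip` is a hypothetical stand-in for the per-part work, and threads are used only to keep the example portable (for CPU-bound pure-Python work, CPython's global interpreter lock means a process pool would be the usual choice to get an actual speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_strip(strip, amount):
    """Worker: process one horizontal strip without looking at any other strip."""
    return [[min(255, pixel + amount) for pixel in row] for row in strip]


def split_rows(frame, parts):
    """Controller, step 1: cut the frame into at most `parts` contiguous strips."""
    size = -(-len(frame) // parts)  # ceiling division
    return [frame[i:i + size] for i in range(0, len(frame), size)]


def process_frame(frame, parts=4, amount=10):
    """Controller: split the frame, hand strips to workers, stitch results back."""
    if not frame:
        return []
    strips = split_rows(frame, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() yields results in the order the strips were submitted,
        # so the strips can be concatenated directly.
        results = list(pool.map(lambda s: brighten_strip(s, amount), strips))
    return [row for strip in results for row in strip]


if __name__ == "__main__":
    frame = [[row * 10 + col for col in range(4)] for row in range(8)]
    print(process_frame(frame))
```

This only works because each strip can be processed without the others; steps that need the previous frame's result would still have to wait their turn.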
"What he says is true. It's the programmers' fault for not using more parallel programming. But unfortunately, there are only so many things that you can parallelize. His reading an essay analogy is the perfect example of that. People have to read one word at a time. You can't get a bunch of people to each read a few words and be done, because that would make no sense at all."
Human understanding and core thread execution are 2 different things, and I don't think you can use that analogy when trying to differentiate parallel vs serial processing. Even if it doesn't make sense to each individual core that is only reading a small portion, it will make sense when it is put back together in the end by the controller, to the end user that ultimately reads the document.
This change isn't something that consumers will clamor for, or something that software companies will push. A revolution in technology, whatever it may be, will be needed to bring this type of computing to the mainstream. Whether it be a parallel programming language, or some type of chip, something will have to be released so that everyone will look at parallel processing and say "That's the future, right there".
Ok, how about turning the billions of working COBOL lines of code running in mainframes of huge companies into parallel computing? You, Mr. Dally, do you accept this challenge?
Nvidia is becoming a hugely biased company, lobbying everyone on parallel computing. We are still waiting for something real (and useful) to be impressed by.
I've considered parallelising numerical integration, and I for one think it is impossible. You NEED the results of the previous step in order to process the next step. Parallel execution at different time steps is impossible.
Serial processors still have their uses, and in applications like this, they're so much faster than one CUDA core, say (which is all I'll be able to use).
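The dependency described above can be made concrete with a small sketch. It assumes the commenter means time-stepping a differential equation, and it uses explicit Euler purely as the simplest example; the function name and arguments are hypothetical.

```python
def euler_steps(derivative, y0, t0, dt, steps):
    """Explicit Euler time stepping: y[n+1] = y[n] + dt * f(t[n], y[n]).

    Returns the list of values y[0], y[1], ..., y[steps].
    """
    t, y = t0, y0
    history = [y]
    for _ in range(steps):
        # This line needs the y produced by the previous iteration, so the
        # iterations cannot simply be handed out to different cores.
        y = y + dt * derivative(t, y)
        t = t + dt
        history.append(y)
    return history


if __name__ == "__main__":
    # dy/dt = y with y(0) = 1; the exact solution is e**t.
    values = euler_steps(lambda t, y: y, y0=1.0, t0=0.0, dt=0.1, steps=10)
    print(values[-1])
```

The loop-carried dependency is across time steps. Within a single step, a large system with many independent components can still be updated in parallel, which is a different kind of parallelism from splitting up the time axis.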
It's not that a lot of programmers aren't trained in parallel apps; it's just not easy to do. The easiest apps to make parallel are mathematical algorithms, though any shared data has to have a lock so multiple threads can't access it at the same time, and this really hurts performance.
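A rough illustration of the locking cost mentioned above, and of one common way around it. Both functions are hypothetical examples: the first takes a lock on every update to a shared total, the second lets each thread keep a private running total and combines the results once at the end.

```python
import threading


def sum_with_shared_lock(chunks):
    """Every thread updates one shared total, taking the lock for each value."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for value in chunk:
            with lock:  # threads queue up here on every single addition
                total += value

    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


def sum_with_local_totals(chunks):
    """Each thread sums privately; results are combined once after joining."""
    partials = [0] * len(chunks)

    def worker(index, chunk):
        local = 0
        for value in chunk:
            local += value  # no lock needed: nothing here is shared
        partials[index] = local  # each thread writes only its own slot

    threads = [
        threading.Thread(target=worker, args=(index, chunk))
        for index, chunk in enumerate(chunks)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(partials)


if __name__ == "__main__":
    data = [list(range(start, 100_000, 4)) for start in range(4)]
    print(sum_with_shared_lock(data), sum_with_local_totals(data))
```

Both return the same answer; the second simply touches shared state far less often. How much that matters in practice depends on the language, the hardware, and the workload, so no timing figures are claimed here.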
Meh. Like always, this is going to be one of several paths the industry can go down at once, with the same overall goal of faster performance.
Just like CPUs didn't JUST go from single-core to dual-core to quad-core, or JUST go from 90nm to 65nm to 45nm, or JUST go from one socket and chipset to another ... or RAM didn't JUST improve in clock speed, or JUST improve in capacity, or JUST improve in latency ... what Dally's talking about is not the only way to improve efficiency. Like always, you've got several improvements going on at the same time, and if this guy's theory comes to pass, it'll be one piece of the puzzle among a sea of others.
It's not going to require some massive upheaval in the software and hardware industries at once. Technologies do not have to be introduced in lock-step with one another. How many programs today even make the most efficient use of existing CPUs and GPUs? Yet they're already coming out with hardware that has even more cores or more bandwidth. As always, there's going to be steady improvement and no "clean break."
I think the biggest problem if this happens is the average consumer. If they focus on parallel rather than more cores, the average consumer is still gonna think the more cores the better
What's wrong with your headline? It does not reflect the article at all.
"Moore's law is only about the # of transistors. It is irrelevant how many processors are built with these transistors. It does not say that the # of transistors is proportional to performance or MHz speed or anything like that. I find the reference to Moore's law in the NVIDIA paper just a marketing trick to promote their architecture and CUDA. What they are discussing has nothing to do with Moore's law; quite the contrary, it is how to get better performance from the same amount of elements."
Exactly what I was going to say, so bumping it here. Moore did not attempt to predict or limit specific architectures. Nvidia either doesn't understand Moore's law or is deliberately misusing it.
jenesuispasbavard, why can't you just split your range of integration into segments and let one core integrate each segment? Why exactly do you think this task cannot be parallelized? Am I missing something?
Teaching more programmers parallel programming practices is all well and good, but it'll only get us so far. Much like there are problems or algorithms suitable for parallel execution, there are also those that are not, including the majority of mainstream user software.
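The segment-splitting idea in this reply can be sketched as below. It applies to a definite integral over a fixed range, where the segments do not depend on each other; it does not apply to time stepping where each step needs the previous step's result. The trapezoid rule, the function names, and the default segment counts are all assumptions chosen for illustration, and threads are used only for portability (a process pool is the usual way to get real speedup for CPU-bound Python).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def trapezoid(f, a, b, steps):
    """Composite trapezoid rule for f on [a, b] using `steps` equal intervals."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def integrate_in_segments(f, a, b, segments=4, steps_per_segment=1000):
    """Split [a, b] into equal segments, integrate each one independently, add up."""
    width = (b - a) / segments
    bounds = [(a + k * width, a + (k + 1) * width) for k in range(segments)]
    with ThreadPoolExecutor(max_workers=segments) as pool:
        # Each segment needs only f and its own bounds, never another
        # segment's result, so the order of completion does not matter.
        parts = pool.map(
            lambda ab: trapezoid(f, ab[0], ab[1], steps_per_segment), bounds
        )
        return sum(parts)


if __name__ == "__main__":
    # The integral of sin(x) from 0 to pi is exactly 2.
    print(integrate_in_segments(math.sin, 0.0, math.pi))
```

The only serial step left is the final sum over a handful of partial results.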
In the end I believe preparing software for parallel, or serial, execution must be transparent to the programmer for such a paradigm shift to be possible.
Whether this is done through the compiler, operating system or a run-time compiler is largely irrelevant. The point is that expecting every programmer to adopt vastly different coding methodologies depending on which kind of algorithm a particular subroutine uses is neither practical nor realistic.
Perhaps the trend of treating various sub-processors, such as the GPU, as general processing units available to the system will see an upswing in popularity for interpreted languages due to all this.
*shrug* Interesting times, no doubt.
Oh yes, regarding the article... it's just NVIDIA being full of themselves again. Even if there is a point or two to be made along those lines.
o_O Reading some of these posts... man...
Out of order execution is the norm... really, we have already solved the problem of having to sequence data X and Y before we can sequence data Z when you do tasks X Y Z in parallel. Get over your puny conscious perceptions.
If you want to take it up a notch, pick up a quantum physics book. Then you might sound smart when talking about parallel architecture execution. You really can sequence Z before getting X and Y results. It's not silly math, it's how the universe works.
I think, as mentioned, a lot is due to 'habit'; I learned to program in FORTRAN IV in high school in '69 - I'm pretty well 'canalized' to serial execution. My 'habits' and procedural technique were set in stone by Modula-2, which has taken me through C, C++, C# (which really appears to 'want' a Java background), and now, .NET - I realize the paradigm shift, and am planning a rebuild to include an ATI stream processor, so I can 'take the plunge' to OpenCL - but I realize it'll be deep water for an 'old dog' like me! I'm pretty sure kids who learn CUDA or CL in high school now will see way more opportunities for 'parallelism', but, as pointed out by several - there are large numbers of things you just can't 'split up'! But, damn - it sure seems to me that OSes themselves could vastly benefit from a healthy dose of multiple-coreitis!!!
BTW - considered using a couple 470's and a Fermi - BUT - figured out that I'd have to pull a 220 line up from the basement to accommodate the PSU, and run a second cooling loop back down to the basement, or just use my bedroom/office as a turkey roaster most days...
Changing to parallel computing is going to be the equivalent of changing the auto industry from using gasoline to hydrogen. It won't be a fast change, and software companies are ill-equipped to deal with this change as of right now.
"...while he believes that the future is in parallel processing...."

Yeah... in the future... for now it is a tough thing to see. Look at how many years it took those companies to get used to the Cell processor in the PS3 (well... many of them are still either learning or just avoiding it).
Yes, parallel processing is very useful, but we need a pioneer, or a few, to sacrifice themselves for the greater good... Is Nvidia going to be "the one"?
I'd rather hold my breath for some sort of breakthrough in thermodynamics allowing us to achieve exponentially higher clock speeds. I'm no physicist, and I certainly don't have the answer, but I want a 10 GHz air-cooled CPU, damn it!