Scaling The Brick Wall
This AMD core team found itself with two fundamental problems, one technical and the other philosophical, and both had to be solved before anything could move forward.
"On a pure technology and transistor side, we had a conundrum on our hands," says AMD’s Macri. "What makes CPUs go really fast ends up burning a lot of power on a GPU. What makes GPUs go really fast will actually slow a CPU down incredibly. So the first thing we ran into was just getting them to live on the die together. We had the high-speed transistor combined with the very low-resistance metal stack that’s optimal for CPUs versus the GPU’s more moderate-speed transistor optimized around very dense metalization. If you look at the GPU’s metal stack, it looks like the letter T. It looks like the letter Z in a CPU. One’s low-resistance, one’s lower density, and so higher resistance. We knew we had to get these guys to live on the same die where they both perform very well, because no one’s going to give us any accolades if the CPU drops off or the GPU power goes up or performance falls. We needed to do both well. We very quickly discovered that wall."
Imagine the pressure on that team. With billions of dollars and the company’s future at stake, the group eventually realized that a hybrid solution couldn’t exist on the current 45 nm process. Ultimately, 45 nm was too optimized for CPU. Understanding that, the question then became how to tune 32 nm silicon-on-insulator (SOI) so that it would effectively play both sides of the fence. Of course, 32 nm didn’t exist outside of the lab yet, and much of what finally defined the 32 nm node for AMD grew from the Fusion pursuit.
Unfortunately, until the 32 nm challenge was solved, Fusion was at a standstill—and it took a year of work to reach that solution. Only then could design work begin.
Meanwhile, the Fusion team was also fighting a philosophical battle. The transistor and process struggle was massive, but at least the team knew where it needed to go and what the finish line looked like. Even with the transistor challenge figured out, the question remained of how best to architect an APU.
"One view was like, the GPU should be mostly used for visualization. We should really keep the compute on the CPU side," says Macri. "A second view said, no, we’ve gotta split the views across the two halves. We’ve got this beautiful compute engine on the GPU side; we need to take advantage of it. There were two camps. One said things should be more tightly coupled between the CPU and GPU. Another camp said things should be more loosely coupled. So we had to have this philosophical debate of deciding what we should treat as a compute engine. Through a lot of modeling, we proved that there was an enormous advantage to a vector engine when you have inherent parallelism in your code."
This might have seemed obvious from ATI’s prior work with Stream, but the question was how much work to throw at the GPU. Despite being highly parallel, GPUs remain optimized for visualization. They can process traditional parallel compute tasks, but this introduces more overhead. With more overhead comes more impact on visualization. With infinite available transistors on the die, one could just keep throwing resources at the problem. But, of course, there are only a few hundred million transistors to go around.
"Think of all the applications of the world as a bathtub," says Macri. "If you look at the left edge of the bathtub, we call those applications the least parallel, the ones with the least amount of inherent parallelism. A good example of that would be pointer chasing, right? You need a reference. You need to go grab that memory to figure out the next memory you gotta go grab. No parallelism there at all. The only way to parallelize is to start guessing–prediction. Then, if you go to the right edge of the bathtub, matrix multiply is a great example of a super-parallel piece of code. Everything is disambiguated very nicely, read and write stream is all separate, it’s just beautiful. You can parallelize that out the wazoo. For those applications, it’s very low overhead to go and map that into a GPU. To do the left side well, though, means building a low-latency memory system, and that would load all kinds of problems into a GPU that really wants a high-bandwidth, throughput-optimized memory system. So we said, 'How do we shrink the edges of the bathtub?' Because, the closer we could bring those edges, the more programs we could address in a very efficient way."
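The two edges of Macri's bathtub can be sketched in a few lines of Python. This is only an illustration, not AMD's code or any real GPU API: the `Node` class and the function names `chase_pointers`, `matmul_cell`, and `matmul_any_order` are hypothetical. In the pointer-chasing loop, each step has to wait for the previous load before it knows where to go next. In the matrix multiply, every output cell depends only on the inputs, so the cells can be computed in any order, which is the property that lets such work spread across many GPU lanes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One element of a singly linked list (hypothetical, for illustration)."""
    value: int
    next: Optional["Node"] = None


def chase_pointers(head):
    """Left edge of the bathtub: sum a linked list.

    The address of the next node is only known after the current node
    has been read, so the loop is inherently serial.
    """
    total = 0
    node = head
    while node is not None:
        total += node.value
        node = node.next  # next step depends on this load
    return total


def matmul_cell(a, b, i, j):
    """Compute one output cell; it reads only the inputs a and b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul_any_order(a, b, order):
    """Right edge of the bathtub: matrix multiply.

    `order` is any sequence of (row, col) pairs covering the output.
    No cell depends on another cell, so the visiting order does not
    affect the result.
    """
    rows, cols = len(a), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i, j in order:
        out[i][j] = matmul_cell(a, b, i, j)
    return out
```

Calling `matmul_any_order` with the cell coordinates in forward order and again in reversed order gives the same matrix, whereas there is no way to visit the linked list's nodes out of order without guessing where they are.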
A big part of the philosophical debate boiled down to how much to shrink those bathtub edges while preserving all of AMD’s existing visualization performance. Naturally, though, while all of this debate was happening, AMD was getting hammered in the market.
With Haswell coming next year, Intel might just beat AMD at HSA. They need to deliver a competitive product.
I think you were being overly kind about the current CEO's ability to guide the company forward.
Dirk Meyer's vision is what he is currently leveraging anyway.
A company like that needs executive leadership from someone with engineering vision ... not a beancounter from retail sales of grey boxes.
History will agree with me in the end ... life in the fast lane on the cutting edge isn't the place for accountants and generic managers to lead ... it's for a special breed of engineers.
They don't have the efficiency of Ivy Bridge or Medfield, they don't have the power of Ivy Bridge, and they're missing out on this round of the discrete graphics battle (they were ahead by so far, but Nvidia seems to have pulled an ace out of their butt with the 600 series). So what exactly IS AMD doing well? HTPC CPUs? Come on! The adoption rate for the system they're proposing with HSA is between 5 and 10 years off...and because they moved too early, and won't be able to compete until then, they have to give the technology away for free to attract developers.
Financially, this is a company's (and a CEO's) worst nightmare...they're too far ahead of their time, and the hardware just isn't there yet.
This will end up being just like the tablet in the late '90s and early '00s. It won't catch on for another decade, and another company will spark it and take advantage of the transition properly, much to AMD's chagrin.
I'm not sure if it was the acquisition of ATI that made AMD feel like it was forced to do this so early, but they aren't going to force the market to do anything. This work should have been done in parallel while making leaps and bounds within the framework of the current model.
You can't lead from behind.
I've always been a fan of AMD. They've brought me some of the nicest machines I've ever owned...the ones that had me, and still have me, most excited. But I have bought, and always will buy, what's fastest or best at the job I need the rig for. And right now...and for the foreseeable future, AMD can't compete on any platform, in any field, anywhere, at any time.
AMD just bet its entire company, the future of ATI (or what was the lovely discrete line at AMD), the future of their x86 platform, and their manufacturing business all on something that it wasn't sure it would even be around to see. They bet the farm on a dream.
Nonetheless, I disagree that you were being overly kind about the CEO's ability to lead the company. I think you're being overly kind for thinking this company has a viable business model at all. They'll essentially have to become a KIRF (sell products that are essentially a piece o' crud, dirt cheap) of a company to stay alive.
This is mostly me ragin' at the fail. The writer of this article deserves whatever you journalists have for your own version of a Nobel.
This was a seriously thorough analysis, and by far the best tech piece I've seen all year. We need more long-form journalism in the world, for I hear way too many people shouting one-line blurbs with zero understanding of the big picture. But I have to say that, while this article is 98% complete, you missed speaking about the fact that this company is a company...an enterprise that survives only with revenue.
Now, does anyone want to play Crysis in software rendering with max eyecandy?
It's not over until the fat lady sings. As I read your post, I felt that you were missing a (or the) big point of the APU and this article.
It's about how software is developed nowadays and how there is such a huge reserve of potential performance waiting to be tapped into. I could imagine that if future software bites into this "evolution" toward more GPGPU programming, then I would expect a huge jump in performance even on the current, or shall I say currently being phased out, Llano APUs.
Yes, current discrete GPU systems would improve significantly in performance as well, I would think, but to the same degree that APUs would improve, especially with the new technologies to be implemented, like unifying memory spaces, etc.? I don't think so.
I'm not saying that you're totally wrong. AMD might end up croaking, but we can't say for certain 'til it happens. Don't you agree? :-) (I'm not picking any fights BTW. Just sharing my thoughts.)
Funny, but last I checked, AMD's Radeon 7970 GHz Edition is the fastest single-GPU graphics card for gaming right now, not the GTX 680 anymore. Furthermore, AMD can compete in many markets in both GPU and CPU performance and price. AMD's FX series has great highly threaded integer performance for its price (much more than Intel), and the high-end models can have one core per module disabled to make them very competitive with the i5s and i7s in gaming performance. Going into the low end, the FX-4100 and Llano/Trinity are excellent competitors for Intel. Some of AMD's APUs can be much faster in both CPU and GPU performance than some similarly priced Intel computers, especially in ultrabooks and notebooks where Intel uses mere dual-core CPUs that either lack Hyper-Threading or have such a low frequency that Hyper-Threading isn't nearly enough to catch AMD's APUs. Is this always the case? No, not at all. However, you ignore this when it happens (which isn't rare) and you ignore many other achievements of AMD.
As of right now, there is no retail Nvidia card that has better performance for the money (at least where overclocking is concerned) than some comparably performing AMD cards anymore. The GTX 670 can't beat the Radeon 7950 in overclocking performance and it can't beat the 7950 in price either. The GTX 680 is no more advantageous against the Radeon 7970 and 7970 GHz Edition. I'm not saying that these cards don't compete well or that they don't have great performance for the money (that would be lying), but they don't win outside of power consumption, which, although important, isn't significant enough of an advantage when the numbers are this close.
Whether or not AMD will fail as a company remains to be seen. Maybe they will, maybe they won't. However, if you want to say that they will, then the supporting info that you give should be more accurate.