Larrabee Scrapped :(

Don't look down on the graphics competition - it's still there - but think of what the Fusion future might do to bring competition to the CPU market. :sol:

Call it a silver lining. ;)

Anywhoo, not really surprised. When the delays pushed Larrabee from a late-2008 launch to late 2009, I started to doubt its potential unless Intel dramatically improved on the original design. Once it slipped again, first to early 2010 and then to late 2010, it was pretty much destined to be a science project / proof-of-concept effort rather than a competitive gaming solution (it would have been expensive even for the midrange). That's somewhat what Fermi is looking like too, though nV has had far better success so far - but they haven't crossed the finish line yet either, so the outcome is still unknown.

The funny thing is that this will be greeted with much glee at nV, since it means their primary competitor in the HPC market has disappeared and they can return to the multi-core versus multi-die argument they've been using so effectively against AMD's and intel's CPUs. When Fermi arrives they can use it against AMD/ATI's multi-GPU solutions as well.

Disappointing, but not surprising really. :pfff:
 
Yep. I just wonder what this will do to AMD's strategy for replying to both intel and nV, now that they only have to worry about nV. intel was a far, FAR greater threat to AMD's plans (especially since intel is the only one that could create 'creative' CPU-and-GPU bundles), so will AMD scale back its efforts until nV shows its hand?

If I were AMD I'd be looking at a Fusion core that combines a dual-core CPU (I'd prefer a single core with HT, but that's not possible) with some SPUs and a GPU RBE on the other side of the instruction crossbar, potentially giving both access to a massive L2 cache with low latency and easier integration for mixed workloads.
At 32/28nm you should be able to fit a nice Turion X2 Ultra or Neo X2 plus HD56xx/55xx-class graphics together on a die smaller than an HD5870's, maybe even an HD5770's, which would give you a solid platform for HTPC duty as well as the average user's gaming needs.

And if they went with essentially an 'HT sideport' you would be able to scale well too, but of course you'd need to figure out all the underlying communication and make everything work together, rather than simply off-loading the work in chunks and creating a lot of dependencies and delays. The opportunity is there; the implementation seems to be what everyone is having difficulty with. I don't know enough about the latest Turion designs to fully grasp what would be required, but now that I've mused about it out loud, it would be interesting to see how well the pieces would fit and what the major barriers would be.

Seriously, just putting the power of an HD4650 and an old Turion X2 (not even an Ultra) on a single die would be damn attractive to a lot of people, especially OEMs like Apple, Dell and HP on the desktop side. And combining that processing power with x86/x64 instruction capability would make it interesting for the workstation and HPC markets too, if you can solve the scaling.

Anyways, hopefully AMD sees this as an opportunity to pounce, not a chance to sit back and take a breather, because competitors who get comfortable are just as bad as a monopolist IMO. For the market, the mere threat of intel entering was as powerful as, if not more powerful than, intel actually having a mediocre product on the shelves (although of course having a mediocre product while threatening a better one would be the strongest incentive of all).

 

rawsteel

Distinguished
Looking at the Fusion architecture, I think AMD are on the right path. Their module implementation (i.e. 2 integer cores + 1 FP core) seems logical, as 85%+ of the instructions executed are usually integer.

The turning point, to my mind, will be how OpenCL advances in the next 2 years and how well integrated it is by the time Fusion arrives. And I am afraid that AMD will AGAIN be ahead of its time.

The potential for a great jump in performance (like 2x-5x) in one generation is there - but it again depends on software, and it's again a chicken-and-egg situation. The thing is that OpenCL adoption will be in a battle with CUDA, but I still think developers will lean toward OpenCL as it works on EVERY card while CUDA is nVidia-only. Then again, sometimes things don't go in the logical direction.
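
To make the 'works on EVERY card' point concrete, here is a minimal sketch of what vendor-neutral OpenCL C looks like. It's a generic example of my own (a simple saxpy kernel), not taken from any product; the point is that the same source gets compiled at run time by whichever vendor's OpenCL driver is installed - AMD, nVidia, or in principle even a Larrabee-style part.

```c
/* saxpy.cl - illustrative OpenCL C kernel, vendor-neutral by design. */
__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y,
                    const unsigned int n)
{
    size_t i = get_global_id(0);   /* one work-item per array element */
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```

The host program just asks the runtime which devices are present (clGetPlatformIDs / clGetDeviceIDs) and hands this source to whatever it finds, which is exactly why the kernel itself doesn't care whose silicon is underneath.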

It will be interesting. Even a super entry-level CPU/GPU Fusion processor of just 1 module (2 int cores + 1 FP + 1 GPU) would give a GREAT platform for 100% of the situations a PC has to handle - HTPC + general applications + GPU-accelerated applications + games.

I still don't know exactly how all these parts will work together or who/what will assign each task to the appropriate unit - the drivers? the software itself? But I see intel pointing in the same direction, so I hope we see some standardisation in this area sooner rather than later, giving developers a clear path ahead of what to do.

Time will tell
 
Using a serial architecture for a GPU was not a good idea from the start. It might be better than Intel's current solutions, but that's not exactly saying much...

Intel tried to make a GPU from a CPU perspective: large serial cores with a high transfer rate. Hopefully AMD takes the hint when they launch their platform...
 

jennyh

Splendid
It's a very interesting situation. What we basically have right now is the financially weakest company in almost complete control of graphics.

Yes, nvidia and intel own xyz market share, yadda yadda... it's *not* going to last. Nvidia are already running on the last fumes of their great era - so much so that Nvidia are now seen by many as a complete joke of a company.

There is a lot of money at the ignorance end, so both Nvidia and Intel will do well for a while on substandard goods, but the longer they live on reputation alone, the harder the fall down the line.

The very people who made intel and nvidia the 'big thing' are the same people abandoning them right now over their failures. The graphics market especially is a fickle bunch - willing to swap sides for more fps in a heartbeat.
 

JeanLuc

Distinguished


Although there are no guarantees, it looks like Intel might continue Larrabee R&D through to 32nm, where they will review the feasibility of launching a consumer video card. At 45nm Intel doesn't stand a chance of releasing anything remotely competitive: last year Intel claimed an overclocked Larrabee would give you 1 teraflop of peak performance, and even then AMD already had the Radeon HD4870, which could deliver that kind of performance. Now in 2009 AMD has the HD5870, which puts out 2.72 teraflops peak, so Intel would have to sell Larrabee dirt cheap just to compete.
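
For reference, those peak figures are just the usual back-of-the-envelope arithmetic: stream processors x clock x 2 FLOPs for a multiply-add. A quick sketch using the public HD4870/HD5870 specs (nothing assumed beyond those numbers):

```c
#include <stdio.h>

/* Quoted peak single-precision throughput is simply
   ALU count x clock x 2 FLOPs (one multiply-add per ALU per cycle). */
static double peak_gflops(int alus, double clock_ghz)
{
    return alus * clock_ghz * 2.0;
}

int main(void)
{
    printf("HD4870: %.0f GFLOPS\n", peak_gflops(800, 0.750));   /* ~1200, i.e. ~1.2 TFLOPS  */
    printf("HD5870: %.0f GFLOPS\n", peak_gflops(1600, 0.850));  /* ~2720, i.e. ~2.72 TFLOPS */
    return 0;
}
```

Keep in mind these are theoretical peaks; real workloads rarely get anywhere near them.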
 
I get the feeling 45nm was simply insufficient for such a project (high TDP, power delivery on a card, etc.) and that it needed a lot more work than anticipated, hence they dumped the Larrabee release but advised they are still using the R&D, etc.
 
Part of the claims are the same claims ATI and nVidia make: sure, it's rated at such-and-such peak TFlops, but actual throughput is different, and the same goes for x86, so don't let it fool you. It's really no different, except possibly for gpgpu usage, and maybe only then.
If their shrinks bring excellent perf increases we will see a discrete LRB; if not, it'll be a SoC variant and a gpgpu board.
Understand, LRB is designed for total throughput, but that doesn't mean the throughput will ever get used. Much like FurMark or OCCT or whatever specialized gpu app is out there - they come close, but even they can't hit 100%.
Does that mean it'll be more efficient? Possibly, but again, since there's no FF HW and it's all SW driven, the latencies and imperfect utilisation show up there as well.
I've written a ton about LRB and what it faces in gfx for competition. Graphics had to be its springboard, and facing a fast-moving gfx market with some major changes coming soon, it's just bad timing for LRB unless it was going to be a killer gfx solution.
The Fusion projects, HKMG, Fermi - all these things make LRB a tough sell at the moment as just another player rather than something groundbreaking or top of the line.
From what I'm reading, it's the same bugaboo Intel has always faced with gfx: drivers. Having to cover the whole of game support, if you will, is a daunting task, but to do it all on brand-new drivers, on brand-new HW, with a brand-new approach, with brand-new driver teams etc. - this isn't like their last try, when gaming wasn't as widespread as it is today.
So, it needed to fit a tight window, and it failed.
Now it needs to finish up, try to draw some attention to its strengths, clear up some of the holes it had, and hope the perf jumps on the next iteration. Meanwhile it'll face newer cards with greater perf, on a new process, using HKMG - more mountains to climb - and if Fusion takes off and somewhat curtails the low/lower-mid end of discrete cards, that leaves less room for LRB as well.
I'm looking for its usage as either an IGP/on-chip solution like Fusion, Intel style, and maybe for HPC, but even there the numbers don't justify it, even at $1000-3000 a clip, unless they get some design wins, which is what they're trying to do now.
 

duzcizgi

Distinguished

The x86 ISA is one of the worst when it comes to executing things in a massively parallel manner, so from the very beginning it wasn't a wise choice to build Larrabee on the x86 code base.
Even today's x86-based CPUs spend most of their time dispatching & decoding instructions. (If I recall correctly, about 50% of the die area, excluding cache, is dispatch & decode logic.)

Looks like Hector Ruiz made the right call back in 2006 when AMD bought ATi.
Now they have an end-to-end product line that can scale, while the competition is struggling.

When OpenCL & DirectCompute gain traction (not if, but when - software developers love a free lunch too!), they'll have their parallel floating-point processing unit ready and sitting side by side with the CPU, connected via HyperTransport instead of PCI Express, reducing latency.

Just imagine this: an L3 cache shared between CPU and GPU, and inter-chip communication over the crossbar switch, just like the other cores inside.
On the NB side there would be only a RAMDAC, and that's all. They could trim the FPU on the CPU cores even further, leaving just enough FPU power to run legacy apps, which frees up silicon real estate to either lower costs or add more cache, faster integer units, additional cores, etc.
 

KidHorn

Distinguished
Maybe Intel is looking at merging with NVidia. First they pay a huge sum to AMD to drop all litigation claims, then they announce this news. With tech stock prices currently inflated, Intel could swap its shares for NVidia shares at a premium. I don't think there would be any antitrust problems, since they would still have a formidable competitor in AMD.
 
I see this happening only if nVidia allows it and LRB turns out to be a total failure for gfx - its gpgpu side should still be good.
If I recall, Jensen already turned down such a merger over control (CEO status) issues.
 

KidHorn

Distinguished


Things are different now. If I were a major shareholder at either company, I would seriously consider this. The CEO doesn't own the company (Unless he owns a boatload of voting shares).
 
That all depends on the importance the board places on the CEO.
Unlike Intel's, Jensen has shown himself to be pivotal to nVidia's success, and he does own a lot of shares as well.
Either way, I like Anand's take on this, though I also think Intel is underselling the importance of gfx in everyday sales:
http://anandtech.com/cpuchipsets/showdoc.aspx?i=3686&cp=2#comments
"Intel has a different vision of the road to the CPU/GPU union. AMD's Fusion strategy combines CPU and GPU compute starting in 2011. Intel will have a single die with a CPU and GPU on it, but the GPU isn't expected to be used for much compute at that point. Intel's roadmap has the CPU and AVX units being used for the majority of vectorized floating point throughout 2011 and beyond.

[Image: Intel's vision for the future of x86 CPUs announced in 2005, surprisingly accurate]

It's not until you get in the 2013 - 2015 range that Larrabee even comes into play. The Larrabee that makes it into those designs will look nothing like the retail chip that just got canceled.

Intel's announcement wasn't too surprising or devastating, it just makes things a bit less interesting."

If Intel delays a competitive product until such a merger can take place, it could be an act of brilliance on their part, if gfx ends up not playing as large a part in overall sales, or it could leave the door open for AMD and their Fusion derivatives to gain marketshare.
All this does point to an alternative for HPC use, instead of cpu-only usage, where nVidia and, in some situations, AMD are advancing the market, and where Intel can still come in strong at some point down the road.
If their roadmap is a longer-term approach, treating LRB much like the 80-core Polaris that people have been testing, this keeps them in the game and better prepares them for this new market.

For gfx though, that's a whole different scenario, and the crystal ball as to how all this plays out over time is blurry.
 

daedalus685

Distinguished
Intel has access to ATI patents, so why would they take over nvidia? Besides that, the government would never allow it. It would be the last straw that saw intel broken up into a dozen different companies.
 

JofaMang

Distinguished
Combine this with the death of the next version of the Cell processor, and we see companies cutting their R&D budgets substantially. Perhaps it just isn't time yet for this type of progression in PC tech?
 
Again, I'm not shocked. x86 isn't a good arch to begin with, Intel CPUs are designed for executing serial tasks, and their solution for getting a parallel arch was basically to wire a lot of them together.
 

duzcizgi

Distinguished
Agreed. x86 isn't the best architecture for parallelism to begin with.
It's too CISC (Complex Instruction Set Computing).

Hell, since I last worked with MASM, I've lost track of the extra instructions they've added!

If you need parallel processing, your system should be as simple as possible: a number cruncher and that's all.

If you need branching, that's a different kind of system.

You can merge the two types of systems, but the result will be totally different from both (e.g. Fermi or Cell).

Bottom line: LRB tried to hitch the good old tried-and-tested horses to a Ferrari chassis. Well, it can only go as fast as the horses can.

Indeed, even as a CPU architecture x86 isn't optimal to begin with; it's just the most widespread one, that's all.
 

randomizer

Champion
Moderator

It's a wise choice in that almost all software is built for the x86 architecture. I can't run an x86 OS on an HD5870, and I can't run any of my productivity applications on an HD5870 (yet). So for me Larrabee is an excellent idea and exactly what I want, until something changes to make me think otherwise (i.e. a GPU capable of executing x86 code, or my applications moving to Stream/CUDA - and I know some are on the verge of implementing CUDA acceleration).
 

duzcizgi

Distinguished


Depends on what you want to do, Randomizer.
If you want to execute an OS, you'll need a CPU with branching and hardware process protection facilities, which are heavy overheads.
If you want to execute massively parallel floating-point operations "under the control of an OS running on a CPU", then that overhead costs much, much more.
Who wants to run a Windows OS on an HD5870 or Fermi? What use would that be?

In HPC it's useless to have lots of CPU-style cGPU cores that eat up resources which barely help real execution.

Consider this: why does a computer with a quad-core intel CPU coupled with 4 TESLA cards do FP number crunching better than a CPU cluster, and do it much cheaper?

With the x86 ISA it's very hard to program parallel execution. All the SIMD instructions are an afterthought, with no consistency between versions. You have to swap registers too often, which means going back to random places in memory often, which in turn requires a BIG cache to keep performance up.
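
To see why, this is roughly what hand-vectorized x86 looks like with SSE intrinsics - a minimal sketch of my own, with alignment and loop-tail handling deliberately ignored (which is exactly the kind of bookkeeping that makes it a chore in real code):

```c
#include <xmmintrin.h>   /* SSE intrinsics */

/* y[i] = a*x[i] + y[i], four floats at a time.
   This sketch assumes n is a multiple of 4 and the pointers are
   16-byte aligned; real code has to handle tails, alignment and
   the newer SSE/AVX variants, which is part of the pain. */
void saxpy_sse(float a, const float *x, float *y, unsigned n)
{
    __m128 va = _mm_set1_ps(a);               /* broadcast a into all 4 lanes */
    for (unsigned i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(x + i);       /* load 4 floats */
        __m128 vy = _mm_load_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);
        _mm_store_ps(y + i, vy);              /* store 4 results */
    }
}
```

Compare that with the one-line-per-element OpenCL kernel earlier in the thread, where the runtime worries about the vector width for you.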

The best option is what we have today: have the CPU prepare the data & operations for the GPGPU via OpenCL/DirectCompute/CUDA/Stream, and let it crunch that data.
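
For anyone who hasn't seen that split in practice, here's a rough, heavily trimmed sketch of the host side of that pattern in plain C with the OpenCL API. Error handling is omitted and the embedded kernel is the same generic saxpy example from earlier in the thread, so treat it as an illustration of the division of labour, not production code:

```c
/* host.c - illustrative only: the CPU prepares data and work,
   the GPU (or any OpenCL device) does the actual crunching.   */
#include <CL/cl.h>
#include <stdio.h>

static const char *src =
    "__kernel void saxpy(const float a, __global const float *x,"
    "                    __global float *y, const unsigned int n) {"
    "    size_t i = get_global_id(0);"
    "    if (i < n) y[i] = a * x[i] + y[i];"
    "}";

int main(void)
{
    enum { N = 1024 };
    float x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = (float)i; y[i] = 1.0f; }

    /* CPU side: pick whatever device the installed runtime offers. */
    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* CPU side: copy the data across and compile the kernel at run time. */
    cl_mem bx = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof x, x, NULL);
    cl_mem by = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof y, y, NULL);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "saxpy", NULL);

    float a = 2.0f;
    cl_uint n = N;
    clSetKernelArg(k, 0, sizeof a, &a);
    clSetKernelArg(k, 1, sizeof bx, &bx);
    clSetKernelArg(k, 2, sizeof by, &by);
    clSetKernelArg(k, 3, sizeof n, &n);

    /* Device side: the number crunching itself happens here. */
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, by, CL_TRUE, 0, sizeof y, y, 0, NULL, NULL);

    printf("y[10] = %f\n", y[10]);   /* expect 2*10 + 1 = 21 */
    return 0;
}
```

The CPU never touches the inner loop; its job is setup, scheduling and moving data, which is exactly the division of labour described above.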