Can AMD salvage QFX with an in-house chipset?

BaronMatrix

Splendid
Dec 14, 2005
Well, the QFX has been released and while certain scenarios show incredible promise, certain areas are also saddled with too much baggage.

The fact that a dual-socket Opteron workstation can be outfitted with SLI and offer good performance without these power levels implies that the total package could have been done better.

The Opteron 285 runs at 2.6GHz, and this graph shows that, without the additional SLI power, the AMD system draws 322W while the dual Xeon 5160 system draws 267W (full load).

Looking at the various articles around the web, it seems as though only Anand managed to find suitable tasks that were reasonable for multitasking. In his case he used Blu-ray movies, which totally killed all the dual-core systems.

His power numbers were also at least 100W lower than other test sites'. He turned on CnQ (Cool'n'Quiet) and got idle power down to within 4W of the C2Q system. Of course, this didn't dent the 456W the system drew at "full load" with an 8800GTX (which I believe draws 225W+).

And these are FX74 numbers. The FX70 is shown to use even less, so I believe OEMs can get reasonable workstation power levels out of it in the next few months, especially if AMD releases a new revision (they sorely need to drop power by at least 10%, perhaps more, and the lessons learned can help get Agena below the reported 125W).

Hexus is also reporting that they can show a defect in the NUMA implementation of the Asus board's BIOS (this post was going to be called "Did Asus and nVidia drop the QFX ball?"), which may be why games are suffering so much from latency problems.

One review (most are posted at AMDZone) stated that, according to AMD, Interleave mode will need to be turned off for Vista and on for XP.

I believe AMD reported that they would release their own branded chipset, and hopefully it will be less power hungry than the 680a, which is reported to use more power than even the 975X. Having two of them surely doesn't help. nVidia does have a two-socket SLI chipset in the 3600, and ASUS' implementation is only $300. Even the $400 Asus 680i board for Intel implements fewer PCIe lanes and so needs less power.

The 7950 GX2 and the probably forthcoming 8950GT only require two slots for Quad SLI. I can see the need for four low-end GPUs for certain content creators, but even three PCIe slots can't really be used right now, as no "Havok"-type apps or cards have been released, except for the server (AMD Stream).

Only time will tell if AMD had planned to create an entire reference system based on an Ati chipset while allowing nVidia to be the launch partner.

But the real verdict is that only expert builders will get QFX running without it being too loud or hot. Vista X64 may do wonders for it in multithreaded apps, but for now it is not yet ready for prime time.

Let's go AMD! Show your true potential.
 
Hexus is also reporting that they can show a defect in the NUMA implementation of the Asus board's BIOS (this post was going to be called "Did Asus and nVidia drop the QFX ball?"), which may be why games are suffering so much from latency problems.
I was under the impression that XP wasn't NUMA aware... would that even be a problem under XP?
 
As an AMD stockholder, I've been left less than thrilled by the QFX, or 4x4 as many know it. Could I even say disappointed? I was thinking of building a new computer around one of these things, but as it stands, forget it.

A new motherboard might help, but looking at how crowded the Nvidia motherboard is, I wonder if an all-new motherboard is needed, one that is physically larger so there is more room for the CPUs, more room for bigger and better coolers, and more room for PCI/PCIe slots. Things look so crowded at present that I wonder if it would really be possible to fit a pair of 8950GT cards and still have room for a sound card and another PCI card.

There seem to be too many problems to overcome. Sure, a lot of development might help, but why not put that development into the AM2 board and its CPUs, or even AM3? It looks to me like the present route is nothing but dollars wasted and opportunities lost. And I can guarantee that a lot of questions are going to be asked at the next stockholders' meeting. I wonder how many heads will roll because of this QFX failure?
 
Only time will tell if AMD had planned to create an entire reference system based on an Ati chipset while allowing nVidia to be the launch partner.

Sounds like wishful thinking to me.

Let's go AMD! Show your true potential.

Baron, 4x4 IS showing its true potential. It had the potential to be expensive, hot, and use lots of power, and so far as I can tell it's gone for the hat-trick and scored on every count.
 
Hey, now you simply gotta believe.... believe in miracles.... believe that the potential is there, it just hasn't been unleashed.... believe that if you sprinkle pixie dust over the rig it will cool down by half and improve 20% in performance, and then it would be competitive.

If you build it they will come....

:) :)

Yes, I believe! I believe I'll get a beer.

Where, oh where, are AM3, K8L, and the other promised stuff? Or will those things be broken dreams as well?
 
Yeah, we gotta keep smiles on our faces and things in perspective. The move to 65nm may help things out a lot; I hope so. It's just that so much was expected and so little delivered. I think it would have been better if the QFX had been withdrawn before the embarrassment occurred. But that's an opinion.

If everybody, AMD and enthusiasts alike, learns something from this, it will be good. In the meantime, I'll keep trying to get some more out of my present CPU. It seems I can get 2990 MHz and be stable, but the moment I cross 3000, something fails. Not sure yet if it's the RAM or the video card or what. I just know that 3DMark06 starts and then crashes out after a few seconds. Gotta keep looking.
 
Doesn't anyone think that 4x4 could be a really huge ace in the hole for AMD? I mean, now that AMD have brought out this half-server, half-enthusiast nightmare, people have been exposed to the prospect of buying two processors and putting them on a board together. This idea, brought into the mainstream, could mean that in the future AMD could have the jump on Intel forever. If AMD can catch up with the increasing core counts Intel is delivering at the moment, and manage to bring out quad cores soon, then we could stick two quad-core chips into one of these boards and have octo-core computers with minimal hassle.

This line of speculation is, of course, assuming that AMD can bring out significantly cooler quad cores, to counteract the fact that there would be two of them sitting side by side, and can also make the platform attractive in total system cost, with cheaper CPUs and mobos that aren't significantly more expensive.
 
1. 4x4 is still FUTURE technology (at least until the release of the 65nm quads to populate it).
2. The 'NEW' FX chips are AMDs greatest failure ever.
This is the toughest moment of the current depression for them, and they've got to show some quad-core performance soon, and it's got to be very convincing for the sake of their future.
 
A chipset is not going to make this turkey fly.....

Baron, it is a failure of the architecture.... the cHT and NUMA arrangement was never meant for the desktop.

Perhaps when quad core comes out and if they can cut the latency by 80% between the two CPUs, then you will see something. Until then, anytime you strap in a second CPU it will hurt performance.


Sure, server apps have slightly different requirements, but rendering uses all of the subsystems too, and there the QFX shines. Since most games will fit into 2GB (at least right now), a better system for keeping track of which CPU's memory a process is loaded into will do a lot.

Hexus is also reporting that Vista has been shown to have a much better scheduler. I saw the RC2 tests and can say that they were 32-bit. 64-bit NUMA should do much better, but again, if there is a problem in the implementation, no amount of optimization will help.

Again, hopefully AMD will get the power down and fix the latency problem. If they can drop the latency back to near-FX62 levels (possible, though I don't know, since I can't find many latency tests for Opteron), then a lot of the speed will be realized.

Hexus also stated that they had a "pre-release" system and that a BIOS fix was forthcoming. No one else mentioned this problem, so I don't know if they even looked at it as a possible cause.

It does multitask like crazy, though: tests show that as background processes increase, QFX overtakes C2Q.
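Both sides here are really arguing about two knobs: how often a thread has to reach across the HT link to the other socket's RAM, and how slow that hop is. A minimal sketch under a simple weighted-average assumption (my simplification, not a measurement of QFX; the function and parameter names are hypothetical) makes the trade-off concrete:

```python
def effective_latency_ns(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Average memory latency seen by a thread, modeled as a weighted mix of
    local accesses and accesses that cross the HT link to the other socket.

    local_ns, remote_ns and remote_fraction are illustrative inputs,
    not measured QFX figures.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


def hop_penalty_ns(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Extra latency caused purely by remote accesses, relative to all-local."""
    return effective_latency_ns(local_ns, remote_ns, remote_fraction) - local_ns
```

Algebraically the penalty works out to remote_fraction * (remote_ns - local_ns), so it shrinks either when the OS/BIOS keeps data on the right socket (smaller remote_fraction) or when the hop itself gets cheaper (smaller remote_ns). Fixing only one of the two still leaves the other term in play.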

We'll have to wait for Vista X64 either way. I'm trying to get the Business version now and as soon as I do, I will be upgrading to it in prep for my next HW upgrade.

Looking at Anand's tests with the new Valve multithreaded engine, QFX shows nearly 100% scaling, and at some speeds a little more. That is theoretical, but it does show that when code is written properly, it does narrow the difference between the X6800 and FX62.

With a highly optimized NUMA implementation, along with different grades of "clock" allowing for 1066 (turn down the multiplier and turn up the HT speed), I can see X64 multithreading being much better in the areas where it's weak under XP32.
 
As an AMD stockholder, I've been left less than thrilled by the QFX, or 4x4 as many know it. Could I even say disappointed? I was thinking of building a new computer around one of these things, but as it stands, forget it.

A new motherboard might help, but looking at how crowded the Nvidia motherboard is, I wonder if an all-new motherboard is needed, one that is physically larger so there is more room for the CPUs, more room for bigger and better coolers, and more room for PCI/PCIe slots. Things look so crowded at present that I wonder if it would really be possible to fit a pair of 8950GT cards and still have room for a sound card and another PCI card.

There seem to be too many problems to overcome. Sure, a lot of development might help, but why not put that development into the AM2 board and its CPUs, or even AM3? It looks to me like the present route is nothing but dollars wasted and opportunities lost. And I can guarantee that a lot of questions are going to be asked at the next stockholders' meeting. I wonder how many heads will roll because of this QFX failure?

This is why I see this as AMD's first reference platform. They knew the challenges (the Opteron 2218 manages to stay cool even at 119W), and for a first release with a new chip that fits in a server socket without ECC, it's not a horrible failure. If you look around at the various reviews, you will see that different test scenarios provide different results.

I think Hexus and Anand have the most useful information as to its value. It could hardly be called a failure, though the choices made by Asus and nVidia seem to have led to a less-than-stellar implementation.

I'm sure that right now AMD's labs are moving fast to optimize RD600 for a QFX version. ATi is known to have the lowest-power chipsets among the Big 3, so I am confident that it will use a lot less than two 590 SLI chips.

An 80nm R600 with lower-power GDDR4 just may drop those power levels from the 250W+ that current 90nm chips are running at to a more reasonable sub-200W at load (though I'm sure not by much).

Still, the Alienware system is not on their site yet, and it will be interesting to see what it does with and without water cooling. Water can't lower the chips' power consumption, but it's the entire system that draws power.

Anand's numbers, 100W lower, were even with Raptor RAID AND an 8800GTX, so you can get the power down. You'll be better off waiting for Vista X64 and more reasonably priced G80s anyway. DX10 may also do much better, since it allows for much more complex scenes with more objects (obviously requiring more bandwidth).

It's still on my list I can say.
 
A chipset is not going to make this turkey fly.....

Baron, it is a failure of the architecture.... the cHT and NUMA arrangement was never meant for the desktop.

Perhaps when quad core comes out and if they can cut the latency by 80% between the two CPUs, then you will see something. Until then, anytime you strap in a second CPU it will hurt performance.


Sure, server apps have slightly different requirements, but rendering uses all of the subsystems too, and there the QFX shines. Since most games will fit into 2GB (at least right now), a better system for keeping track of which CPU's memory a process is loaded into will do a lot.

Hexus is also reporting that Vista has been shown to have a much better scheduler. I saw the RC2 tests and can say that they were 32-bit. 64-bit NUMA should do much better, but again, if there is a problem in the implementation, no amount of optimization will help.

Again, hopefully AMD will get the power down and fix the latency problem. If they can drop the latency back to near-FX62 levels (possible, though I don't know, since I can't find many latency tests for Opteron), then a lot of the speed will be realized.

Hexus also stated that they had a "pre-release" system and that a BIOS fix was forthcoming. No one else mentioned this problem, so I don't know if they even looked at it as a possible cause.

It does multitask like crazy, though: tests show that as background processes increase, QFX overtakes C2Q.

We'll have to wait for Vista X64 either way. I'm trying to get the Business version now and as soon as I do, I will be upgrading to it in prep for my next HW upgrade.

Looking at Anand's tests with the new Valve multithreaded engine, QFX shows nearly 100% scaling, and at some speeds a little more. That is theoretical, but it does show that when code is written properly, it does narrow the difference between the X6800 and FX62.

With a highly optimized NUMA implementation, along with different grades of "clock" allowing for 1066 (turn down the multiplier and turn up the HT speed), I can see X64 multithreading being much better in the areas where it's weak under XP32.
When a CPU is good, it performs well almost everywhere; we're not yet at the point of having such task-specific CPUs, let alone claiming one as such. It's just like when they say that a Celeron is good for office apps.
 
Only time will tell if AMD had planned to create an entire reference system based on an Ati chipset while allowing nVidia to be the launch partner.

Sounds like wishful thinking to me.

Let's go AMD! Show your true potential.

Baron, 4x4 IS showing its true potential. It had the potential to be expensive, hot, and use lots of power, and so far as I can tell it's gone for the hat-trick and scored on every count.

What you say sounds like wishful thinking. This is the first iteration of what is a very complex system. More than anything, I think Asus and nVidia could have implemented the board better: 20 USB ports, 4 GPU slots, 12 SATA ports (though I like the eSATA port).

Look at ALL of the reviews and you will see that systems sold in Feb with Vista X64 will be much better in all respects (even though the Valve tests vindicate all the gaming losses). C2Q is faster core for core, so at best QFX will tie it or maybe lead by a percent or two.

I was never considering it to be faster than C2Q, but a formidable multi-threading machine.

Again I think we'll hear soon about the ATi chipset which should cut back to 3 slots and one chipset. AMD worked closely with MS on X64 NUMA so they can get more perf out of it than nVidia.

We'll see though. I still see a DX10 card in one of these sitting under my desk hopefully NOT baking my legs. My ANTEC TX should keep that puppy cool.

I can assume that just like Anand got CnQ working and got idle power down significantly, Hexus will get a better BIOS for the board that will help with latency.
 
When a CPU is good, it performs well almost everywhere; we're not yet at the point of having such task-specific CPUs, let alone claiming one as such. It's just like when they say that a Celeron is good for office apps.

The tests clearly show that NUMA is broken on the board. It will work for desktop apps if implemented properly. An HT "hop" is not like an FSB "hop." Improvements there will push QFX well above FX62 everywhere as it does with CPU intensive things like CineBench.

I'm confident that AMD is serious about making this work and they have a lot of experience now with dual sockets, so just like people didn't want the first rev of C2D, they may not want the first rev of this.
 
Yes, in terms of megatasking: only after you reach 12 competing background processes does the bandwidth advantage finally begin to show up.... This helps you if you want to run 6 VMs with, say, 2 processes in each VM... but even for Enthusiast Joe, I doubt they will be burning 6 DVDs, encoding 4 video clips, and playing 2 games at the same time..... kinda pointless. The TechReport review was very kind in how they set up and ran the benchmarks, and C2Q still came out on top.... Their memory bench, for example, was carefully chosen: they did not show the 2K and 4K blocks with the 256- or 512-byte strides.... that was where 4x4 really fell apart. It is not a truly objective review the way they handled it.... and you should be more challenging of the data so as not to be fooled.

The Valve tests were indeed interesting: C2Q just crushed the 4x4 in this regard.

I guess we'll see. I don't say I'm an authority on anything but SW development, but it seems interesting that Anand reported 456W at load and others were closer to 600W. How could one core that's faster be slower than 4?
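For anyone who hasn't seen one, a block/stride memory test walks a buffer of a given size at a fixed step and times the loads. The sketch below only enumerates which byte offsets a single pass touches for the 2K/4K blocks and 256/512-byte strides mentioned above; it is not TechReport's benchmark, Python is far too slow to time memory latency itself, and the function name is hypothetical:

```python
def stride_offsets(block_bytes: int, stride_bytes: int) -> list[int]:
    """Byte offsets read in one pass over a block at a fixed stride."""
    if block_bytes <= 0 or stride_bytes <= 0:
        raise ValueError("block and stride sizes must be positive")
    return list(range(0, block_bytes, stride_bytes))


if __name__ == "__main__":
    # The block/stride combinations named in the post above.
    for block in (2 * 1024, 4 * 1024):
        for stride in (256, 512):
            offsets = stride_offsets(block, stride)
            print(f"{block // 1024}K block, {stride}-byte stride: "
                  f"{len(offsets)} loads per pass, last offset {offsets[-1]}")
```

A real latency test does the same walk in native code, usually as a chain of dependent loads so the CPU can't overlap them, and repeats it many times per block size.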

I didn't expect it to be faster than C2Q but Intel is usually faster in theoretical apps. If you looked closely the increase for AMD was several % higher going from 2 to 4 cores.

The Inq link (wow) says nothing about latency.


Anand's tests show clearly exactly what I expected.

[Graphs: Anand's two Valve multithreading benchmarks]

Both tests show that, clock for clock, AMD is scaling at better than 100% with QFX while C2 is not getting the same scaling. The theoretical nature of the tests shows the same thing that PD used to show.

And why, you ask, is this test showing improvements while being game-based?

Because this platform does wonders when you use it properly SW-wise. That's not to say that current SW isn't good enough, but everyone remarks that even C2Q is suffering because of the single/dual-threaded nature of most SW.

Even C2Q is slower core for core than C2D, so if AMD can increase per-core performance with QFX, they will be bucking the trend. I'm confident that they will improve the platform well before Agena FX.
 
Give up: it does not work and will not work. The BIOS is not responsible for NUMA anyway; it is only responsible for initializing the processor to map the memory a particular way. A BIOS update will have little to no effect at all.

Mapping the memory incorrectly won't be a problem? Surely you jest. Again, the Valve tests show 100% scaling, which bodes well for DX10 gaming. Games are the only place where you see slowdowns (mainly; there were one or two other cases).

I am not thinking about this platform for 2006, but for 2007, when there will be Crysis and Alan Wake, etc., along with Vista X64.

Also, if I decide to go with Agena FX (AMD name) before 2008, I will buy a new mobo as well ( the prices will be right by then - the right chipset would allow the same reference for both ECC and non-ECC).

I wouldn't expect anything but your reported 100nm spacing to bring power down, which may happen, especially since 65nm chips will drop costs enough to allow specialized runs for QFX. They can't be more than 5% of shipments, even in pairs.
 
Give up: it does not work and will not work. The BIOS is not responsible for NUMA anyway; it is only responsible for initializing the processor to map the memory a particular way. A BIOS update will have little to no effect at all.

Mapping the memory incorrectly won't be a problem? Surely you jest. Again, the Valve tests show 100% scaling, which bodes well for DX10 gaming. Games are the only place where you see slowdowns (mainly; there were one or two other cases).

No, I don't jest; it will not work... you don't know what you are talking about. Interleaved or contiguous, there will be hops; it solves NOTHING.

You clearly do not understand how this works. OK, so the BIOS is broken and it cannot interleave; again, NUMA OS benches are showing some improvement, but the platform is still inferior to a single-socket Intel solution.


I understand entirely. The SW landscape will change to accommodate more cores, and games will be the first.
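To picture what the two of you are arguing about, here is a deliberately simplified model: with node interleave, consecutive pages alternate between the sockets; with contiguous (NUMA) mapping, each socket's RAM is one solid range. Under interleave roughly half of any buffer is remote no matter where the thread runs; under contiguous mapping it is all-local or all-remote depending on whether the scheduler puts the thread next to its data. The 4KB granularity, the 2GB-per-socket split and the function names are assumptions for illustration, not the QFX board's actual settings:

```python
PAGE_BYTES = 4096              # assumed interleave granularity (illustrative)
NODE_BYTES = 2 * 1024 ** 3     # assumed 2GB of RAM behind each socket
NODES = 2


def home_node(addr: int, interleaved: bool) -> int:
    """Socket whose RAM holds the given physical address (simplified model)."""
    if interleaved:
        # Node interleave: consecutive pages alternate between the sockets.
        return (addr // PAGE_BYTES) % NODES
    # Contiguous mapping: the first NODE_BYTES live on socket 0, the rest on socket 1.
    return min(addr // NODE_BYTES, NODES - 1)


def remote_fraction(start: int, length: int, cpu_node: int, interleaved: bool) -> float:
    """Fraction of the pages in [start, start + length) that sit on the other socket."""
    first = start // PAGE_BYTES
    last = (start + length - 1) // PAGE_BYTES
    pages = range(first, last + 1)
    remote = sum(1 for p in pages if home_node(p * PAGE_BYTES, interleaved) != cpu_node)
    return remote / len(pages)


if __name__ == "__main__":
    buf = 256 * 1024 ** 2  # a 256MB working set starting at address 0
    print("interleaved:", remote_fraction(0, buf, cpu_node=0, interleaved=True))
    print("contiguous, right socket:", remote_fraction(0, buf, cpu_node=0, interleaved=False))
    print("contiguous, wrong socket:", remote_fraction(0, buf, cpu_node=1, interleaved=False))
```

In this model both claims hold at once: interleave guarantees hops, and contiguous mapping only avoids them if placement is right.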

VALVE'S SMP TESTS SHOW GREATER THAN 100% SCALING!!

End of statement.
 
[Graphs: Anand's two Valve multithreading benchmarks]

Both tests show that, clock for clock, AMD is scaling at better than 100% with QFX while C2 is not getting the same scaling. The theoretical nature of the tests shows the same thing that PD used to show.

Valve VRAD Map Compilation
Intel C2Q 2.66GHz - 2.58 minutes
Intel C2D 2.66GHz - 4.63 minutes
-----------------------------------------
Intel C2Q vs C2D - 1.795x advantage
==============================
AMD FX-72 2.8GHz - 3.22 minutes
AMD FX-62 2.8GHz - 5.55 minutes
-----------------------------------------
AMD FX-72 vs FX-62 - 1.724x advantage


Valve Particle Systems Test
Intel C2Q 2.66GHz - 85
Intel C2D 2.66GHz - 44
-----------------------------------------
Intel C2Q vs C2D - 1.932x advantage
=============================
AMD FX-72 2.8GHz - 58
AMD FX-62 2.8GHz - 28
-----------------------------------------
AMD FX-72 vs FX-62 - 2.071x advantage
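These ratios are easy to check. A minimal sketch that recomputes them from the results as quoted (the dictionaries just hold the posted numbers; the helper names are hypothetical):

```python
# Valve results as quoted above.
# VRAD map compilation is a time in minutes (lower is better);
# the particle systems test is a score (higher is better).
VRAD_MINUTES = {"C2Q": 2.58, "C2D": 4.63, "FX-72": 3.22, "FX-62": 5.55}
PARTICLE_SCORE = {"C2Q": 85, "C2D": 44, "FX-72": 58, "FX-62": 28}

# Each pair runs at the same clock: C2Q/C2D at 2.66GHz, FX-72/FX-62 at 2.8GHz.
PAIRS = [("C2Q", "C2D"), ("FX-72", "FX-62")]


def time_speedup(quad: str, dual: str) -> float:
    """Speedup from a timed run: dual-core time divided by quad-core time."""
    return VRAD_MINUTES[dual] / VRAD_MINUTES[quad]


def score_speedup(quad: str, dual: str) -> float:
    """Speedup from a throughput score: quad-core score divided by dual-core score."""
    return PARTICLE_SCORE[quad] / PARTICLE_SCORE[dual]


if __name__ == "__main__":
    for quad, dual in PAIRS:
        print(f"{quad} vs {dual}: VRAD {time_speedup(quad, dual):.3f}x, "
              f"particles {score_speedup(quad, dual):.3f}x")
```

Rounded to three places this prints 1.795x/1.932x for the Intel pair and 1.724x/2.071x for the AMD pair, matching the ratios above to within rounding. Note the direction flips between the two tests: for a time, speedup is old time over new time; for a score, it is new score over old score.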

Conclusion
Intel shows better scaling in the 'VRAD Map Compilation' benchmark, but AMD shows better scaling in the 'Particle Systems Test' benchmark. Hardly an advantage to either camp in terms of pure scaling.

However, regardless of scaling, it is clear that Intel is significantly faster than AMD in both tests.

The only reason it looks like AMD scales better by looking at the graphs is because you are looking at the scaling of a 2.8GHz FX-62 vs 3GHz FX-74, whereas with Intel you are looking at a 2.93GHz X6800 vs 2.66GHz QX6700.
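The clock mismatch in the graphs can be put in numbers too. This sketch assumes, as a simplification, that performance scales linearly with clock speed (memory-bound code usually won't); the function names are hypothetical:

```python
def clock_factor(quad_ghz: float, dual_ghz: float) -> float:
    """Share of a quad-vs-dual ratio that comes purely from clock speed,
    assuming (simplification) performance scales linearly with clock."""
    return quad_ghz / dual_ghz


def per_clock_scaling(raw_ratio: float, quad_ghz: float, dual_ghz: float) -> float:
    """Remove the clock difference from a raw quad-vs-dual speedup."""
    return raw_ratio / clock_factor(quad_ghz, dual_ghz)


# Clock pairs compared in the graphs, per the post above.
AMD_GRAPH = (3.0, 2.8)      # FX-74 vs FX-62
INTEL_GRAPH = (2.66, 2.93)  # QX6700 vs X6800

if __name__ == "__main__":
    print(f"AMD clock factor:   {clock_factor(*AMD_GRAPH):.3f}")    # about 1.071
    print(f"Intel clock factor: {clock_factor(*INTEL_GRAPH):.3f}")  # about 0.908
```

So before any core-count effect, the AMD bars start about 7% ahead from clock alone and the Intel bars about 9% behind. Dividing a raw ratio by the clock factor removes that, which is what comparing same-clock pairs does directly.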

Notice how I calculated scaling by comparing at the same clockspeeds?

Paints a whole different picture, doesn't it Baron? :roll:

Myth debunked! Nothing to see here people, just AMD fanboy drivel. Move along now...
 
Only time will tell if AMD had planned to create an entire reference system based on an Ati chipset while allowing nVidia to be the launch partner.
I still hope to see an AMD/ATI 4x4 motherboard later on. You should wait until that happens instead of rushing to buy the current AMD/nVidia offering for Xmas. You already have two good dual-core systems that should be more than enough to tide you over until then.
 
Anand's tests show clearly what most people expected: a 3.0 GHz quad-core system from AMD 'strung' together by a serial line is getting its A$$ kicked by the QX6700 (a 2.67 GHz processor) and even a Q6600 (a 2.4 GHz processor). Barcelona had better be something special.

Also, this is a poor data set for you to be 'showing off' the mediocrity of your Xmas platform. Your 'perfect scaling' is not accounting for clockspeed deltas either.


If you compare clock for clock QFX

COULD NOT

win against Core2. I have said that it will close the clock-for-clock gap between K8 and Core 2, and for the most part it does. Valve's tests show the scaling I considered possible. Heavy use cases show that it is a mega-tasking platform, where it kills FX62 at the same clock.

The myriad differences in testing methodologies across the reviews show that some people will get the golden egg and others will get the raspberry. My usage patterns, apps and requirements get me the golden egg (I only need 60 fps).

I buy my PDs for productivity apps and not games where even PCMark is showing improvements over FX62.

I currently have a 4400+ with a 7800GT. FX70 would be close to a 150% improvement. Once R600 comes out and G80 has lower prices, Vista will be around (X64 NUMA goodness) and I will be able to get an AMD chipset with hopefully a new rev.

My Xmas deal was based on DX10 cards. They currently cost more than I want to spend and offer more perf than I need. I guess 1600 LCDs have something to do with it, too.

Has anyone played 1600x1200 on a widescreen 1680x1050 LCD? The desktop looks stretched, though they are much cheaper.

Anyway, I digress. If NUMA works properly with Windows and you are not above either the RAM per socket or the total RAM, you should never have processes or their data spanning sockets.

I remember posting the ideal mechanism for avoiding hops.

In the best-case scenario for QFX RAM-wise, you have 4GB of RAM, which even Vista's process load can't overwhelm, and all of the OS processes AND data can be loaded to CPU 0 and CPU 1.

If the current request allocates more RAM than is available in the first set, the process is loaded in the second set, with CPU 2 and CPU 3. Now, if swapping is required because the process is "over-filling" the second set, it should maintain a contiguous line by not reloading that process's data into CPU 0/1's RAM banks.

In this scenario, all game threads (single, dual or multi) should remain on CPU 2/3 with their data. It may require a patch on AMD's part that ensures this (like the dual-core patch for sync), but this scenario should let the two cores use all of their potential, bandwidth- and processing-wise.

For cases where OS code needs to be called, it should just be a matter of passing the necessary data to CPU 0/1 and receiving a return.
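The placement idea above can be sketched as a toy allocator. This is only an illustration of the described policy, not how Windows' NUMA scheduler or any AMD patch actually works; the class and function names, the 2GB-per-socket split of the 4GB, and the process sizes are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One socket: its two cores and the RAM bank wired to it."""
    name: str
    cpus: tuple
    capacity_mb: int
    used_mb: int = 0

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


@dataclass
class Placement:
    process: str
    node: str
    cpus: tuple
    resident_mb: int
    swapped_mb: int  # overflow paged out instead of spilling into the other socket's RAM


def place(process: str, size_mb: int, nodes: list) -> Placement:
    """Put the whole process (threads and data) on the first node with room.

    If no node can hold it, keep it on the last node and swap the excess
    rather than allocating it across the HT link.
    """
    for node in nodes:
        if size_mb <= node.free_mb():
            node.used_mb += size_mb
            return Placement(process, node.name, node.cpus, size_mb, 0)
    last = nodes[-1]
    resident = max(last.free_mb(), 0)
    last.used_mb += resident
    return Placement(process, last.name, last.cpus, resident, size_mb - resident)


if __name__ == "__main__":
    # Hypothetical 4GB QFX box split evenly: 2GB behind each socket.
    nodes = [Node("node0", (0, 1), 2048), Node("node1", (2, 3), 2048)]
    for name, size in [("os", 1024), ("game", 1536), ("tool", 900), ("extra", 600)]:
        print(place(name, size, nodes))
```

With these made-up sizes the OS lands on CPU 0/1, the game doesn't fit in what's left there and goes to CPU 2/3 with all of its data, and the last request that fits nowhere stays on node1 with its excess swapped rather than split across sockets.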

I expect big things with a refined implementation. Even you realized the defects in the first rev of C2D.
 
I see Baron has conveniently decided to ignore my scaling analysis. :roll:

Btw, where'd you get that picture Jack? Hilarious stuff! :lol:

Edit - I see... some photochop work there... fooled me! LOL nice work!
 
Well, since those look like Intel CPUs on the left.... they actually could match the quad performance. Put a new heat spreader on, sell them for less than Intel, and let the floodgates open.

wes
 
sailer you did this to me !!!! make it stop!

Please, oh please, think of the puppy dogs' poor ears. We know that no Grammys are coming out of these tunes. We know that Baron's nose is growing longer than Pinocchio's.

I think I shall put in some ear plugs and grab another beer.