
Can AMD salvage QFX with an in-house chipset?

Last response: in CPUs
December 2, 2006 5:09:43 PM

Well, the QFX has been released and while certain scenarios show incredible promise, certain areas are also saddled with too much baggage.

The fact that a dual-socket Opteron can be outfitted with SLI in a workstation and offer good performance without these power levels implies that the total package could have been done better.

The Opteron 285 runs at 2.6GHz, and this graph shows that without the additional SLI power draw, the AMD system runs at 322W while the dual 5160 runs at 267W (full load).

Looking at the various articles around the web, it seems as though only Anand managed to find tasks that were actually reasonable for multi-tasking tests. In his case he used Blu-ray movies, which totally killed all the dual-core systems.

His power numbers were also at least 100W lower than other test sites'. He turned on CnQ and got the idle power down to within 4W of the C2Q system. Of course, this didn't dent the 456W the system drew at "full load" with an 8800GTX (which I believe draws 225W+).

And these are FX-74 numbers. The FX-70 is shown to use even less, so I believe OEMs can get reasonable workstation power levels out of it in the next few months, especially if AMD releases a new rev (they sorely need to drop power by at least 10%, perhaps more, and the lessons learned could help get Agena down below the reported 125W).

Hexus is also reporting that they can show a defect in the NUMA implementation of the Asus board's BIOS (this post was going to be called "Did Asus and nVidia drop the QFX ball?"), which may be why games are suffering so much from latency problems.

One review (most are posted at AMDZone) stated that AMD is reporting that the interleave mode will need to be turned off for Vista and on for XP.

I believe AMD said they would release their own branded chipset, and hopefully it will be less power hungry than the 680a, which is reported to use more power than even the 975X. Having two of them surely doesn't help. nVidia does have a two-socket SLI chipset in the 3600, and ASUS' implementation is only $300. Even the $400 Asus 680i for Intel implements fewer PCIe lanes and draws less power.

Besides, the 7950GT and the probably forthcoming 8950GT only require two slots for Quad SLI. I can see the need for four low-end GPUs for certain content creators, but even three PCIe slots can't really be used right now, as no "Havok"-type apps or cards have been released except for the server (AMD Stream).

Only time will tell if AMD had planned to create an entire reference system based on an ATi chipset while allowing nVidia to be the launch partner.

But the real judgment is that only expert builders will make QFX something not too loud or hot, and while Vista x64 may do wonders for it in multithreaded apps, it is not yet ready for prime time.

Let's go AMD! Show your true potential.

Let's go AMD! Show your true potential.
December 2, 2006 6:03:47 PM

Quote:
Hexus is also reporting that they can show a defect in the NUMA implementation of the Asus board's BIOS (this post was going to be called "Did Asus and nVidia drop the QFX ball?"), which may be why games are suffering so much from latency problems.
I was under the impression that XP wasn't NUMA-aware... would that even be a problem under XP?
December 2, 2006 6:04:14 PM

Sorry about the double post... it said there was an error, so I figured it didn't post. :oops: 
December 2, 2006 6:32:59 PM

Being an AMD stockholder, the QFX, or 4x4 as many know it, has left me less than thrilled. Could I even say disappointed? I was thinking of building a new computer around one of these things, but as it stands, forget it.

A new motherboard might help, but looking at how crowded the Nvidia motherboard is, I wonder if an all-new motherboard is needed, one that is physically larger so there is more room for the CPUs, more room for bigger and better coolers, and more room for PCI/PCIe slots. Things look so crowded at present that I wonder if it would really be possible to fit a pair of 8950GT cards and still have room for a sound card and a PCI card.

There seem to be too many problems to overcome. Sure, a lot of development might help, but why not put that development into the AM2 board and its CPUs, or even AM3? It looks to me like the present route is nothing but dollars wasted and opportunities lost. And I can guarantee that at the next stockholders' meeting, a lot of questions are going to be asked. Wonder how many heads will roll because of this QFX failure?
December 2, 2006 8:29:24 PM

Quote:
Only time will tell if AMD had planned to create an entire reference system based on an Ati chipset while allowing nVidia to be the launch partner.


Sounds like wishful thinking to me.

Quote:
Let's go AMD! Show your true potential.


Baron, 4x4 IS showing its true potential. It had the potential to be expensive, hot, and power-hungry, and as far as I can tell it's gone for the hat-trick and scored on every count.
December 2, 2006 8:41:31 PM

Quote:
Hey, now you simply gotta believe.... believe in miracles.... believe that the potential is there, it just hasn't been unleashed.... believe that if you sprinkle pixie dust over the rig, it will cool down by half and improve 20% in performance, and then it would be competitive.

If you build it they will come....

:)  :) 


Yes, I believe! I believe I'll get a beer.

Where, oh where, are AM3, K8L, and the other promised stuff? Or will those things be broken dreams as well?
December 2, 2006 8:52:06 PM

Yeah, we gotta keep smiles on our faces and things in perspective. The 65nm process may help things out a lot. I hope so. It's just that so much was expected and so little delivered. I think it would have been better if the QFX had been withdrawn before the embarrassment occurred. But that's an opinion.

Perhaps if everybody, AMD and enthusiasts alike, learns something from this, it will be good. In the meantime, I'll keep trying to get some more out of my present CPU. Seems like I can get 2990MHz and be stable, but the moment I cross 3000, something fails. Not sure yet if it's the RAM or the video card or what. Just know that 3DMark06 starts and then crashes out after a few seconds. Gotta keep looking.
December 2, 2006 9:27:09 PM

Doesn't anyone think that 4x4 could be a really huge ace in the hole for AMD? I mean, now that AMD has brought out this half-server, half-enthusiast nightmare, people have been exposed to the prospect of buying two processors and putting them on a board together. This idea, brought into the mainstream, could mean that in the future AMD has the jump on Intel forever. If AMD can catch up to the increasing core counts Intel is shipping at the moment, and manage to bring out quad cores soon, then we could stick two quad-core chips into one of these boards and have octo-core computers with minimum hassle.

This line of speculation assumes, of course, that AMD can bring out significantly cooler quad cores, to counteract the fact that there would be two of them sitting side by side, and can also make it attractive in total system cost, with cheaper CPUs and motherboards that aren't significantly more expensive.
December 2, 2006 9:31:24 PM

Salvage? But BaronBS I thought this was supposed to be the sh!t, no?
December 2, 2006 9:51:59 PM

1. 4x4 is still FUTURE technology (at least until the release of the 65nm quads to populate it).
2. The 'NEW' FX chips are AMD's greatest failure ever.
This is the toughest moment of their current slump, and they've got to show some quad performance soon, and it's got to be truly convincing for the sake of their future.
December 2, 2006 9:53:43 PM

Quote:
A chipset is not going to make this turkey fly.....

Baron, it is a failure of the architecture.... the cHT and NUMA arrangement was never meant for the desktop.

Perhaps when quad core comes out and if they can cut the latency by 80% between the two CPUs, then you will see something. Until then, anytime you strap in a second CPU it will hurt performance.



Sure, server apps have slightly different requirements, but rendering uses all of the subsystems too, and there the QFX shines. Since most games will fit into 2GB (at least right now), a better system for keeping track of which CPU the memory is loaded to will do a lot.

Hexus is also reporting that Vista has been shown to have a much better scheduler. I saw the RC2 tests and can say that they were 32-bit. 64-bit NUMA should do much better, but again, if there is a problem in the implementation, no amount of optimization will help.

Again, hopefully AMD will get the power down and fix the latency problem. Dropping the latency back to near-FX-62 levels (possible; I don't know, as I can't find many latency tests for Opteron) would let a lot of the speed be realized.

Hexus also stated that they had a "pre-release" system and that a BIOS fix was forthcoming. No one else mentioned this problem, so I don't know if they even looked at it as a possible cause.

It does multitask like crazy, though, as tests show that as background processes increase, QFX overtakes C2Q.

We'll have to wait for Vista x64 either way. I'm trying to get the Business version now, and as soon as I do, I will be upgrading to it in prep for my next HW upgrade.

Looking at Anand's tests with the new Valve multithreaded engine, QFX shows nearly 100% scaling, and at some speeds a little more. That is theoretical, but it does show that, when code is written properly, it narrows the difference between the X6800 and FX-62.

With a highly optimized NUMA implementation, along with different "clock" grades allowing for 1066 (turn down the multiplier and turn up the HT speed), I can see x64 multithreading being much better in the areas where it's weak under 32-bit XP.
December 2, 2006 10:06:05 PM

Quote:
Being an AMD stockholder, the QFX, or 4x4 as many know it, has left me less than thrilled. Could I even say disappointed? I was thinking of building a new computer around one of these things, but as it stands, forget it.

A new motherboard might help, but looking at how crowded the Nvidia motherboard is, I wonder if an all-new motherboard is needed, one that is physically larger so there is more room for the CPUs, more room for bigger and better coolers, and more room for PCI/PCIe slots. Things look so crowded at present that I wonder if it would really be possible to fit a pair of 8950GT cards and still have room for a sound card and a PCI card.

There seem to be too many problems to overcome. Sure, a lot of development might help, but why not put that development into the AM2 board and its CPUs, or even AM3? It looks to me like the present route is nothing but dollars wasted and opportunities lost. And I can guarantee that at the next stockholders' meeting, a lot of questions are going to be asked. Wonder how many heads will roll because of this QFX failure?


This is why I see this as AMD's first reference platform. They knew the challenges (the Opteron 2218 manages to stay cool even at 119W), and for the first release of a new chip that fits in a server socket without ECC, it's not a horrible failure. If you look around at the various reviews, you will see that different test scenarios produce different results.

I think Hexus and Anand have the most useful information as to its value. It could hardly be called a failure, though the choices made by Asus and nVidia seem to have led to a less than stellar implementation.

I'm sure that right now AMD's labs are moving fast to optimize RD600 for a QFX version. ATi is known to have the lowest-power chipsets among the Big 3, so I am confident it will use a lot less than two 590 SLI chips.

An 80nm R600 with lower-power GDDR4 may just drop those power levels from the 250W+ that current 90nm chips are running at to a more reasonable sub-200W (though I'm sure not by much) at load.

Still, the AlienWare system is not on their site yet, and it will be interesting to see what it does, with and without water cooling. That can't lower the consumption of the chips, but the entire system draws power.

Anand's 100W-lower numbers were even with Raptor RAID AND an 8800GTX, so you can get the power down. You'll be better off waiting for Vista x64 and more reasonably priced G80s anyway. DX10 may also do much better, since it allows for much more complex scenes with more objects (obviously requiring more bandwidth).

It's still on my list, I can say.
December 2, 2006 10:07:52 PM

Quote:
A chipset is not going to make this turkey fly.....

Baron, it is a failure of the architecture.... the cHT and NUMA arrangement was never meant for the desktop.

Perhaps when quad core comes out and if they can cut the latency by 80% between the two CPUs, then you will see something. Until then, anytime you strap in a second CPU it will hurt performance.



Sure, server apps have slightly different requirements, but rendering uses all of the subsystems too, and there the QFX shines. Since most games will fit into 2GB (at least right now), a better system for keeping track of which CPU the memory is loaded to will do a lot.

Hexus is also reporting that Vista has been shown to have a much better scheduler. I saw the RC2 tests and can say that they were 32-bit. 64-bit NUMA should do much better, but again, if there is a problem in the implementation, no amount of optimization will help.

Again, hopefully AMD will get the power down and fix the latency problem. Dropping the latency back to near-FX-62 levels (possible; I don't know, as I can't find many latency tests for Opteron) would let a lot of the speed be realized.

Hexus also stated that they had a "pre-release" system and that a BIOS fix was forthcoming. No one else mentioned this problem, so I don't know if they even looked at it as a possible cause.

It does multitask like crazy, though, as tests show that as background processes increase, QFX overtakes C2Q.

We'll have to wait for Vista x64 either way. I'm trying to get the Business version now, and as soon as I do, I will be upgrading to it in prep for my next HW upgrade.

Looking at Anand's tests with the new Valve multithreaded engine, QFX shows nearly 100% scaling, and at some speeds a little more. That is theoretical, but it does show that, when code is written properly, it narrows the difference between the X6800 and FX-62.

With a highly optimized NUMA implementation, along with different "clock" grades allowing for 1066 (turn down the multiplier and turn up the HT speed), I can see x64 multithreading being much better in the areas where it's weak under 32-bit XP.
When a CPU is good, it performs well almost everywhere. We're not yet at the point of having such task-specific CPUs, much less claiming one as such. It's just like when they say a Celeron is good for office apps.
December 2, 2006 10:16:51 PM

Quote:
Only time will tell if AMD had planned to create an entire reference system based on an Ati chipset while allowing nVidia to be the launch partner.


Sounds like wishful thinking to me.

Quote:
Let's go AMD! Show your true potential.


Baron, 4x4 IS showing its true potential. It had the potential to be expensive, hot, and power-hungry, and as far as I can tell it's gone for the hat-trick and scored on every count.

What you say sounds like wishful thinking. This is the first iteration of what is a very complex system. More than anything, I think Asus and nVidia could have implemented the board better: 20 USB ports, 4 GPU slots, 12 SATA (though I like the eSATA port).

Look at ALL of the reviews and you will see that systems sold in February with Vista x64 will be much better in all respects (even though the Valve tests vindicate all the gaming losses). C2Q is core-for-core faster, so at best QFX will tie it or maybe lead by a percent or two.

I was never considering it to be faster than C2Q, but rather a formidable multi-threading machine.

Again, I think we'll hear soon about the ATi chipset, which should cut back to 3 slots and one chipset. AMD worked closely with MS on x64 NUMA, so they can get more performance out of it than nVidia.

We'll see, though. I still see a DX10 card in one of these sitting under my desk, hopefully NOT baking my legs. My Antec TX should keep that puppy cool.

I can assume that, just as Anand got CnQ working and got idle power down significantly, Hexus will get a better BIOS for the board that helps with latency.
December 2, 2006 10:22:39 PM

Quote:
When a CPU is good, it performs well almost everywhere. We're not yet at the point of having such task-specific CPUs, much less claiming one as such. It's just like when they say a Celeron is good for office apps.


The tests clearly show that NUMA is broken on this board. It will work for desktop apps if implemented properly. An HT "hop" is not like an FSB "hop." Improvements there will push QFX well above the FX-62 everywhere, as it already does in CPU-intensive things like CineBench.

I'm confident that AMD is serious about making this work, and they have a lot of experience now with dual sockets. Just as some people didn't want the first rev of C2D, they may not want the first rev of this.
December 2, 2006 10:39:31 PM

Quote:
Yes, in terms of megatasking -- after you reach 12 background processes competing, the BW advantage finally begins to show up.... this helps you if you want to run 6 VMs with, say, 2 processes in each VM... but even for Enthusiast Joe -- I doubt they will be burning 6 DVDs, encoding 4 video clips, and playing 2 games at the same time..... kinda pointless. The TechReport review was very kind in how they set up and ran the benchmarks; C2Q still came out on top.... their memory bench, for example, was carefully chosen -- they did not show the 2K, 4K blocks with the 256- or 512-byte strides.... that was where 4x4 really fell apart -- it is not a truly objective review the way they handled it.... and you should be more challenging of the data so as not to be fooled.

The Valve tests were indeed interesting --- C2Q just crushed the 4x4 in this regard.


I guess we'll see. I don't claim to be an authority on anything but SW development, but it seems interesting that Anand reported 456W at load while others were closer to 600W. How could one core that's faster be slower than 4?

I didn't expect it to be faster than C2Q, but Intel is usually faster in theoretical apps. If you looked closely, the increase for AMD was several percent higher going from 2 to 4 cores.

The Inq link (wow) says nothing about latency.


Anand's tests show clearly exactly what I expected.





Both tests show that, clock for clock, AMD is scaling at better than 100% with QFX, while C2 is not getting the same scaling. The theoretical nature of the tests shows the same thing that PD used to show.

And why, you ask, is this test showing improvements while being game-based?

Because this platform does wonders when you use it properly, SW-wise. That's not to say current SW isn't good enough, but everyone remarks that even C2Q is suffering because of the single/dual-threaded nature of most SW.

Even C2Q is core-for-core slower than C2D, so if AMD can increase scaling, they will be bucking the trend. I'm confident that they will improve the platform well before Agena FX.
December 2, 2006 10:48:38 PM

Quote:
Give up, it does not work and will not work. The BIOS is not responsible for NUMA anyway; it is only responsible for initializing the processor to map the memory a particular way. A BIOS update will have little to no effect at all.


Mapping the memory incorrectly won't be a problem? Surely you jest. Again, the Valve tests show 100% scaling, which bodes well for DX10 gaming. Games are (mainly) the only place where you see slowdowns - there were one or two other cases.

I am not thinking about this platform for 2006, but for 2007, when there will be Crysis and Alan Wake, etc., along with Vista x64.

Also, if I decide to go with Agena FX (AMD's name) before 2008, I will buy a new mobo as well (the prices will be right by then; the right chipset would allow the same reference design for both ECC and non-ECC).

I would never expect anything but your reported 100nm spacing to get power down, which may happen, especially since 65nm chips will drop costs enough to allow specialized runs for QFX. They can't be more than 5% of shipments, even in pairs.
December 2, 2006 11:10:31 PM

Quote:
Give up, it does not work and will not work. The BIOS is not responsible for NUMA anyway; it is only responsible for initializing the processor to map the memory a particular way. A BIOS update will have little to no effect at all.


Mapping the memory incorrectly won't be a problem? Surely you jest. Again, the Valve tests show 100% scaling, which bodes well for DX10 gaming. Games are (mainly) the only place where you see slowdowns - there were one or two other cases.

No, I don't jest: it will not work... you don't know what you are talking about. Interleaved or contiguous, there will be HOPs; it solves NOTHING.

You clearly do not understand how this works --- OK, so the BIOS is broken, it cannot interleave --- again, NUMA OS benches are showing some improvement, but the platform is still inferior to a single-socket Intel solution.


I understand entirely. The SW landscape will change to accommodate more cores, and games will be the first.

VALVE'S SMP TESTS SHOW GREATER THAN 100% SCALING!!

End of statement.
December 2, 2006 11:13:52 PM

Quote:






Both tests show that, clock for clock, AMD is scaling at better than 100% with QFX, while C2 is not getting the same scaling. The theoretical nature of the tests shows the same thing that PD used to show.


Valve VRAD Map Compilation
Intel C2Q 2.66GHz - 2.58 minutes
Intel C2D 2.66GHz - 4.63 minutes
-----------------------------------------
Intel C2Q vs C2D - 1.795x advantage
==============================
AMD FX-72 2.8GHz - 3.22 minutes
AMD FX-62 2.8GHz - 5.55 minutes
-----------------------------------------
AMD FX-72 vs FX-62 - 1.723x advantage


Valve Particle Systems Test
Intel C2Q 2.66GHz - 85
Intel C2D 2.66GHz - 44
-----------------------------------------
Intel C2Q vs C2D - 1.932x advantage
=============================
AMD FX-72 2.8GHz - 58
AMD FX-62 2.8GHz - 28
-----------------------------------------
AMD FX-72 vs FX-62 - 2.071x advantage

Conclusion
Intel shows better scaling in the 'VRAD Map Compilation' benchmark, but AMD shows better scaling in the 'Particle Systems Test' benchmark. Hardly an advantage to either camp in terms of pure scaling.

However, regardless of scaling, it is clear that Intel is significantly faster than AMD in both tests.

The only reason it looks like AMD scales better by looking at the graphs is because you are looking at the scaling of a 2.8GHz FX-62 vs 3GHz FX-74, whereas with Intel you are looking at a 2.93GHz X6800 vs 2.66GHz QX6700.

Notice how I calculated scaling by comparing at the same clockspeeds?
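For anyone who wants to check, the same-clock scaling factors quoted above fall straight out of the numbers in this post (a quick sketch; nothing here is new data, just the posted figures divided out):

```python
# Same-clock quad-vs-dual scaling, recomputed from the numbers posted above.
vrad_minutes = {"C2Q 2.66": 2.58, "C2D 2.66": 4.63,   # compile time, lower is better
                "FX-72 2.8": 3.22, "FX-62 2.8": 5.55}
particles = {"C2Q 2.66": 85, "C2D 2.66": 44,          # score, higher is better
             "FX-72 2.8": 58, "FX-62 2.8": 28}

# For a time-based benchmark, scaling = dual-core time / quad-core time.
intel_vrad = vrad_minutes["C2D 2.66"] / vrad_minutes["C2Q 2.66"]
amd_vrad = vrad_minutes["FX-62 2.8"] / vrad_minutes["FX-72 2.8"]

# For a score-based benchmark, scaling = quad-core score / dual-core score.
intel_part = particles["C2Q 2.66"] / particles["C2D 2.66"]
amd_part = particles["FX-72 2.8"] / particles["FX-62 2.8"]

print(f"VRAD: Intel {intel_vrad:.3f}x, AMD {amd_vrad:.3f}x")
print(f"Particles: Intel {intel_part:.3f}x, AMD {amd_part:.3f}x")
```

Either way you slice it, both camps land close to 2x when you hold the clock constant.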

Paints a whole different picture, doesn't it Baron? :roll:

Myth debunked! Nothing to see here people, just AMD fanboy dribble. Move along now...
December 2, 2006 11:45:47 PM

Quote:
Only time will tell if AMD had planned to create an entire reference system based on an Ati chipset while allowing nVidia to be the launch partner.

I still hope to see an AMD/ATI 4x4 motherboard later on. You should wait until that happens instead of rushing to buy the current AMD/nVidia offering for Xmas. You already have two good dual-core systems that should be more than enough to tide you over until then.
December 2, 2006 11:51:03 PM

Quote:
Anand's tests show clearly what most people expected: a 3.0GHz quad-core system from AMD 'strung' together by a serial link is getting its A$$ kicked by the QX6700 (a 2.67GHz processor) and even a Q6600 (a 2.4GHz processor). Barcelona had better be something special.

Also, this is a poor data set for you to be 'showing off' the mediocrity of your Xmas platform. Your 'perfect scaling' is not accounting for clockspeed deltas either.



If you compare clock for clock, QFX

COULD NOT

win against Core 2. I have said that it will close the clock-for-clock gap between K8 and Core 2, and for the most part it does. Valve's tests show the scaling I considered possible. Heavy use cases show that it is a mega-tasking platform, where it kills the FX-62 at the same clock.

The myriad differences in testing methodologies across the reviews show that some people will get the golden egg and others will get the raspberry. My usage patterns, apps, and requirements get me the golden egg (I only need 60 fps).

I buy my PDs for productivity apps and not games where even PCMark is showing improvements over FX62.

I currently have a 4400+ with a 7800GT. An FX-70 setup would be close to a 150% improvement. Once R600 comes out and G80 prices drop, Vista will be around (x64 NUMA goodness), and I will be able to get an AMD chipset, hopefully with a new rev.

My Xmas decision was based on DX10 cards. They currently cost more than I want to spend, with more performance than I need. I guess 1600 LCDs have something to do with it, too.

Has anyone played at 1600x1200 on a widescreen 1680x1050 LCD? The desktop looks stretched, though those panels are much cheaper.

Anyway, I digress. If NUMA works properly with Windows, and you are not above either the per-socket RAM or the total RAM, you should never have processes spanning sockets or data placement problems.

I remember posting the ideal mechanism for avoiding hops.

In the best-case scenario for QFX, RAM-wise, you have 4GB of RAM, which even Vista's process load can't overwhelm, so all of the OS processes AND data can be loaded on CPU 0 and CPU 1.

If the current request allocates more RAM than is available in the first set, the process is loaded in the second set, with CPU 2 and CPU 3. Now, if swapping is required because a process "over-fills" the second set, it should maintain a contiguous line by not reloading data to CPU 0/1's RAM banks for the process that caused the overfill.

In this scenario, all game threads (single, dual, or multi) should remain on CPU 2/3 with their data. It may require a patch on AMD's part to assure this (like the dual-core patch for sync), but this scenario should allow the two cores to fully use all of their potential, bandwidth- and processing-wise.

For cases where OS code needs to be called, it should just be passing the necessary data and receiving a return from CPU 0/1.
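The placement policy I'm describing can be sketched as a toy model (purely illustrative; `NumaModel` and its numbers are my own invention for this post, not the Windows allocator):

```python
# Toy model of first-fit NUMA placement: keep each process's memory on a
# single node when possible, spill to the emptiest node only when it can't fit.
class NumaModel:
    def __init__(self, nodes=2, node_ram_mb=2048):
        self.free = [node_ram_mb] * nodes  # free RAM per node (2GB/socket here)

    def place(self, proc_ram_mb):
        """Return the node a process lands on: first node that can hold it
        entirely, otherwise the emptiest node (negative free = swapping)."""
        for node, free in enumerate(self.free):
            if free >= proc_ram_mb:
                self.free[node] -= proc_ram_mb
                return node
        node = max(range(len(self.free)), key=lambda n: self.free[n])
        self.free[node] -= proc_ram_mb
        return node

m = NumaModel()
os_node = m.place(1500)    # OS + background load fits entirely on node 0
game_node = m.place(1800)  # the game's 1.8GB then lands whole on node 1
print(os_node, game_node)  # 0 1 -> no cross-socket hops for game data
```

With 2GB per socket, the OS pile stays on CPU 0/1's banks and the game stays on CPU 2/3's, which is exactly the no-hop scenario above.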

I expect big things with a refined implementation. Even you realized the defects in the first rev of C2D.
December 2, 2006 11:59:27 PM

Quote:
AMD's approach to answer QX6700:


* Courtesy of some creative photoshopping by gOJDO.

Heheheh! Thanks gOJDO! Thanks JJ!
December 3, 2006 12:08:26 AM

I see Baron has conveniently decided to ignore my scaling analysis. :roll:

Btw, where'd you get that picture Jack? Hilarious stuff! :lol: 

Edit - I see... some photochop work there... fooled me! LOL nice work!
December 3, 2006 12:10:21 AM

Well, since those look like Intel CPUs on the left.... they actually could match the quad performance. Put a new heat spreader on, sell them for less than Intel, and let the floodgates open.

wes
December 3, 2006 12:17:10 AM

Quote:
sailer you did this to me !!!! make it stop!


Please, oh please, think of the puppy dog's poor ears. We know that no Grammys are coming out of these tunes. We know that Baron's nose is growing longer than Pinocchio's.

I think I shall put in some ear plugs and grab another beer.
December 3, 2006 12:44:38 AM

Quote:
A chipset is not going to make this turkey fly.....

Baron, it is a failure of the architecture.... the cHT and NUMA arrangement was never meant for the desktop.

Perhaps when quad core comes out and if they can cut the latency by 80% between the two CPUs, then you will see something. Until then, anytime you strap in a second CPU it will hurt performance.


No wonder AMD didn't buy nVidia. How dare they mess up the launch of the "performance at any cost" desktop. Oops! At any cost - I hope they didn't mean losing loyal AMD fanboys.
December 3, 2006 12:47:06 AM

Quote:
Well, the QFX has been released and while certain scenarios show incredible promise, certain areas are also saddled with too much baggage.

The fact that a dual-socket Opteron can be outfitted with SLI in a workstation and offer good performance without these power levels implies that the total package could have been done better.

The Opteron 285 runs at 2.6GHz, and this graph shows that without the additional SLI power draw, the AMD system runs at 322W while the dual 5160 runs at 267W (full load).

Looking at the various articles around the web, it seems as though only Anand managed to find tasks that were actually reasonable for multi-tasking tests. In his case he used Blu-ray movies, which totally killed all the dual-core systems.

His power numbers were also at least 100W lower than other test sites'. He turned on CnQ and got the idle power down to within 4W of the C2Q system. Of course, this didn't dent the 456W the system drew at "full load" with an 8800GTX (which I believe draws 225W+).

And these are FX-74 numbers. The FX-70 is shown to use even less, so I believe OEMs can get reasonable workstation power levels out of it in the next few months, especially if AMD releases a new rev (they sorely need to drop power by at least 10%, perhaps more, and the lessons learned could help get Agena down below the reported 125W).

Hexus is also reporting that they can show a defect in the NUMA implementation of the Asus board's BIOS (this post was going to be called "Did Asus and nVidia drop the QFX ball?"), which may be why games are suffering so much from latency problems.

One review (most are posted at AMDZone) stated that AMD is reporting that the interleave mode will need to be turned off for Vista and on for XP.

I believe AMD said they would release their own branded chipset, and hopefully it will be less power hungry than the 680a, which is reported to use more power than even the 975X. Having two of them surely doesn't help. nVidia does have a two-socket SLI chipset in the 3600, and ASUS' implementation is only $300. Even the $400 Asus 680i for Intel implements fewer PCIe lanes and draws less power.

Besides, the 7950GT and the probably forthcoming 8950GT only require two slots for Quad SLI. I can see the need for four low-end GPUs for certain content creators, but even three PCIe slots can't really be used right now, as no "Havok"-type apps or cards have been released except for the server (AMD Stream).

Only time will tell if AMD had planned to create an entire reference system based on an ATi chipset while allowing nVidia to be the launch partner.

But the real judgment is that only expert builders will make QFX something not too loud or hot, and while Vista x64 may do wonders for it in multithreaded apps, it is not yet ready for prime time.

Let's go AMD! Show your true potential.

The NUMA support in Vista is supposed to be better than in Windows XP, so Vista should give better performance. For NUMA to work well in this two-node setup, node-to-node traffic should be minimized. I know that in multi-node systems it can sometimes be an advantage to use node interleave mode, such as with databases. The problem with Quad FX at the hardware level is the HT link. I do not think HT 2.0 is adequate; it would take a FULL HT 3.0 link to do the job, viz., a 2.6GHz link that is 32 bits wide. If you look at some of the memory benchmark figures, you will see that they are very low, probably caused by excessive cross-node traffic.
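For a rough sense of scale, those link parameters work out as follows (back-of-envelope only: HT is DDR-signalled, two transfers per clock, and real-world throughput is lower than the raw figure):

```python
def ht_bandwidth_gbs(clock_ghz: float, width_bits: int) -> float:
    """Raw one-direction HyperTransport bandwidth in GB/s (DDR: 2 transfers/clock)."""
    return clock_ghz * 2 * width_bits / 8

ht20 = ht_bandwidth_gbs(1.0, 16)  # a common HT 2.0 config: 1GHz x 16-bit link
ht30 = ht_bandwidth_gbs(2.6, 32)  # the "full HT 3.0" link described above

print(f"HT 2.0 (1GHz/16-bit): {ht20:.1f} GB/s each way")
print(f"HT 3.0 (2.6GHz/32-bit): {ht30:.1f} GB/s each way")
```

That's roughly a 5x jump in raw link bandwidth per direction, which is the kind of headroom the cross-node case would need.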
December 3, 2006 12:50:57 AM

Quote:
Intel shows better scaling in the 'VRAD Map Compilation' benchmark, but AMD shows better scaling in the 'Particle Systems Test' benchmark. Hardly an advantage to either camp in terms of pure scaling.

However, regardless of scaling, it is clear that Intel is significantly faster than AMD in both tests.

The only reason it looks like AMD scales better by looking at the graphs is because you are looking at the scaling of a 2.8GHz FX-62 vs 3GHz FX-74, whereas with Intel you are looking at a 2.93GHz X6800 vs 2.66GHz QX6700.

Notice how I calculated scaling by comparing at the same clockspeeds?

Paints a whole different picture, doesn't it Baron?

Myth debunked! Nothing to see here people, just AMD fanboy dribble. Move along now...



Let's look at the numbers (all numbers according to the Windows XP Calculator):

For the particle systems tests

5200+ -- 26
FX 62 -- 28

FX70 -- 55
FX72 -- 58

Scaling:
2.6GHz - 111%
2.8GHz - 107%
no 3GHz numbers



Core 2

E6700 -- 44
C2Q 6700 -- 85


Scaling:
2.66GHz - 97%


This shows greater scaling, though clockspeed seems to have a negative effect.
It also shows a reversal from single- or dual-threaded games, but Carmack is busy creating a multi-core engine, along with Valve and others, so QFX has a bright 2007 future. Optimizations

MAY EVEN (disclaimer due to negative reaction)

increase perf in current single/dual-threaded games (though keeping as much data as possible in the right banks will help a lot).
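For what it's worth, the scaling percentages above can be recomputed from the quoted scores. This is just the arithmetic (quad-vs-dual throughput, truncated, minus 100%); note that with the scores as quoted, the Intel pair works out nearer 93% than the 97% stated:

```python
# "Scaling" as used above: quad score over dual score, minus 100%,
# truncated to a whole percent.
def scaling_pct(quad_score, dual_score):
    return int(quad_score / dual_score * 100 - 100)

amd_26ghz = scaling_pct(55, 26)  # FX-70 vs 5200+  -> 111
amd_28ghz = scaling_pct(58, 28)  # FX-72 vs FX-62  -> 107
intel_266 = scaling_pct(85, 44)  # QX6700 vs E6700 -> 93 (post says 97)
```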
December 3, 2006 1:08:38 AM

Quote:
The NUMA support in Vista is supposed to be better than in Windows XP, so Vista should give better performance. For NUMA to work well in this two-node setup, node-to-node traffic should be minimized. I know that in multi-node systems it can sometimes be an advantage to use node interleave mode, such as in databases. The problem with Quad FX at the hardware level is the HT link. I do not think HT2.0 is adequate; it would take a FULL HT3.0 link to do the job, viz. a 2.6GHz link that is 32 bits wide. If you look at some of the memory benchmark figures you will see that they are very low, probably caused by excessive cross-node traffic.


You amplify my point. As I explained several times, 2GB per socket should allow all game data to be loaded into one set of RAM. I was hoping for an 8GB max to allow 4GB per socket, but....

It has nothing to do with HT. It has to do with load/swap in Windows, along with pre-scheduling to handle cases where apps need to call OS/Win32 functions. Vista has totally improved this code with an emphasis on multi-core functionality.

I'm sure that a side-by-side comparison will show much more "fragmentation" in XP/2003 than in Vista/Longhorn. Latency tests (Hexus) show that a 10% improvement would boost QFX above the FX62 because of the available bandwidth. With CPU 0/1 having its own path to memory and all of a game's data (up to 2GB, with swapping) on CPU 2/3, watch out: the 5% I predicted will happen.

To Dirk, you need a driver that enables quick OS returns from CPU 0/1.
December 3, 2006 1:27:04 AM

Quote:
The NUMA support in Vista is supposed to be better than in Windows XP, so Vista should give better performance. For NUMA to work well in this two-node setup, node-to-node traffic should be minimized. I know that in multi-node systems it can sometimes be an advantage to use node interleave mode, such as in databases. The problem with Quad FX at the hardware level is the HT link. I do not think HT2.0 is adequate; it would take a FULL HT3.0 link to do the job, viz. a 2.6GHz link that is 32 bits wide. If you look at some of the memory benchmark figures you will see that they are very low, probably caused by excessive cross-node traffic.


You amplify my point. As I explained several times, 2GB per socket should allow all game data to be loaded into one set of RAM. I was hoping for an 8GB max to allow 4GB per socket, but....

It has nothing to do with HT. It has to do with load/swap in Windows, along with pre-scheduling to handle cases where apps need to call OS/Win32 functions. Vista has totally improved this code with an emphasis on multi-core functionality.

I'm sure that a side-by-side comparison will show much more "fragmentation" in XP/2003 than in Vista/Longhorn. Latency tests (Hexus) show that a 10% improvement would boost QFX above the FX62 because of the available bandwidth. With CPU 0/1 having its own path to memory and all of a game's data (up to 2GB, with swapping) on CPU 2/3, watch out: the 5% I predicted will happen.

To Dirk, you need a driver that enables quick OS returns from CPU 0/1.
If a core accesses its local memory, it has a bandwidth of 12.8GB/s (using DDR2-800). If it accesses the memory of another node, the transfer has to take place across the HT2.0 link, which is only 4GB/s (8GB/s aggregate). So as well as the added latency associated with the node hop, the transfer would take LONGER than it would from local memory; this is why the HT link bandwidth is important.
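Plugging in those numbers makes the gap concrete. This is illustrative only: the 512 MB working set is a made-up figure, and peak bandwidths ignore latency entirely:

```python
# Milliseconds to stream a block of data at a given peak bandwidth.
def transfer_ms(size_gb, bandwidth_gbs):
    return size_gb / bandwidth_gbs * 1000

local_ms  = transfer_ms(0.5, 12.8)  # 512 MB from the local node's DDR2-800
remote_ms = transfer_ms(0.5, 4.0)   # same block pulled over the HT2.0 hop

# The remote fetch is 12.8 / 4 = 3.2x slower even before the extra
# node-hop latency is counted.
slowdown = remote_ms / local_ms
```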
December 3, 2006 1:33:01 AM

Quote:
(...) In the best case scenario for QFX - RAM wise - you have 4GB of RAM, which even Vista's process load can't overwhelm; all of the OS processes AND data can be loaded to CPU 0 and CPU 1.

If the current request allocates more RAM than is available in the first set, the process is loaded into the second set with CPU 2 and CPU 3. Now, if swapping is required by the process "over-filling" the second set, it should maintain a contiguous line by not reloading data to CPU 0/1's RAM banks for the process that caused the overfill.

In this scenario, all game threads (single, dual or multi) should remain on CPU 2/3 with their data. It may require a patch on AMD's part to assure this (like the dual-core patch for sync), but this scenario should allow the two cores to fully use their potential, both bandwidth- and processing-wise.

For cases where OS code needs to be called, it should just be passing the necessary data and receiving a return from CPU 0/1.

I expect big things with a refined implementation. Even you realized the defects in the first rev of C2D.


Well, that'd be stretching the point a bit, wouldn't it? My guess would be that 1MB of L2 cache per core wouldn't help much, would it? Does AMD use 'NUCA', cache-wise?
http://portal.acm.org/citation.cfm?id=1077690


Cheers!
December 3, 2006 1:39:56 AM

Quote:
Only time will tell if AMD had planned to create an entire reference system based on an Ati chipset while allowing nVidia to be the launch partner.

I still hope to see an AMD/ATI 4x4 motherboard later on. You should wait until that happens instead of rushing to buy the current AMD/nVidia offering for Xmas. You already have two good dual-core systems that should be more than enough to tide you over until then.

If you read my posts, my Xmas was based on DX10 cards being reasonably priced (I won't be buying a DX9 card). In my mind they aren't, even for addition to a whole system.

Vista x64 Ultimate won't be available until Feb 01/07, so unless I get Vista Business now I won't be buying anyway. I would be buying this for more multithreaded games - 107% min is worth my money - but more so for business apps that are already multithreaded. Doom 4/Quake 5 WILL be multi-threaded, and the Valve tests show my investment will have a return - regardless of C2D or C2Q.
December 3, 2006 2:06:26 AM

I write this as a former owner of an Opteron 270 setup (2 x dual-core at 2.0 GHz per core, 1 MB L2 cache per core, ccNUMA using 2 nodes of Dual-Channel Reg ECC DDR1-400).

The Opteron 200/800 series did not support Cool'n'Quiet as Registered DIMMs need a constant flow of energy to work and the memory controller is integrated (has pros and cons).

Since these new CPUs likely support Cool'n'Quiet, I'd recommend just disabling C'n'Q to get the stuttering to stop.

In addition to this add /usepmtimer to C:\boot.ini
http://search.microsoft.com/results.aspx?q=boot.ini+%2F...

ccNUMA really wasn't designed for games; it will likely stutter a fair bit if it is used and /usepmtimer is not used at the same time.

You can run ccNUMA enabled or disabled in Windows XP, XP x64 Edition, 2003 Server (x64), and Vista (and Linux, etc. too). It depends what gives you the best performance.

"Node Interleaving" is disabling ccNUMA: instead of having the OS control it, the 'chipset' controls it. It will stutter less, but the performance of the memory will not aggregate as well compared to ccNUMA. (Latency vs. throughput argument.)



The ccNUMA vs Node Interleaving being for one OS and not the others is total garbage. All Windows kernels based on NT 5.1 will support it; games just may not scale well (stuttering) if ccNUMA is used.

Vista will not improve ccNUMA performance in gaming; the technology is years old, well established, and cannot be improved very much at all (from an inter-latency perspective within an OS's kernel).

Just google "NUMA" and "ccNUMA", it is an idea perhaps a decade old now.

http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access



Quote:

VALVE'S SMP TESTS SHOW GREATER THAN 100% SCALING!!


Quote:

If a core accesses its local memory then it has a bandwidth of 12.8GB/s(using DDR2-800). If it accesses the memory of another node, then this transfer has to take place across the HT2.0 link which is only 4GB/s (8GB/s aggregate). So as well as the added latency associated with the node hop, the transfer would take LONGER than it would from local memory-this is why the HT link bandwidth is important.


Bear in mind one socket's cores can benefit from the L1/L2 cache in the other socket's cores, and then, if required, access its memory.

ccNUMA is designed to aggregate throughput, not combat latency. For a ccNUMA gaming system /usepmtimer must be added to C:\boot.ini - I cannot stress this enough. Perhaps some other IT websites need to redo their tests?


For anyone considering Quad-FX I strongly recommend they look to Opteron 2000 series instead.


PS: Sorry - This post is a bit all over the place, but meh, it'll do.
December 3, 2006 2:38:05 AM

Quote:
Well, the QFX has been released and while certain scenarios show incredible promise, certain areas are also saddled with too much baggage.

The fact that the Opteron dual can be outfitted with SLI in a wksta and offer good perf without these power levels implies that the total package could have been done better.

The Opteron 285 runs at 2.6GHz and this graph shows that without the additional SLI power AMD runs at 322W and the dual 5160 runs at 267W (full load).

Looking at the various articles around the web, it seems as though only Anand managed to actually find suitable tasks that were reasonable for multi-tasking. In his case he used Blu-ray movies, which totally killed all the dual-core systems.

His power numbers were also at least 100W lower than other test sites'. He turned on CnQ and got the idle power down to within 4W of the C2Q system. Of course this didn't dent the 456W the system drew at "full load" with an 8800GTX (which I believe draws 225W+).

And these are FX-74 numbers. The FX-70 is shown to use even less, so I believe OEMs can get reasonable wksta power levels out of it in the next few months, especially if AMD releases a new rev (they sorely need to drop power by at least 10%, perhaps more, and the lessons learned can help get Agena down below the reported 125W).

Hexus is also reporting that they can show a defect in the NUMA implementation of the Asus board BIOS (this post was going to be called "Did Asus and nVidia drop the QFX ball") that may be why games are suffering so much from latency problems.

One review ( most are posted at AMDZone) stated that AMD is reporting that the Interleave mode will need to be turned off for Vista and on for XP.

I believe AMD reported that they would release their own branded chipset, and hopefully it will be less power hungry than the 680a, which is reported to use more power than even the 975X. Having two of them surely doesn't help. nVidia does have a two-socket SLI chipset in the 3600, and ASUS' implementation is only $300. Even the $400 Asus 680i for Intel implements less PCIe for less power reserves.

Because the 7950GT and the probably forthcoming 8950GT only require two slots for Quad SLI. I can see the need for four low-end GPUs for certain content creators, but even three PCIe slots can't really be used right now, as no "Havok"-type apps or cards have been released except on the server side (AMD Stream).

Only time will tell if AMD had planned to create an entire reference system based on an Ati chipset while allowing nVidia to be the launch partner.

But the real judgement is that only expert builders will make QFX something that isn't too loud or hot, and while Vista x64 may do wonders for it in multithreaded apps, it is not yet ready for prime time.

Let's go AMD! Show your true potential.


So Baron, how do you explain the comparison: double the cost, double the heat, custom boards, double the power consumption, for a mere -20% (yes, less) performance vs. Intel's Q6700, which runs at lower frequencies and is drop-in compatible with current 965/975 mobos? And the fact that Intel's off-die memory controller and old FSB design can take on AMD's latest? Go figure.

I wonder if any manufacturer other than ASUS is game enough to make such a board for it...
December 3, 2006 2:49:26 AM

Quote:
I write this as a former owner of an Opteron 270 setup (2 x dual-core at 2.0 GHz per core, 1 MB L2 cache per core, ccNUMA using 2 nodes of Dual-Channel Reg ECC DDR1-400).

The Opteron 200/800 series did not support Cool'n'Quiet as Registered DIMMs need a constant flow of energy to work and the memory controller is integrated (has pros and cons).

Since these new CPUs likely support Cool'n'Quiet, I'd recommend just disabling C'n'Q to get the stuttering to stop.

In addition to this add /usepmtimer to C:\boot.ini
http://search.microsoft.com/results.aspx?q=boot.ini+%2F...

ccNUMA really wasn't designed for games; it will likely stutter a fair bit if it is used and /usepmtimer is not used at the same time.

You can run ccNUMA enabled or disabled in Windows XP, XP x64 Edition, 2003 Server (x64), and Vista (and Linux, etc. too). It depends what gives you the best performance.

"Node Interleaving" is disabling ccNUMA: instead of having the OS control it, the 'chipset' controls it. It will stutter less, but the performance of the memory will not aggregate as well compared to ccNUMA. (Latency vs. throughput argument.)



The ccNUMA vs Node Interleaving being for one OS and not the others is total garbage. All Windows kernels based on NT 5.1 will support it; games just may not scale well (stuttering) if ccNUMA is used.

Vista will not improve ccNUMA performance in gaming; the technology is years old, well established, and cannot be improved very much at all (from an inter-latency perspective within an OS's kernel).

Just google "NUMA" and "ccNUMA", it is an idea perhaps a decade old now.

http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access




VALVE'S SMP TESTS SHOW GREATER THAN 100% SCALING!!


Quote:

If a core accesses its local memory then it has a bandwidth of 12.8GB/s(using DDR2-800). If it accesses the memory of another node, then this transfer has to take place across the HT2.0 link which is only 4GB/s (8GB/s aggregate). So as well as the added latency associated with the node hop, the transfer would take LONGER than it would from local memory-this is why the HT link bandwidth is important.


Bear in mind one socket's cores can benefit from the L1/L2 cache in the other socket's cores, and then, if required, access its memory.

ccNUMA is designed to aggregate throughput, not combat latency. For a ccNUMA gaming system /usepmtimer must be added to C:\boot.ini - I cannot stress this enough. Perhaps some other IT websites need to redo their tests?


For anyone considering Quad-FX I strongly recommend they look to Opteron 2000 series instead.


PS: Sorry - This post is a bit all over the place, but meh, it'll do.

ccNUMA was designed for high-bandwidth apps, such that app data is loaded nearest the "process" CPU. When implemented properly, it only requires fitting more data into those closest banks.

For example, if an app uses 1.9GB peak, then swapping needs to happen for the process, and if data is only swapped back and forth to the same banks, the "fragmentation" won't cause latency problems with cross-CPU accesses.

The fact that you saw the tests with the Valve engine means that you realize that QFX is not designed to take advantage of the current "limited-threaded" game landscape but the future multi-core-optimized landscape. Vista x64, by default, will automatically have optimizations (similar to KernelGuard) that will boost perf for multi-core systems.

Of course it will be up to drivers to optimize both transfers and load/swap; it is not an insurmountable task given the general-purpose nature of AMD64's instruction set.

HT's transfer rate is ONLY necessary when the "working set" is greater than the available bank space AND the total RAM usage is greater than system RAM.

Effective optimizations created SSE and, to some extent, AMD64. QFX is alive and well, though it's suffering from poor optimization. It is almost reminiscent of Core 2's initial hiccups, which caused a MANDATORY replacement rev, as irrelevant as that may be.
December 3, 2006 3:02:54 AM

Quote:
So Baron, how do you explain the comparison: double the cost, double the heat, custom boards, double the power consumption, for a mere -20% (yes, less) performance vs. Intel's Q6700, which runs at lower frequencies and is drop-in compatible with current 965/975 mobos? And the fact that Intel's off-die memory controller and old FSB design can take on AMD's latest? Go figure.

I wonder if any manufacturer other than ASUS is game enough to make such a board for it...


It's very simple. I'm comparing an FX70 with an optimized implementation against my current AMD system, with DX10 multi-threaded games in mind.

I have never said that all of a sudden K8 would be faster core for core than C2. It has shown its great potential with future tech such as Blu-ray (AnandTech) and with the Valve tests.

As far as other manufacturers making boards, I have already commented that AMD will need to produce a reference platform that improves upon the current implementation to get more OEMs on board (The Inq, I believe, went so far as to say that OEMs didn't want to use AM2).

The ripple effect could even cause a shift in the number of sockets. AMD should have some kind of Agena FX demo by March or so that will show the added potential of the enhanced arch.

I still believe in the concept and I hope that AMD does too so they will put money into getting the power down.
December 3, 2006 3:08:23 AM

Quote:
Baron is stumped; ideals meet reality in a fiery blur.

I wonder about a Baron/casewhite link; the information is pretty similar, however it's just not working for AMD yet. How do you compensate for a 27% rise in memory latency?

AMD does great on long-winded equations, but it's so different from the consumer desktop. You can't have HPC computing on a PC; the end result is slower app processing.

I think NUMA could be AMD's NetBurst; they should rethink that. You can't bring HPC to the desktop and rule; I don't care how many petaflops you have on your record.

If that's what 4x4 is about, AMD just lost until they ditch that notion.
While HPC designers refer to desktop benches as superficial, the whole array of desktop computing applications follows this superficial trend.
I doubt a desktop will do the depth of computing that an HPC does for a long time.



What do you mean? I have never expected QFX to overtake the per-clock advantage of C2, but I did expect it to narrow the gap between the two, which it does.

QFX is not about NUMA's ability to transfer between cores. It's about dual sockets that use the per-socket memory efficiently.

Let's not get back into the numbers game of process load/swap. Maybe Jack can enlighten you as to how 90nm 2.6GHz Opterons are running at 55W, to allay your fears. Though that's not to say that AMD won't try to lower per-socket power with an F4 rev.
December 3, 2006 3:30:50 AM

Quote:
Maybe Jack can enlighten you as to how 90nm 2.6GHz Opterons are running at 55W to allay your fears.


Oh, I know that one: it's called hand-picked and undervolted. Moron.
December 3, 2006 3:56:42 AM

Almost all Win32 processes are limited to 2 GB of RAM; the exception is applications designed to take advantage of /3GB and similar switches in BOOT.INI. Games do not fall into this category.

Search for /3GB on www.microsoft.com for examples (There are a few other similar methods used too, but games do not use any of them as games target consumers with 'normal' PCs). 8)

All allocated memory in Win32 / Win x64 is virtual by nature, and managed by, well... the memory manager.

Virtual simply means the physical and virtual addresses do not match up, it may also imply swapping or paging to disk (or to other RAM - no joke).

The problem with ccNUMA is that one process may use 2 nodes, but it may allocate 400 MB to one node and 800 MB to the other.

SANDRA has a test that demonstrates this very well, since the early Opteron days.

As for Vista having "significantly better multi-processor performance" I have yet to see anything that indicates this is even remotely true. In fact most data points to the exact opposite.

The Windows kernel has not changed much since NT 4.0, NT 5.0 (Windows 2000), NT 5.1 (Windows XP, 2003 Server), etc. They all support 32 CPU cores, and beyond NT 5.1 (Win 2K3 Server x64 kernel) there isn't really all that much to improve.
However, in the consumer variants the processor core count is limited to below 32; the kernel can still do it, it has just been given an artificially low ceiling.

Do consider that NT 4.0 was available for DEC Alpha processors (Compaq bought out DEC, then became HP/Compaq) around a decade ago, as well as for SGI MIPS processors. These systems used (cc)NUMA years before AMD was selling it to you, and it hasn't changed all that much. (There is nothing much left to 'optimize' in the Vista kernel to get more performance from ccNUMA systems; Vista is mostly a fresh paint job GUI-wise.)

I think you may have ccNUMA and swapping/paging confused with something else. (In that how it works on a hardware + low level software basis).


As for the /usepmtimer usage:
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /noexecute=optin /fastdetect /usepmtimer

// (Do not wrap the line)


:? I'm not exactly sure what you think ccNUMA is, or how it works, but it doesn't seem to fall in line with the industry-accepted definitions and platform descriptions of NUMA and/or ccNUMA systems (in how they work at a hardware/kernel level). 8O
December 3, 2006 4:29:25 AM

Quote:
That is pretty much right...

Baron, the low power parts AMD produces are actually binning out higher, they are partitioning their best bin, undervolting and down clocking to hit power.

P = C*V*V*F, so by downvolting and underclocking one can achieve a cubic reduction in power. Hence a 10% reduction in voltage and frequency gives roughly a 27% reduction in power (0.9^3 ≈ 0.73 of the original).
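The quoted rule of thumb is easy to check numerically. This is just the arithmetic of P = C·V²·f, not any actual AMD binning data:

```python
# Dynamic power scales as P = C * V^2 * f.  Scaling voltage and
# frequency by the same factor s gives s^3 of the original power.
def relative_power(v_scale, f_scale):
    return v_scale ** 2 * f_scale

p = relative_power(0.9, 0.9)  # 10% undervolt + 10% underclock
# p is about 0.729, i.e. roughly a 27% power reduction
```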


As soon as I get a chance I will find your quotes citing 100nm spacing as the reason they got power down, along with your assertion that it would damage yields.
December 3, 2006 4:32:33 AM

:lol:  What a moron. Keep trying and maybe one day you won't be such a dipshit. Enjoy your sh!tty 4x4 system as well.
December 3, 2006 4:32:52 AM

Quote:
Almost all Win32 processes are limited to 2 GB of RAM,


I think X64 has no limit to process size, but I could be mistaken in my desire to upgrade to that vs. Vista X86.
December 3, 2006 4:50:19 AM

Quote:
Almost all Win32 processes are limited to 2 GB of RAM; the exception is applications designed to take advantage of /3GB and similar switches in BOOT.INI. Games do not fall into this category.

Search for /3GB on www.microsoft.com for examples (There are a few other similar methods used too, but games do not use any of them as games target consumers with 'normal' PCs). 8)

All allocated memory in Win32 / Win x64 is virtual by nature, and managed by, well... the memory manager.

Virtual simply means the physical and virtual addresses do not match up, it may also imply swapping or paging to disk (or to other RAM - no joke).

The problem with ccNUMA is that one process may use 2 nodes, but it may allocate 400 MB to one node and 800 MB to the other.

SANDRA has a test that demonstrates this very well, since the early Opteron days.

As for Vista having "significantly better multi-processor performance" I have yet to see anything that indicates this is even remotely true. In fact most data points to the exact opposite.

The Windows kernel has not changed much since NT 4.0, NT 5.0 (Windows 2000), NT 5.1 (Windows XP, 2003 Server), etc. They all support 32 CPU cores, and beyond NT 5.1 (Win 2K3 Server x64 kernel) there isn't really all that much to improve.
However, in the consumer variants the processor core count is limited to below 32; the kernel can still do it, it has just been given an artificially low ceiling.

Do consider that NT 4.0 was available for DEC Alpha processors (Compaq bought out DEC, then became HP/Compaq) around a decade ago, as well as for SGI MIPS processors. These systems used (cc)NUMA years before AMD was selling it to you, and it hasn't changed all that much. (There is nothing much left to 'optimize' in the Vista kernel to get more performance from ccNUMA systems; Vista is mostly a fresh paint job GUI-wise.)

I think you may have ccNUMA and swapping/paging confused with something else. (In that how it works on a hardware + low level software basis).


As for the /usepmtimer usage:
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /noexecute=optin /fastdetect /usepmtimer

// (Do not wrap the line)


:? I'm not exactly sure what you think ccNUMA is, or how it works, but it doesn't seem to fall in line with the industry-accepted definitions and platform descriptions of NUMA and/or ccNUMA systems (in how they work at a hardware/kernel level). 8O


You know, it's funny that I really didn't read your whole post - which is moot btw, because I tested the first ACPI Alpha on Win2000 - and it was IMMEASURABLY FASTER than any other chip even close to the same clock speed.

C2Q also uses the same subsystem but excels because of its arch/implementation. My whole point has been load/swap and OS scheduling. I have said numerous times that I am not thinking of this mainly for games, though the above-60fps AND severe multi-threading/tasking capabilities make it a worthy upgrade to prep for 8 cores.

The funny thing is that for my needs I could go with a 5200+(2.8GHz = 11x260), 8800GTS SLI, with 8GB RAM, and then Agena.

I want two sockets. It has shown the promise I wanted (EXCEPT IN THE POWER AREA - though FX70 is tamer with careful construction).
December 3, 2006 5:14:46 AM

Quote:
That is pretty much right...

Baron, the low power parts AMD produces are actually binning out higher, they are partitioning their best bin, undervolting and down clocking to hit power.

P = C*V*V*F, so by downvolting and underclocking one can achieve a cubic reduction in power. Hence a 10% reduction in voltage and frequency gives roughly a 27% reduction in power (0.9^3 ≈ 0.73 of the original).


As soon as I get a chance I will find your quotes citing 100nm spacing as the reason they got power down, along with your assertion that it would damage yields.

Baron, I will save you the trouble.

The third part of the equation above, capacitance, is harder to manipulate, but it is certainly one variable that can be targeted.

The problem with the C in the equation above is that when you target the poly line width (which is what you are referring to) the yield immediately drops off. In fact, this is how AMD and Intel squeeze out the enthusiast level parts.

They are called bouquet lots: special lots that are tagged through the line. At key points down the line, certain process steps are intentionally dialed in to a particular value - a value that is great for transistor performance but horrid for actually getting good yield. So, in order to keep the total dollars per wafer pass the same, they need to charge very high prices for these parts.... I would not be surprised if AMD is losing money on their lower-end FX-7x line.

Anyway, back to the physics. Capacitance is defined as dQ/dV, but physically it is equal to A*k/D, where A is the area of the parallel charge planes, k is the dielectric constant, and D is the distance between those planes. To get power down (or, for high-end parts, to give you more room to scale frequency up), you need to make C smaller; since k is a constant and D affects short-channel effects, it is A you make smaller. Thus the poly lines that define the area of the gate over the channel are made narrower; this is called a critical dimension, and it can be done at the lithography step when patterning the poly gate electrode lines.
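A parallel-plate sketch of that relation, for illustration only: the dimensions below are made up, not any real process numbers, and k ≈ 3.9 is the usual SiO2 value:

```python
# Parallel-plate approximation of the gate: C = k * e0 * A / D.
E0 = 8.854e-12  # vacuum permittivity, F/m

def gate_capacitance(k, area_m2, d_m):
    return k * E0 * area_m2 / d_m

# Hypothetical gate: 65 nm long, 2 nm oxide; then narrow the line by 10%.
c_wide   = gate_capacitance(3.9, 100e-9 * 65e-9, 2e-9)
c_narrow = gate_capacitance(3.9,  90e-9 * 65e-9, 2e-9)

# Capacitance (and with it the C in P = C*V^2*f) drops linearly with
# the critical dimension: the ratio comes out to about 0.9.
ratio = c_narrow / c_wide
```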

Heck, they likely do this too to get those power figures down; since they are pushing a 2.6 GHz Opteron, the volume will be very low and the price they can get will be relatively high....

The point is, it is nothing special like you are so ignorantly trying to say. In fact, you are fairly dimwittedly stupid.

Jack

Careful Jack - you'll make his head explode.
December 3, 2006 5:36:10 AM

Quote:
You know it's funny that I really didn't read your whole post


How unusual. Get back to cleaning toilets and mopping the floors janitor boy.
December 3, 2006 6:25:07 AM

I'm going to go out on a limb here and play devil's advocate. An in-house chipset will not fix the problems that loom over AMD's FX processors. The problem lies with their design. Overall I think the design is just bad. There is no simple way to say it but that the design is flawed and horrid. Several reasons behind that: the poor latency, the low headroom for overclocking, and the ridiculous amount of power that's required to run the 4X4. Honestly, SLI 8800GTXs is bad enough, but take that and add in the Quad platform and you're going to be chewing through the amps on the rails faster than Clinton lied about the scandal in the White House. Given the design implemented on this, AMD surely could have done much better. I personally do not know how they could have revised or changed the design, but the more I read about the 4X4, the more I see it as a hashed fusion of the Opteron processors and the implementation of the K8. Honestly, I mean come on, do the guys over at AMD expect us to buy into this crap and expect that this is the ultimate enthusiast system?

The guys really set their sights too high and should have kept on working out the K8L and AM3. To be honest, the 4X4 is just a total waste of their time and money. I'm going to give AMD credit for trying to go in a different direction and try something like this, but the Quad FX is only a stop gap for them until they can roll out the K8L and 65nm production.

Moving on, though, I would have tried to make improvements to the architecture, or just start from scratch, return to my roots, and go with something different besides a fusion of an aging architecture and the rebranding of a server-based processor as the FX series. Right now the price, power consumption, and benchmarks have AMD staring down the barrel of defeat.

(But honestly, I am rooting for AMD and hope they bring their A game and get it going soon. It's understandable that they are struggling with 65nm and don't have as many resources as the boys in blue, but for the underdog who was kicking Intel's tail for 2 years running, they really fricked this 4X4 idea up.)
December 3, 2006 6:40:27 AM

Quote:
..... I tested the first ACPI Alpha on Win2000 - and it was IMMEASURABLY FASTER than any other chip even close to the same clock speed .....

C2Q also uses the same subsystem but excels because of its arch/implementation. My whole point has been load/swap and OS scheduling. I have said numerous times that I am not thinking of this mainly for games, though the above-60fps AND severe multi-threading/tasking capabilities make it a worthy upgrade to prep for 8 cores. .....

The funny thing is that for my needs I could go with a 5200+(2.8GHz = 11x260), 8800GTS SLI, with 8GB RAM, and then Agena.

I want two sockets. It has shown the promise I wanted (EXCEPT IN THE POWER AREA - though FX70 is tamer with careful construction).


Then explain why the Microsoft Windows 2000 on DEC Alpha documentation always talks about "reducing paging", while you are always talking about increasing paging/swapping?

http://msdn2.microsoft.com/en-us/library/ms810461.aspx
http://search.microsoft.com/results.aspx?mkt=en-US&setl...

Until you do, I'll be clicking [Ignore] when I encounter your posts, to ignore every post you've made in that thread (and to save a ****load of scrolling, as you do full quotes needlessly; trollish behaviour IMHO).

As for your comment "I have said numerous times that I am not thinking of this mainly for games, though the above 60fps", immediately followed by your other comment "The funny thing is that for my needs I could go with a 5200+(2.8GHz = 11x260), 8800GTS SLI..."

It sounds to me as though you are a gamer, since you're not running Registered, ECC, or FB-DIMMs.


I'd like to thank the admins for adding an [Ignore] button. :trophy: - I've only needed to use it on a small group of people thankfully, mostly to save reading utter tripe and scrolling time due to full quotes being made w/o purpose.

To the others reading this thread providing support, I thank you all. 8)

UPDATE: AMD needs a +46.55172 % gain in performance over the 2.8 GHz quad-core solution just to keep pace, and this assumes the extra delta continues to scale at 107%, which it will not; it will keep dropping off the higher the clockspeed or IPC gets relative to the system bus.

Perhaps 2 x +21.05854 % gains stacked together, e.g. [clockspeed] + [something else] compounded, but they will not get this without a massive redesign of the CPU core, and K8L on 65nm is not that redesign.

e.g. 4 x 3.4 GHz cores on 65 nm using K8L, vs 4 x 2.8 GHz cores on 90 nm using K8. This makes up +21.42857 % of the gain.

Where will the other +20.68966 % (compounded) performance come from ?

Even borrowing a few OOO / pre-fetching tricks from the Intel NGMA (Core 2 Duo, Xeon 5100 / 5300, etc) they'll only get another +10 % to +15 % of the 'required' +20.68966 %, leaving them offering about 95% of Intel Core 2 Quad performance, likely at a higher overall price.
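The compounding arithmetic above is easy to sanity-check. Here is a quick sketch in Python; all the percentages and clock speeds are the poster's own figures from this thread, not official benchmarks:

```python
# Verify the compounded-gain arithmetic from the post above.
# All figures are the poster's own numbers, not official data.

target = 1.4655172          # the claimed +46.55172 % total gain needed

# Two equal gains stacked (compounded) should reproduce the target:
stacked = 1.2105854 ** 2    # two +21.05854 % gains multiplied together
assert abs(stacked - target) < 1e-4

# Clock-speed portion: hypothetical 3.4 GHz K8L vs 2.8 GHz K8
clock_gain = 3.4 / 2.8      # = 1.2142857..., i.e. +21.42857 %

# Remaining gain that would have to come from IPC improvements:
ipc_gain = target / clock_gain   # ~1.2068966, i.e. +20.68966 %

print(f"clock gain needed: +{(clock_gain - 1) * 100:.4f} %")
print(f"IPC gain still needed: +{(ipc_gain - 1) * 100:.4f} %")
```

The numbers do compound consistently: 1.2142857 x 1.2068966 ≈ 1.4655172, which is where the "other +20.68966 %" figure comes from.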

It'll take at least 6 - 9 months once 65 nm is ramped up before they'll even be able to offer a hypothetical 3.4 GHz quad-core part too, time in which Intel will just gain more of a lead.



I still want to see a 45/45/10 market share balance between Intel / AMD / Others (including Sun Microsystems, SGI, Via, etc).
December 3, 2006 7:26:39 AM

Quote:
Well, the QFX has been released, and while certain scenarios show incredible promise, certain areas are also saddled with too much baggage.

The fact that the Opteron dual can be outfitted with SLI in a wksta and offer good perf without these power levels implies that the total package could have been done better.

The Opteron 285 runs at 2.6GHz and this graph shows that without the additional SLI power AMD runs at 322W and the dual 5160 runs at 267W (full load).

Looking at the various articles around the web, it seems as though only Anand managed to actually find suitable tasks that were reasonable for multi-tasking. In his case he used BluRay movies, which totally killed all the dual-core systems.

Mmmmm, the only way to reduce this latency issue between cores is to do two things: 1) reduce the number of memory reads/transfers between cores, and 2) increase the speed of the link.

I believe Intel has already accomplished this by increasing the cache on each processor. I bet if AMD came out with a dual-core processor with 2MB or more of L2 cache we would see a big performance boost for the 4x4 system. At least, that is what it looks like AMD could do to boost the performance of the 4x4 system while staying on the K8 core.

Jack, does this ring true to your thinking?

His power numbers were also at least 100W lower than other test sites'. He turned on CnQ and got idle power draw down to within 4W of the C2Q system. Of course this didn't dent the 456W the system drew at "full load" with an 8800GTX (which I believe draws 225W+).

And these are FX-74 numbers. The FX-70 is shown to use even less, so I believe OEMs can get reasonable workstation power levels out of it in the next few months, especially if AMD releases a new rev (they sorely need to drop power by at least 10%, perhaps more, and the lessons learned could help get Agena down below the reported 125W).

Hexus is also reporting that they can show a defect in the NUMA implementation of the Asus board BIOS (this post was going to be called "Did Asus and nVidia drop the QFX ball?"), which may be why games are suffering so much from latency problems.

One review (most are posted at AMDZone) stated that AMD is reporting that node interleave mode will need to be turned off for Vista and on for XP.

I believe AMD reported that they would release their own branded chipset, and hopefully it will be less power-hungry than the 680a, which is reported to use more power than even the 975X. Having two of them surely doesn't help. nVidia does have a two-socket SLI chipset in the 3600, and ASUS' implementation is only $300. Even the $400 Asus 680i for Intel implements fewer PCIe lanes, for lower power draw.

Because the 7950GT, and the probably forthcoming 8950GT, only require two slots for Quad SLI. I can see the need for four low-end GPUs for certain content creators, but even three PCIe slots can't really be used right now, as no "Havok"-type apps or cards have been released except for the server (AMD Stream).

Only time will tell if AMD had planned to create an entire reference system based on an Ati chipset while allowing nVidia to be the launch partner.

But the real judgement is that only expert builders will make QFX something not too loud or hot, and while Vista x64 may do wonders for it in multithreaded apps, it is not yet ready for prime time.

Let's go AMD! Show your true potential.

The NUMA support in Vista is supposed to be better than in Windows XP, so Vista should give better performance. For NUMA to work well in this two-node setup, node-to-node traffic should be minimized. I know that in multi-node systems it can sometimes be an advantage to use node interleave mode, such as with databases. The problem with Quad FX at the hardware level is the HT link. I do not think HT 2.0 is adequate; it would take a full HT 3.0 link to do the job, viz. a 2.6GHz link that is 32 bits wide. If you look at some of the memory benchmark figures you will see that they are very low, probably caused by excessive cross-node traffic.
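To put rough numbers on that HT 2.0 vs HT 3.0 claim, here is a back-of-envelope sketch. HyperTransport is double-pumped (two transfers per clock, per direction); the 1.0GHz/16-bit figures for the Quad FX link are the commonly quoted ones, not taken from a datasheet, so treat them as an assumption:

```python
# Back-of-envelope HyperTransport bandwidth comparison.
# HT is double-pumped: two transfers per clock cycle, per direction.
# Link clocks/widths are commonly quoted figures, not datasheet values.

def ht_bandwidth_gbs(clock_ghz: float, width_bits: int) -> float:
    """Per-direction bandwidth in GB/s for a HyperTransport link."""
    transfers_per_sec = clock_ghz * 2      # double data rate
    bytes_per_transfer = width_bits / 8
    return transfers_per_sec * bytes_per_transfer

# HT 2.0-class link as assumed for Quad FX: 1.0 GHz, 16 bits wide
ht2 = ht_bandwidth_gbs(1.0, 16)   # 4.0 GB/s each way

# The "full HT 3.0" link the post calls for: 2.6 GHz, 32 bits wide
ht3 = ht_bandwidth_gbs(2.6, 32)   # 20.8 GB/s each way

print(f"HT 2.0: {ht2:.1f} GB/s per direction")
print(f"HT 3.0: {ht3:.1f} GB/s per direction ({ht3 / ht2:.1f}x)")
```

Under those assumptions the proposed HT 3.0 link would carry roughly five times the cross-node traffic per direction, which is why the memory benchmarks on the current link look so starved.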

Bang on correct. 4x4 as a platform will be much better with Socket 1207+ and HT 3.0.
December 3, 2006 7:48:55 AM

Okay, so I'm checking my morning e-mails here, and this thread has grown from about 4 posts when I went to sleep to 4 pages when I woke up.

The general gist of this thread (so far as I can tell) is that BM seems to be making the same argument over...and...over...and...over...again.

This whole thing was summed up on page 1!