
Smithfield: 2Q05, 2.8Ghz for $240!!!

January 20, 2005 11:54:59 AM

We all know that Smithfield is not as technically interesting as Toledo, AMD's alternative that is going to be launched in 2H05. The obvious speculation leads us to believe that Smithfield is probably too hot and too underperforming... But recently, a little more info has been leaked that shows that there might be at least one small advantage for them: prices.

Smithfield 840 $528
Smithfield 830 $314
Smithfield 820 $240
<A HREF="http://www.theinquirer.net/?article=20826" target="_new">(pricing info from the inquirer)</A>

In addition to that, Smithfield will be launched in <A HREF="http://www.theinquirer.net/?article=20806" target="_new">late 2Q05</A>, which probably means it might get launched before Toledo. (I wonder if it'll be another paper launch?... I also wonder if Intel could flood the market with 820s... and make it all sound as if it was their idea, not AMD's, and they haven't just slapped two cores together...)

This will mean a lot if, as we're told, AMD only gets dual-core Toledo Athlon FX processors with FX price tags ($500+). I mean, a dual-core processor for $240 by July (if that schedule is to be believed) might just make for an interesting product. The only problem is that you'll probably have to buy one hell of a powerful heatsink and fan for the processor... And you'll spend extra money if you don't want it to sound like a jet engine...
January 20, 2005 1:30:24 PM

Do we know any projected prices for the Toledo though?? How do we know that AMD won't still undercut the Smithfield??
January 20, 2005 2:21:03 PM

I don't know of any projected prices for Toledo, but it's slated to be introduced first as Opteron and FX CPUs, which basically means $500+ prices. This is of course only my speculation. :wink:

In any case, I think Intel's production capacity, even if it is for a technically sub-par product, has traditionally been greater than AMD's, which means they actually could price their products lower than AMD's if they wanted to... in theory.
January 20, 2005 2:28:36 PM

Because AMD can't take the ASP hit in both the flash market and CPU market. I have no idea how Intel's going to make those magic price numbers for dual core chips - very interesting indeed.

I'm just your average habitual smiler =D
January 20, 2005 3:10:15 PM

Quote:
I have no idea how Intel's going to make those magic price numbers for dual core chips - very interesting indeed.

Exactly. US$240 is quite buyable, FX prices and Opterons are not...
January 25, 2005 10:25:24 PM

I've not been keeping up, so please clue me in here. Smithfield is a dual-core Prescott? If so, *yawn*. At 2.8 GHz it will be like an SMP Celeron: pretty good at precious few things. By that time $240 will buy you a CPU that is considerably faster on 98% of the apps out there. Oh wait, $250 already buys me that today.

Wake me up when we get higher clocked 64 bit enabled Dothans. Dual core is nice, but no substitute for single threaded performance unless you run a server or workstation app that can take advantage of it.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
January 25, 2005 10:39:22 PM

It would be interesting to see if current Socket T motherboards will get BIOS updates to support dual-core processors. If so, the upgradability of Intel would trounce that of AMD.
January 25, 2005 11:10:29 PM

Don't kid yourself - Intel will make more money with a new chipset requirement.

Powered by VERTO
Fueled by CL-ONE
January 26, 2005 12:20:33 AM

They have already said that dual core will require a lot more power. No Socket T board made today can take that kind of abuse.
January 26, 2005 7:08:20 AM

A dual-core Prescott? That makes 140 million logic transistors, five times as many as a NW core. The yield of this chip will be very, very bad, and at this price the gross profit margin will be close to 0%.

I personally believe it is a NW core that consumes less and will perform better; Intel did have two years to rework it. Either that, or a miracle has happened with the Prescott.

I need to change my user name.
January 26, 2005 11:01:55 AM

Welcome Back

I aint signing nothing!!!
January 26, 2005 11:58:18 AM

I still haven't seen anything indicating that people will get a performance boost in programs that are not programmed to take advantage of multiple cores. In games, dual core will make [-peep-] all difference.

Quote:
Dual core CPU might use BTX which is better in heat dissipation than ATX

BTX is laid out so as to better circulate air over the hottest components. You are still going to need a HUGE heatsink and fan to get rid of the heat of a dual-core Prescott! A single-core Prescott can dissipate up to 102 W of heat! Think about a dual-core one; my waterblock can only get rid of 200 W!
You seem to make out that BTX is the miracle cure for all our heat problems.
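
For anyone who wants to check the arithmetic in the post above, here is a minimal sketch. It uses only the two figures quoted there (102 W per core, 200 W cooler capacity) and assumes, very naively, that a dual-core part dissipates exactly twice the single-core peak. That doubling is an assumption for illustration, not a published spec, and the function names are made up for this example.

```python
# Rough thermal-budget check using only the figures quoted in the post above.
SINGLE_CORE_MAX_W = 102   # quoted peak dissipation for a single-core Prescott
COOLER_CAPACITY_W = 200   # quoted capacity of the poster's waterblock


def dual_core_heat(per_core_w: float, cores: int = 2) -> float:
    """Naive upper bound: assume every core dissipates the full per-core figure."""
    return per_core_w * cores


def headroom(cooler_w: float, load_w: float) -> float:
    """Positive: the cooler has spare capacity. Negative: it is exceeded."""
    return cooler_w - load_w


if __name__ == "__main__":
    load = dual_core_heat(SINGLE_CORE_MAX_W)
    print(f"Naive dual-core load: {load} W")
    print(f"Headroom on a {COOLER_CAPACITY_W} W cooler: "
          f"{headroom(COOLER_CAPACITY_W, load)} W")
```

Plugging in a lower per-core figure (for example the 55-60 W range mentioned later in the thread) shows how much the conclusion depends on which per-core number turns out to be right.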
January 26, 2005 12:06:36 PM

Well, it is more or less safe to assume that Prescott's dark transistors were causing all of the heat problems. Remember, Prescott has over 2x the logic transistors that NW had... If Intel got rid of the garbage and got the core working more smoothly, then maybe Smithfield will work as a product...
January 26, 2005 12:14:27 PM

Rest assured, they are both Prescotts, and also rest assured it will not be as bad as everyone thinks.

All the thermal control ICs from the P-M and Itanium series will be in there: on-the-fly clock speed adjustment and IC wait, stall, and hibernate states.

As for software performance it will be on par with...

Xeon

Post created with being a dickhead in mind.
For all emotional and slanderous statements contact THG for all law suits.
January 26, 2005 4:31:49 PM

I'm extremely skeptical of this chip's performance, especially the 820 and 830.

For one, a single Scotty core eats bandwidth like no other; imagine two. Second, the only type of RAM, DDR2, that could provide this bandwidth is expensive and has horrible latency at high speeds (newer modules can run 3-2-2-8 at 266MHz, which isn't shabby, but still isn't enough bandwidth and still costs a fortune). Sure, Scotty doesn't care THAT much about latency, but I have a feeling that the arbiter unit on Smithfield will work in a way that won't effectively cache requests (as it doesn't have any cache of its own) and will therefore NEED extremely low NB and mem latencies (not just timings, but true latency, something Intel's off-die mem controller <b>cannot</b> provide) to handle the requests of both cores. DDR2 does not have the required timings to even begin to make up for the lack of an on-die mem controller, either.

So what <i>could</i> that mean? Slower performance than a single core at the same clock (maybe that's why HT was disabled...) by a small (likely very small) amount on single-threaded apps. It will also mean poor multi-threaded scaling as opposed to what AMD could do. Not only that, but the clockspeeds of these chips are puny by modern Intel (somewhat required) standards. The power dissipated is also astronomical at all three clockspeeds and will likely only get worse since these per-core-power-dissipation numbers are among Intel's lowest thought possible (55-60W is low for a Scotty).

Intel, IMO, NEEDS to make Yonah soon. Dothan doesn't use all the bandwidth provided by dual-channel DDR2 and doesn't seem to mind the higher latencies of DDR2. Not only that, but the arbiter unit won't have to work as hard due to the lower clocks per core and won't be tied down by bandwidth arbitration. ON TOP OF THAT, Dothan's performance is as great as the best P4, runs at an extremely low 27W of power and has the overhead (in terms of power) available for 64-bit, NX, etc.

Looks like another ploy by Intel to exercise their marketing department's dominance of AMD's. But for people looking for a processor, AMD's dual core solution (although later and likely more expensive) was built from the ground up for dual core and will really be the only solution.

Maxtor disgraces the six letters that make Matrox.
January 26, 2005 6:28:42 PM

>For one, a single Scotty core eats bandwidth like no other,
>imagine two

Well, it won't be worse than Xeon; furthermore, if the second core can't be used (because the software isn't multithreaded), that core won't need a lot of bandwidth either :D  If the core can be used, it's gonna help regardless of bandwidth, just don't expect 2x performance.
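
The "don't expect 2x" point can be put into numbers with Amdahl's law, which nobody in the thread cites by name but which fits the argument. The sketch below ignores bandwidth, cache and scheduling effects entirely, and the parallel fractions are arbitrary illustration values, not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction: float, cores: int = 2) -> float:
    """Ideal speedup when only `parallel_fraction` of the work can use all cores.

    Amdahl's law: 1 / ((1 - p) + p / n). Memory bandwidth and other shared
    resources are ignored, so real-world results would be lower.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only to show the shape of the curve.
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"parallel fraction {p:.2f}: {amdahl_speedup(p):.2f}x on 2 cores")
```

A purely single-threaded program (fraction 0) gains nothing from the second core, and only a perfectly parallel one reaches 2x.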

> Second, the only type of RAM, DDR2, that could provide
>this bandwidth is expensive and has horrible latency at
>high speeds

Again, won't be worse than either Xeon DP/800 or a single P4 on single threaded apps.

>Dothan's performance is as great as the best P4, runs at an
>extremely low 27W of power and has the overhead (in terms
>of power) available for 64-bit, NX, etc.

Yeah, we would all want a 20W, 2.8 GHz dual core Dothan, with AMD64, NX, integrated memory controller, HTT, FB-DIMM, Itanium's FPUs and Power5's FSB... and... maybe toss in VIA's embedded encryption engine as well :)  Oh, and for no more than $200 :D  Did I mention 9MB L3 yet?

It's not like Intel had a lot of options; 64 bit just isn't ready for Dothan yet, and a high-end, dual-core chip that does not support that is going to be a hard sell once Windows x64 ships.

In all fairness though, this would make a great chip for a cheap workstation. It's just not gonna be a gamer chip, that's for sure.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
January 26, 2005 6:31:59 PM

>on the fly clock speed adjustment, IC wait, stall, hibernate
>states.

All of that is great... when you are not feeding the core with instructions ! When you need the power, none of this is going to help squat.

>As for software performance it will be on par with...
>Xeon

Agreed. But it will be harder to cool since it will be a fairly impressive amount of power in a tiny package...

= The views stated herein are my personal views, and not necessarily the views of my wife. =
January 26, 2005 11:15:25 PM

Quote:
For one, a single Scotty core eats bandwidth like no other, imagine two.

The Prescott's core logic bundles data even if it's not double-precision data. Bandwidth issues will be moot considering that P4s run on a quad-data-rate memory subsystem: CPU1 takes the rise and fall of the first half of the tick, and CPU2 gets the rise and fall of the second half of the system tick. In theory, if both processors are kept normally busy (e.g., in a game), there should be no real-world performance loss.

Heavy memory-access software, such as encoding or synthetic benchmarks, is where the restrictions will become quite apparent. But Intel has shown they do very well with threaded vectorized code thus far; with known information, that's not likely to change.

Quote:
It will also mean poor multi-threaded scaling as opposed to what AMD could do.

I beg to differ. Intel has put up a very good fight, IMO, even with the Xeon being bandwidth starved, IPC crippled, and lacking an on-die memory controller. With Intel's current HT technologies showing good performance with good code, I would have to say that Intel should in theory be on par with or slightly ahead of AMD in threaded applications. Until we see NVIDIA's Nforce5 chipset, which will undoubtedly breathe life into the performance-starved Intel camp, it's anyone's guess how/if memory will be an issue with smithy.

Quote:
The power dissipated is also astronomical at all three clockspeeds and will likely only get worse since these per-core-power-dissipation numbers are among Intel's lowest thought possible (55-60W is low for a Scotty).

Final thermal output and electrical draw have yet to be "finalized". I know Intel stated worst-case-scenario draw and heat output, but the race isn't over yet.

Quote:
Looks like another ploy by Intel to exercise their marketing department's dominance of AMD's. But for people looking for a processor, AMD's dual core solution (although later and likely more expensive) was built from the ground up for dual core and will really be the only solution.

Again without real world performance numbers there is no winner or loser at this point. AMD may be faced with production issues long before they ready the release of the FX and A64 lines for dual core operation. They have trouble meeting market demands as it is, doubling processors per die without double production capacity will equal market shortages. Sure they will be fast as snot, but no one will have those, makes it very moot in the end.

Quote:
All of that is great... when you are not feeding the core with instructions ! When you need the power, none of this is going to help squat.

If code worked in that manner, yes, but it is sequenced and ordered. 100% CPU usage isn't a real-world situation unless you have some sort of virus or the code ends up in an infinite loop. Now, will there be more "power" in a smithy vs. K8? Not likely; the K8 will enjoy its IPC dominance for yet another generation.

Quote:
Agreed. But it will be harder to cool since it will be a fairly impressive amount of power in a tiny package...

I don't see that being a real issue either. Double the silicon on a 200 watt spec heat sink just means the heat sink is operating at its max specifications. As for power draw, yeah, it'll be pretty sick, but the race isn't over yet. Intel might pull another rabbit out of the hat, similar to the "j" series.

Xeon

Post created with being a dickhead in mind.
For all emotional and slanderous statements contact THG for all law suits.
January 27, 2005 12:18:11 AM

I think you assume too much. I mean, I understand your concerns, and while they do seem to apply from what we know at this point, we cannot presume to foresee the future.
Quote:

For one, a single Scotty core eats bandwidth like no other, imagine two. Second, the only type of RAM, DDR2, that could provide this bandwidth is expensive and has horrible latency at high speeds (newer modules can run 3-2-2-8 at 266MHz, which isn't shabby, but still isn't enough bandwidth and still costs a fortune). Sure, Scotty doesn't care THAT much about latency, but I have <b>a feeling </b>that the way the arbiter unit on Smithfield will work will be in a way that won't effectively cache

While I see your point, I don't think Smithfield will be tremendously bandwidth starved - like someone else here already said, it can't be worse than current Xeon platforms. And if the arbiter chip is well designed, Intel might manage. And have you ever thought that AMD will also share only one memory controller, as sophisticated as that controller might be? This <i>will</i> also have an impact - although probably much smaller - on single-core performance... perhaps comparable to going back from s939 to s754 processors.

Also, I find it funny that you're basing your predictions off a <b>feeling</b> you've got... I mean, what were I to predict if I did the same thing and was feeling constipated? Oh boy :wink:
January 27, 2005 12:36:27 AM

Quote:

>on the fly clock speed adjustment, IC wait, stall, hibernate
>states.

All of that is great... when you are not feeding the core with instructions ! When you need the power, none of this is going to help squat.

Ahhh yes, I was almost getting afraid that no one else noticed that. I don't care about idle temps, I care about 100% CPU usage temps!!! Thanks for reminding me of that point.

The only scenario I could think of is if you're watching a DVD and you want your CPU fan to slow down for less or no noise, but that's about it. And also, if at all possible, it would be best if the damned system was cool and quiet <i>all the time</i>, even when being stressed out!
Quote:
>As for software performance it will be on par with...
>Xeon

Agreed. But it will be harder to cool since it will be a fairly impressive amount of power in a tiny package...

Hm, yes, but that already tells us some things... I mean, a dual-core A64 will probably not be faster than a similar-clocked 2-way Opteron system. But currently, Opteron is quite superior to Xeon, which leads us to believe that, barring minor fluctuations, the same pattern will more or less repeat itself. So what we actually needed to be doing is trying to list comprehensively what is going to be different. So what sets dual-core apart from traditional dual-cpu workstation setups? What will change?

- Smithfield will support Dual DDR2-667, not DDR2-400 like Xeon. (could be a big difference, who knows)
- Intel's current dual-cpu NBs will be replaced by the arbiter chip. Changes to performance due to this change are <b>unknown at best,</b> but will probably not exceed a few percentage points from standard Nocona setup.

There are many other factors. Firstly, both Intel and AMD will put out updated versions of their cores, and those might feature better performance or more features (read: SSE3 for Toledo). But in any case, Intel has the lower hand now and it won't be easy. What I can't quite grasp is why they don't introduce Smithfield with a 1066MHz FSB... thermals, perhaps?...

Anyway, anyhow, let them fight it out and let us have the best of both. :smile:
January 27, 2005 1:48:53 AM

Quote:
Until we see NVIDIA’s Nforce5 chipset which will undoubtedly breath life into the performance starved Intel camp, it's anyone’s guess how/if memory will be an issue with smithy.

You're quite right... Personally, I'm hoping nForce5 can actually make a true difference - maybe even before i945 and i955 show up. I wonder if Nforce5 supports Smithfield already?... It does support dual-channel DDR2-667, at least, which is probably the turning point for DDR2 memory... hopefully, at not too high a price.
January 27, 2005 9:23:23 AM

>The Prescott’s core logic bundles data even if it's not
>double precision data

Sounds like you have no idea what you are babbling about here. What does "logic data" have to do with "double precision"? Exactly: nothing.

> bandwidth issues will be moot with considerations P4's run
>on a quad data rate memory subsystem, CPU1 takes rise fall
>of the 1/2 tick CPU2 gets the rise fall of the second 1/2
>of system tick.

LOL. If that were true, then there would be no difference between a quad pumped FSB (4x200) and a double pumped FSB (2x200) on a single P4. Well, I suggest you think again.
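
To make the 4x200 versus 2x200 comparison concrete, here is a small sketch of the peak-bandwidth arithmetic. It assumes the 64-bit (8-byte) data width of the P4 front-side bus and computes theoretical peaks only. The even split between two cores is a simplification for illustration and says nothing about how Smithfield's bus arbitration will actually behave.

```python
BUS_WIDTH_BYTES = 8  # assumed 64-bit FSB data path, 8 bytes per transfer


def peak_bandwidth_gb_s(base_clock_mhz: float, transfers_per_clock: int,
                        width_bytes: int = BUS_WIDTH_BYTES) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return base_clock_mhz * 1e6 * transfers_per_clock * width_bytes / 1e9


def even_share(total_gb_s: float, cores: int) -> float:
    """Per-core share if all cores saturate the bus at once (a simplification)."""
    return total_gb_s / cores


if __name__ == "__main__":
    quad = peak_bandwidth_gb_s(200, 4)    # "4x200", i.e. an 800 MT/s FSB
    double = peak_bandwidth_gb_s(200, 2)  # "2x200", i.e. a 400 MT/s FSB
    print(f"Quad pumped:   {quad:.1f} GB/s")
    print(f"Double pumped: {double:.1f} GB/s")
    print(f"Quad pumped, split evenly over 2 cores: {even_share(quad, 2):.1f} GB/s each")
```

Under these assumptions, two cores that both saturate a quad pumped bus each see roughly what a single chip would get from a double pumped one. That is the point of the reply above.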

>game there should be no real world performance loss

Here I agree, there won't be any significant loss over a single-core CPU, just because an idle core doesn't require or consume any bandwidth. And when it does (implying threaded code, or several active threads), a second core will help get the best execution rate for the available bandwidth. The only caveat I see is cache thrashing; I don't know what Smithfield's cache will look like (unified L2 or not?). If it is unified, some code might suffer pretty badly, just like it did on early HT implementations for the same reason. But HT has been around long enough for compilers to be aware of this, so it's probably not going to be a major issue.

>Until we see NVIDIA’s Nforce5 chipset which will
>undoubtedly breath life into the performance starved Intel
>camp,

What is special about nForce5 that you expect a significant performance gain from it? AFAICT, Intel is second to none when it comes to designing memory controllers. I just don't think there is a lot of room for improvement there, let alone that nVidia would be able to do it. nForce1 & 2 only rocked so much because every other K7 controller sucked so badly.

>I don’t see that being a real issue either double the
>silicon on a 200 watt spec heat sink just means the heat
>sink is operating at its max specifications

Oh, come on. You will probably also claim Prescott has no heat issue and does not require huge noisy fans... Yes, it can be done, but that doesn't mean such high wattages, especially the way Intel specs their CPUs, are desirable. Furthermore, I highly suspect the chip will implement the power management features to keep it from exceeding the spec, which simply means throttling. Dual-core Prescott just doesn't sound like a good idea to me, at least not for a consumer product (might be a nice low-end workstation chip though).

= The views stated herein are my personal views, and not necessarily the views of my wife. =
January 27, 2005 9:42:20 AM

>The only scenario I could think of is if you're watching a
>DVD and you want your CPU fan to slow down for less or no
>noise, but that's about it

Actually, 99% of the time at least one core will be idle, so having a Cool&Quiet/Speedstep or whatever equivalent certainly is a good thing. It just doesn't help reduce HSF or PSU requirements, since you also want the chip to work at 2x100% load :) 

>Hm, yes, but that already tells us some things... I mean, a
>dual-core A64 will probably not be faster than a similar-
>clocked 2-way Opteron system

Who's to say? AFAICT, AMD is not power limited at all, so even adding a core might allow them the same clock speeds. AMD chips are clock limited for other reasons. The same cannot be said of Intel; they seem very much power limited, so doubling the cores will hurt their clock speed (on P4, obviously).

Consider Dothan: adding a second core to it is not likely to have much impact on maximum clock frequency. Well, of course statistics come into it: if you measure the average maximum overclock of two chips at a time, it's likely to be less than for one, but not by much.
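
The statistics remark can be illustrated with a small simulation. If a dual-core part can only run as fast as its slower core, its expected maximum clock is the expected minimum of two draws, which is a bit lower than the average for one. The normal distribution, the mean and the spread below are made-up illustration values, and treating the two cores as independent is itself an assumption, since cores on one die would likely be correlated.

```python
import random
import statistics


def simulate_max_clocks(trials: int = 100_000, mean_ghz: float = 3.0,
                        sd_ghz: float = 0.15, seed: int = 2005):
    """Return (average single-core max clock, average dual-core max clock).

    Each core's maximum stable clock is drawn from a normal distribution
    (hypothetical parameters). The dual-core figure is the slower of two draws.
    """
    rng = random.Random(seed)
    single, dual = [], []
    for _ in range(trials):
        a = rng.gauss(mean_ghz, sd_ghz)
        b = rng.gauss(mean_ghz, sd_ghz)
        single.append(a)
        dual.append(min(a, b))
    return statistics.mean(single), statistics.mean(dual)


if __name__ == "__main__":
    one, two = simulate_max_clocks()
    print(f"Average max clock, one core:  {one:.3f} GHz")
    print(f"Average max clock, two cores: {two:.3f} GHz")
```

With these made-up parameters the dual-core average comes out lower, but only by a few percent, which matches the "less, but not by much" intuition.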

>- Smithfield will support Dual DDR2-667, not DDR2-400 like
>Xeon. (could be a big difference, who knows)

Are current xeons *still* single channel ? If so, expect a very nice boost over Xeon DP.

>- Intel's current dual-cpu NBs will be replaced by the
>arbiter chip.

Ahem... no. From what I read, the first Smithfields will not feature an arbiter chip, hence presenting themselves as 2 distinct CPUs to the NB. That is exactly the same configuration as Xeon DP. Now, adding an arbiter (which they will later) might *speed up* performance over a non-arbiter or DP configuration by making more efficient use of the FSB bandwidth. By how much is hard to tell, but I wouldn't expect it to be huge. Personally, I think a bigger downside is that initial Smithfields will be seen as 2 *physical* CPUs by the OS and software, meaning more expensive licences. HT-enabled chips are mostly seen as just 1 CPU, but without an arbiter chip, there is no way to differentiate a dual-core from a dual-CPU setup.

> both Intel and AMD will put updated versions of their
>cores, and those might feature better performance or more
>features (read: SSE3 for Toledo)

I'm sure that will make a whopping 0.5% difference :) 

>What I can't quite grasp is why they don't introduce
>smithfield with 1066Mhz FSB... thermals, perhaps?..

Not likely. Signal integrity on the MB seems like a better bet. I mean, why not release 1600 or 3200 MHz FSBs while you are at it ? :) 

= The views stated herein are my personal views, and not necessarily the views of my wife. =
January 29, 2005 5:17:06 PM

Quote:
Sounds like you have no idea what you are babbling about here. What does "logic data" have to do with "double precision"? Exactly: nothing.

Vectorized code such as SSE and SSE2 runs in 128-bit bundles. Intel added and widened this feature for the Prescott core, where the processor tries to alleviate stress on the memory subsystem by bursting the data, whether it be to a GPU, IDE controller or memory. I'll get you a link, since you obviously missed that tidbit of information.

Quote:
LOL. If that were true, then there would be no difference between a quad pumped FSB (4x200) and a double pumped FSB (2x200) on a single P4. Well, I suggest you think again.

Did I miss something? A P4 receives and sends 4x per clock tick, so no, 2x200 would not be the same as 4x200. To be sure, I am amazed by what you said, because it makes no sense whatsoever.

Quote:
But HT has been around long enough for compilers to be aware of this, so it's probably not going to be a major issue.

Yes, most of the groundwork for this processor has been done. I would have to think it would be a fairly smooth transition with regards to compiler and OS support. As far as I can tell, all existing HT-optimized code will work “ideally” on smithy.

Quote:
What is special about nForce5 that you expect a significant performance gain from it? AFAICT, Intel is second to none when it comes to designing memory controllers. I just don't think there is a lot of room for improvement there, let alone that nVidia would be able to do it. nForce1 & 2 only rocked so much because every other K7 controller sucked so badly.

Yes, but then I come back to the N-Force 4 chipset, which does well for AMD. N-Force 5 should be just as interesting, I would think.

Quote:
Oh, come on. You will probably also claim Prescott has no heat issue and does not require huge noisy fans... Yes, it can be done, but that doesn't mean such high wattages, especially the way Intel specs their CPUs, are desirable. Furthermore, I highly suspect the chip will implement the power management features to keep it from exceeding the spec, which simply means throttling. Dual-core Prescott just doesn't sound like a good idea to me, at least not for a consumer product (might be a nice low-end workstation chip though).

I never said the Prescott didn't have design-related thermal issues; in fact, I never mentioned anything in relation to that. Last I recall, I was talking about heat sinks spec'd for 200 watts of thermal output. But whatever, you like to hear yourself talk.

Quote:
Actually, 99% of the time at least one core will be idle, so having a Cool&Quiet/Speedstep or whatever equivalent certainly is a good thing. It just doesn't help reduce HSF or PSU requirements, since you also want the chip to work at 2x100% load :) 

So why did I pay for XP if 99% of the time my second core will remain idle? It doesn't work that way with HT-enabled P4s; the load goes back and forth. I don't see it changing with smithy.

Quote:
Who's to say? AFAICT, AMD is not power limited at all, so even adding a core might allow them the same clock speeds. AMD chips are clock limited for other reasons. The same cannot be said of Intel; they seem very much power limited, so doubling the cores will hurt their clock speed (on P4, obviously).

How does power even support an argument that a dual-core K8 would be about the same speed as a 2x Opteron? Also, we have all read the clock speeds for dual cores, so what further points are you trying to make?

Quote:
Are current xeons *still* single channel ? If so, expect a very nice boost over Xeon DP.

Been dual channel for quite some time.

Quote:
Ahem... no. From what I read, the first Smithfields will not feature an arbiter chip, hence presenting themselves as 2 distinct CPUs to the NB. That is exactly the same configuration as Xeon DP. Now, adding an arbiter (which they will later) might *speed up* performance over a non-arbiter or DP configuration by making more efficient use of the FSB bandwidth. By how much is hard to tell, but I wouldn't expect it to be huge. Personally, I think a bigger downside is that initial Smithfields will be seen as 2 *physical* CPUs by the OS and software, meaning more expensive licences. HT-enabled chips are mostly seen as just 1 CPU, but without an arbiter chip, there is no way to differentiate a dual-core from a dual-CPU setup.

What the hell are you talking about? First-gen smithies are equipped with the arbiter chip. The arbiter chip controls bus transactions; current northbridges are not equipped to deal with 2 chips on one bus, with the possible exception of the 925 and 925XE. Hence in 2006 they will be introducing dual next-gen northbridge configurations, and please notice these will not be the current 915 and 925 chipsets but the 945 and 955, which will boast Serial ATA 2, 667 DDR2, 1066 FSB and some more wireless jazz.

But bandwidth isn’t the real concern for those processors at this point, thermal output has to be managed.

As for the licensing issues for dual-core systems, I honestly don't know if MS has set something in stone for that; I have yet to read it. I would have to assume regardless that MS will treat it like it did HT, since there are going to be upwards of 7 different versions of Longhorn.

Quote:
I'm sure that will make a whopping 0.5% difference :) 

Personally, I think 3-7% is a more realistic theory, given that there are now 2 independent SSE/SSE2/SSE3 engines, which, if given fast shifting with minimal latency from CPU to CPU, should prove most beneficial.

Quote:
Not likely. Signal integrity on the MB seems like a better bet. I mean, why not release 1600 or 3200 MHz FSBs while you are at it ? :) 

Last I checked, the 945 and 955 chipsets would support 1066. I would have to assume that with the C0 stepping of the ICH6 rolling out, it was just easier for them to roll that out, see how it works, and move along to the ICH7, which would have to be released with the next-generation chipset. All respective timeframes work out in the end for Intel.

Xeon

Post created with being a dickhead in mind.
For all emotional and slanderous statements contact THG for all law suits.
January 29, 2005 7:39:07 PM

A mostly overlooked aspect of Intel not having an arbiter, and AMD having an arbiter plus an ON-DIE memory controller, is the following:

Cache snooping, an operation where CPU0 looks in CPU1's cache for data it needs, is going to have less than half the latency and twice the bandwidth on the AMD platform, since the path is going to be all on-die and at much higher throughput.

AMD's dual core is going to be much more effective than Intel's.

A DP Pentium 4 2.8GHz without HT is NOT an attractive offer for $240 when compared to a $240 Athlon 64 3800+ (assuming that by then it drops to today's 3500+ price level).
The Athlon 64 gets you between 20 and 35% more performance in HL2. You can probably apply smaller performance gains to most other single-threaded applications.
I suspect that in most threaded applications the Pentium 4 will do only a little better than catching up with the A64. Remember that today's 2.8GHz Prescott already enjoys a ~10% boost in threaded applications due to its HT. Even in well-threaded workstation-type (not server) workloads, a dual-Xeon setup won't get much better than a 30% advantage over a single hyperthreaded Pentium 4. So that's what we're looking at here... a slight victory for the Pentium 4 in a small portion of threaded apps. And that's for a 300+(!) square mm CPU against an 84 square mm CPU,
not to mention heat and power consumption.
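
Why die area matters so much for cost can be sketched with the simple Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. The two die areas below are the figures quoted in this post. The defect density is a purely hypothetical value picked for illustration, since neither Intel nor AMD publishes theirs, and real fabs use more refined yield models.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 = 1 cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    D0 = 0.5  # hypothetical defect density in defects/cm^2, NOT a real fab figure
    for label, area in (("300 mm^2 die (figure quoted in the post)", 300.0),
                        ("84 mm^2 die (figure quoted in the post)", 84.0)):
        print(f"{label}: {poisson_yield(area, D0):.0%} yield")
```

Whatever defect density is plugged in, yield falls off exponentially with area. So a die 3.5 times larger does worse on two counts: fewer candidate dies fit on a wafer, and a smaller fraction of them work.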

BUT - AND THAT'S A BIG ONE, much more important for AMD - people who are not technology enthusiasts will LOVE the idea of getting a dual 2.8GHz CPU for that cheap, because the obvious reasoning is that 2.8 x 2 is 5.6, and no one puts up a 5.6 number or even close to it for $240 (or any amount of money, for that matter).

That's the real problem... and that's what AMD's strategy will be focused on: education. AMD, as always, puts its faith in the customer's IQ and awareness.

At worst, AMD's best defensive option is to put out cheap dual-core Winchesters. At around 160 square mm they are still going to be reasonable at $240, and even at low speed bins (1.8 GHz) they will eat Smithfield for breakfast... The only problem is that if this gets too popular AMD might, again, get into slight manufacturing problems. (Slight because 95% of the CPUs they sell would still be about 20%-40% smaller than last year's.)

I hope the english is readable becouse there is no way im spell cheking it..


This post is best viewed with common sense enabled

Edited by iiB on 01/29/05 11:43 PM.
January 29, 2005 10:52:50 PM

Quote:
Cache snooping, an operation where CPU0 looks in CPU1's cache for data it needs, is going to have less than half the latency and twice the bandwidth on the AMD platform, since the path is going to be all on-die and run at a much higher speed.

The arbiter will keep track of that. Besides, latency and bandwidth between the cores won't be an issue; they don't need to move something like 3 gigs between them, so your point about AMD having more bandwidth and lower latency is moot. Which brings me to the clear and obvious question: how do you know the latencies? Last I checked, AMD and Intel have stated nothing in regards to latencies, since they won't matter.

Regardless of latencies and bandwidth the information is just a few system ticks away.

Quote:
AMD's dual core is going to be much more effective than Intel's.

Got some numbers to back that one up? Truth be known, AMD's solution will most likely be more efficient, but Intel may yet pull a rabbit out of the hat.

Quote:
A DP Pentium 4 2.8 GHz without HT is NOT an attractive offer for $240 when compared to a $240 Athlon 64 3800+ (assuming that by then it drops to today's 3500+ price level).

How so? You get 2 processors for the price of one. Unless you actually believe that 2-core machines will muster no real-world advantages, in which case by all means believe what you want.

Quote:
The Athlon 64 gets you between 20 and 35% more performance in HL2.

Point being? Besides, real-world gameplay is much different from rail benchmarks. I find it amusing that people now dismiss 3DMark as a reliable benchmark; what makes the rail benchmark any different?

Quote:
I suspect that in most threaded applications the Pentium 4 will do only a little better than catching up with the A64.

I was unaware the P4 lost to the A64 in threaded benchmarks. As for catching up, I don't know; I don't even know what to say to that, it just confuses me.

Quote:
and thats for a 300+(!)

It's 215 actually but you seem to be on a rant so 300 it is.

Quote:
That's the real problem... and that's what AMD's strategy will be focused on: education. AMD, as always, puts its faith in the customer's IQ and awareness.

Sure they do. Ah yes, the PR rating deal just screams honesty to me.

Xeon

<font color=red>Post created with being a dickhead in mind.</font color=red>
<font color=white>For all emotional and slanderous statements contact THG for all law suits.</font color=white>
January 30, 2005 12:14:55 AM

>Vectorized code such as SSE and SSE2 runs in 128-bit bundles.
>Intel added and widened this feature for the Prescott core,
>where the processor tries to alleviate stress off the
>memory subsystem by bursting the data whether it be to a
>GPU, IDE controller or memory. I'll get you a link since you
>obviously missed that tidbit of information.

Sure, go ahead and link; it's far better than your reformulated Intel PDFs. Doesn't change the fact that what you said earlier makes *zero* sense. Hint: double precision applies to floating point, not integer.

>Did I miss something? A P4 receives and sends 4x a clock
>tick so no 2x200 would not be the same as 4x200, to be sure
>I am amazed by what you said because it makes no sense what
>so ever.

Yeah, you missed something: the stupidity of your original claim.

> As far as I can tell all existing HT optimized code will
>work “ideally” on smithy.

no, SMP optimized code will.

> N-Force 5 should be just as interesting I would think.

Except that Intel never designed a memory controller for K7, so it's anyone's guess how much headroom there still is. My guess: pretty much none. I doubt nVidia will do a better job than Intel, and it's a fact there is no historical evidence indicating anything else. There is, however, plenty of historical evidence of Intel seriously outperforming any other third party memory controllers (VIA, SiS, ATI,..) on the market. What makes you think this is going to be any different ?

>So why did I pay for XP if 99% of the time my second core
>will remain idle,

'Cause you care about the 1% of the time when CPU performance matters (like when playing a game or whatever).

>It doesn’t work that way with HT enabled P4’s goes back and
>forth. I don’t see it changing with smithy.

yeah, back and forth between WHAT ? Going back and forth is only useful if you have two CPU bound apps or threads to go back and forth between, which for most people is almost never.

>How does power even support a argument that a dual core K8
>would be about the same speed as a 2x Opteron

Huh ? Fairly simple really. A CPU can be held back because of timing issues or power issues. For AMD, I don't see power being the limiting factor, at least not with a single core, so I can believe dual dual core CPUs would not be a lot slower (clockspeed). For Intel, it's pretty clear that when you apply exotic cooling, they can clock considerably higher, implying thermal limitations rather than timing or transistor switching speed. Hence, doubling the power isn't likely to have a positive effect on clockspeed. Is that too complicated for you ?

>Been dual channel for quite some time.

If so, scratch the nice performance boost part.

>What the hell are you talking 1st gen smithies are equipped
>with the arbiter chip, the arbiter chip controls bus
>transactions, current Northbridge’s are not equipped to
>deal with 2 chips on one bus.

Among the AGTL+ design specs are glueless 4+ SMP and dual independent buses. If current chipsets do not support SMP, it's marketing, nothing else; the bus is perfectly capable of it, as are the chipsets if they are not crippled.

>But bandwidth isn't the real concern for those processors
>at this point, thermal output has to be managed.

Ah, so you do agree now ?

>Personally 3-7% is more realistic of a theory,

Show me one real world app that gains anywhere near 7% using SSE3 compilation versus SSE2 and I'll be impressed. Show me 50 apps, and I might agree "3-7%" is an average one should expect.

>Last I checked the 945, 955 chipsets would support 1066

Sure they might; but how about the motherboards ? Signal integrity is a motherboard design issue, not a chipset issue. You just reinforced my point: if there is no 1066 FSB option today, it's most likely because the motherboards would be too hard/expensive to produce.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
January 30, 2005 1:50:39 AM

I hate being petty. In a discussion like this, attention to detail is important.
Quote:
because the obvious reasoning is 2.8X2 is 4.6

The last time I checked 2X2.8 was 5.6. Are you using higher math?
January 30, 2005 10:35:26 PM

Quote:
Sure, go ahead and link; it's far better than your reformulated Intel PDFs. Doesn't change the fact that what you said earlier makes *zero* sense. Hint: double precision applies to floating point, not integer.

Yes, and your point is? The statement was clear. I just used a valid example, since SSE, SSE2 and SSE3 already run in bundles, as does MMX. But whatever, you argue for the sake of arguing.

Quote:
Yeah, you missed something: the stupidity of your original claim.

What sends and receives 4x a clock tick? Oh well, not my problem you can't understand what I am saying. Like I care, for that matter.

Quote:
no, SMP optimized code will.

Is that not what I said? Specialized HT code works quite well on dual-processor machines.

Quote:
What makes you think this is going to be any different ?

Wow, you're pretty stupid. VIA was smoking Intel-based chipsets; that was about the time when VIA didn't have a "license" for the P4 FSB. Intel dealt with it quite well by scaring motherboard manufacturers and OEMs into not using it. Ask Crashman, I know he remembers.

Quote:
'Cause you care about the 1% of the time when CPU performance matters (like when playing a game or whatever).

HMMMMMM K!

Quote:
yeah, back and forth between WHAT ?

Registers...

Quote:
Going back and forth is only useful if you have two CPU bound apps or threads to go back and forth between, which for most people is almost never.

So then what does Windows do? Obviously can't balance loads, manage threads, or do memory management.

Quote:
Huh ? Fairly simple really. A CPU can be held back because of timing issues or power issues. For AMD, I don't see power being the limiting factor, at least not with a single core, so I can believe dual dual core CPUs would not be a lot slower (clockspeed). For Intel, it's pretty clear that when you apply exotic cooling, they can clock considerably higher, implying thermal limitations rather than timing or transistor switching speed. Hence, doubling the power isn't likely to have a positive effect on clockspeed. Is that too complicated for you ?

Do you even know what you are talking about? Clock speed limits, sure, they come from voltage limitations, which pose transistor switching limitations. But in the end all it is, is signal integrity, due to parts of the CPU switching faster than the rest of the CPU.

But as for a dual K8 vs. a dual Opteron, I still don't see where your argument is coming from. If the socket can deliver the correct amperage, why would either situation differ? Overclocking would also be marginally different in each situation, since the HT bus is quite capable of handling higher speeds.

Quote:
Among the AGTL+ design specs are glueless 4+ SMP and dual independent buses. If current chipsets do not support SMP, it's marketing, nothing else; the bus is perfectly capable of it, as are the chipsets if they are not crippled.

HMMMMM K!

Quote:
Ah, so you do agree now ?

Have I ever disagreed with the extreme nature of the thermal output of the Prescott cores?

Quote:
Show me one real world app that gains anywhere near 7% using SSE3 compilation versus SSE2 and I'll be impressed. Show me 50 apps, and I might agree "3-7%" is an average one should expect.

Latest builds of LAME and Gordian Knot, last I checked. Why there is an argument on this is beyond me *shakes head*.

Quote:
Sure they might; but how about the motherboards ? Signal integrity is a motherboard design issue, not a chipset issue. You just reinforced my point: if there is no 1066 FSB option today, it's most likely because the motherboards would be too hard/expensive to produce.

They make 1066 boards. What the hell is your point?

Xeon

January 31, 2005 2:01:54 AM

My dear P4man

Xeon was implying that Prescott can make better use of vector memory operations. So I don't get your point, if there is any.

The northbridge will always think there is only 1 CPU, as a simple log of transactions will be able to route any I/O and keep CC. The MCH won't see any change.



i need to change useur name.
January 31, 2005 8:26:25 AM

>The statement was clear. I just used a valid example, since
>SSE, SSE2 and SSE3 already run in bundles, as does MMX. But
>whatever, you argue for the sake of arguing.

Let's rewind this; Vapor claimed "For one, a single Scotty core eats bandwidth like no other, imagine two." Which is correct, P4 performance does heavily depend on FSB bandwidth, compare P4A with P4B and C if you don't believe this. So if you add a core while keeping the same bandwidth, obviously per core bandwidth halves in the worst case, meaning you should not expect stellar scaling going from single to dual core.

Your confused statements about "vectorized streaming code bundles" and "double precision integers" are not really a counter argument. But show us that link, and we might understand what you are trying to get at.

>What sends receives 4x a clock tick? Oh well not my problem
>you can't understand what I am saying,

Again, rewind. Replying to the same Vapor statement you said:
Quote:
" bandwidth issues will be moot with considerations P4's run on a quad data rate memory subsystem, CPU1 takes rise fall of the 1/2 tick CPU2 gets the rise fall of the second 1/2 of system tick"


Which is bogus of course. It doesn't matter how you reach 800 MT/s, whether it be single pumped 800 MHz or octal pumped 100 MHz, it's still 800 MT/s. Adding a second core will reduce maximum bandwidth per core to 400 MT/s under worst circumstances.
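
To put numbers on that, here is a minimal sketch of the arithmetic in Python. It assumes the bus figures quoted in this thread and an even worst-case split between two cores; the function names are made up for illustration.

```python
def transfers_per_second(base_clock_mhz, pumps):
    """Effective transfer rate in MT/s: base clock times transfers per clock."""
    return base_clock_mhz * pumps


def worst_case_share(total_mts, cores):
    """Worst case: every core saturates the shared bus, so it is split evenly."""
    return total_mts / cores


# Three ways of reaching the same effective rate.
quad_pumped = transfers_per_second(200, 4)    # quad pumped 200 MHz
single_pumped = transfers_per_second(800, 1)  # single pumped 800 MHz
octal_pumped = transfers_per_second(100, 8)   # octal pumped 100 MHz

print(quad_pumped, single_pumped, octal_pumped)
print(worst_case_share(quad_pumped, 2))
```

However the bus gets to its transfer rate, two cores hammering it at once still share the same total.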

>Is that not what I said specialized HT code works quite
>well on dual processor machines?

No, but it was a minor nitpick; you claimed HT optimized code would work well on Smithfield, and I made a slight correction stating SMP optimized code would, since Smithfield will behave like an SMP computer, not an HT computer, and there are some minor differences between both (like avoiding cache thrashing). But yes, HT optimized code will most likely work pretty well too.

>Wow, you're pretty stupid. VIA was smoking Intel-based
>chipsets; that was about the time when VIA didn't have a
>"license" for the P4 FSB

You're a young snot I presume; if you hadn't been, you'd have known Intel memory controllers have pretty much always led the pack when it comes to implementation. Yes, at one point VIA had a theoretically faster solution when they offered 133 MHz memory, and Intel kept BX at 100, but even then BX was usually faster. And no, PX400 didn't smoke the 845, they were roughly equal for a few months until Intel released the 865/875. Sorry, there is just not much historical evidence of Intel being outperformed by anyone else when it comes to memory controllers, and there is tons of evidence to the contrary. I don't see what nVidia could bring to the table in this respect, but I guess time will prove me right or wrong.

>Do you even know what you are talking about? Clock speed
>limits, sure, they come from voltage limitations, which pose
>transistor switching limitations. But in the end all it is,
>is signal integrity, due to parts of the CPU switching
>faster than the rest of the CPU

Of course not. There comes a point where thermal density or power requirements are just too great to cope with in a commercial desktop product. Prescott is pretty close to that limit already, and if we won't ever see 5 GHz Prescotts, it's most likely because of power/heat, not because of switching speed.

>But as for a dual K8 vs. a dual Opteron, I still don't see
>where your argument is coming from,

I assume you meant K8 vs P4 here. Simple: K8 isn't nearly as power limited as Prescott; it's signal propagation limited or transistor speed limited. Now, if you double the die, neither of these get any worse (well, not per core, but statistics will play tricks for the dual core CPU), power issues however, will roughly double. Apply a tiny bit of common sense, and you'll see it's much more likely K8 dual core will run at speeds close to current K8 top speeds, than dual core Prescotts; if you can't grasp that.. what can I say ?

> if the socket can deliver the correct amperage why would
> either situation differ?

*If* the socket (and PSU,..) can, and *if* the HSF can cope with the increased power without substantially increasing core temperature, then nothing much changes. But what if it can't ? It's not like motherboard designers or HSF manufacturers had an easy time designing power circuitry for a single Prescott.

>Have I ever disagreed with the extreme nature of the
>thermal output of the Prescott cores?

I guess not, but then how hard can it be to see 2 prescott cores will represent an even bigger challenge, most likely imposing clockspeed limits ? I am even convinced the current prescott is mostly power limited.

>Latest builds of LAME and Gordian Knot last I checked, wh

Toss me a link please. Remember, to gauge the impact of SSE3, we'd need the same app compiled with and without SSE3 support running on the same CPU. Also, video encoding is probably the single most important app that can benefit from SSE3; you cannot expect it to be representative of an overall speedup.

>They make 1066 boards what the hell is your point?

That these boards are too expensive to be produced as mass market products ?

= The views stated herein are my personal views, and not necessarily the views of my wife. =
January 31, 2005 12:27:46 PM

Quote:
Let's rewind this; Vapor claimed "For one, a single Scotty core eats bandwidth like no other, imagine two." Which is correct, P4 performance does heavily depend on FSB bandwidth, compare P4A with P4B and C if you don't believe this. So if you add a core while keeping the same bandwidth, obviously per core bandwidth halves in the worst case, meaning you should not expect stellar scaling going from single to dual core.

Your confused statements about "vectorized streaming code bundles" and "double precision integers" are not really a counter argument. But show us that link, and we might understand what you are trying to get at.

I never stated there would not be a bandwidth issue. I bloody well stated that Intel has added some features to try and make that shortcoming less of an issue.

With considerations, if both cores are working on one thread, which will be the most likely situation, then bandwidth issues become moot. With Xeons both cores are busy at it, and that 533 and now 800 FSB gets gobbled up quite quickly.

As for your inability to follow what I am saying about bundled data, find it yourself. I have nothing to prove; this is a forum, hardly a point of real consequence.

Quote:
Which is bogus of course. It doesn't matter how you reach 800 MT/s, whether it be single pumped 800 MHz or octal pumped 100 MHz, it's still 800 MT/s. Adding a second core will reduce maximum bandwidth per core to 400 MT/s under worst circumstances.

Did I argue that point? NO!!! While both cores are busy with one thread, 1/2 bandwidth means jack; they could even run it 1/2 tick style as I stated, who knows. I was stating most likely scenarios.

Quote:
No, but it was a minor nitpick; you claimed HT optimized code would work well on Smithfield,

What "ideal" whatever man learn English.

Quote:
You're a young snot I presume; if you hadn't been, you'd have known Intel memory controllers have pretty much always led the pack when it comes to implementation. Yes, at one point VIA had a theoretically faster solution when they offered 133 MHz memory, and Intel kept BX at 100, but even then BX was usually faster. And no, PX400 didn't smoke the 845, they were roughly equal for a few months until Intel released the 865/875. Sorry, there is just not much historical evidence of Intel being outperformed by anyone else when it comes to memory controllers, and there is tons of evidence to the contrary. I don't see what nVidia could bring to the table in this respect, but I guess time will prove me right or wrong.

Right....

Quote:
Of course not. There comes a point where thermal density or power requirements are just too great to cope with in a commercial desktop product. Prescott is pretty close to that limit already, and if we won't ever see 5 GHz Prescotts, it's most likely because of power/heat, not because of switching speed.

I suppose, sick and tired of arguing with you.

Quote:
I assume you meant K8 vs P4 here. Simple: K8 isn't nearly as power limited as Prescott; it's signal propagation limited or transistor speed limited. Now, if you double the die, neither of these get any worse (well, not per core, but statistics will play tricks for the dual core CPU), power issues however, will roughly double. Apply a tiny bit of common sense, and you'll see it's much more likely K8 dual core will run at speeds close to current K8 top speeds, than dual core Prescotts; if you can't grasp that.. what can I say ?

K, sure, ya, you're right, or whatever you want me to say.

Quote:
*If* the socket (and PSU,..) can, and *if* the HSF can cope with the increased power without substantially increasing core temperature, then nothing much changes. But what if it can't ? It's not like motherboard designers or HSF manufacturers had an easy time designing power circuitry for a single Prescott.

You're right, "if": if I cared to argue the points anymore, or if this mattered outside the forum.

Quote:
Toss me a link please. Remember, to gauge the impact of SSE3, we'd need the same app compiled with and without SSE3 support running on the same CPU. Also, video encoding is probably the single most important app that can benefit from SSE3; you cannot expect it to be representative of an overall speedup.

Am I your b*tch? Go out and get it yourself.

Quote:
That these boards are too expensive to be produced as mass market products ?

I wouldn’t know I don’t know everything.

Xeon

January 31, 2005 12:45:27 PM

>With considerations if both cores are working on one thread
>which will be the most likely situation

Ahem.. no. 2 cores can't work on a single thread
:rollseyes:

>I don’t know everything.

Agreed

= The views stated herein are my personal views, and not necessarily the views of my wife. =
January 31, 2005 10:53:35 PM

Quote:
Ahem.. no. 2 cores can't work on a single thread
:rollseyes:

Why? HT fools the OS into thinking there are 2 processors; both virtual processors work away on one thread. Vanderpool does that on a grand scale, allowing multiple instances of an OS.

I really don't understand what your point is to begin with, it's all OS level thread management, 1 thread divided amongst 2 processors.

Quote:
Agreed

Wish you had such humility.

Xeon

February 1, 2005 5:49:57 AM

>Wish you had such humility.

I guess my way of being humble is by not making firm statements on things I know nothing about, and not resorting to name calling when I am corrected by someone.

>Why HT fools the OS in thinking there are 2 processors,

Yes

>both virtual processors work away on one thread.

How many times do I have to say: "NO!" before you believe me or look it up ? HT allows the CPU to work on 2 threads more or less simultaneously (one per virtual CPU), but there is no way 2 CPUs (virtual or physical) can speed up the execution of a single thread. <b>THAT IS THE WHOLE FRIGGING PROBLEM WITH HT </b> (and SMP and dual core): most consumer code is single threaded.
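
A rough sketch of that point in Python (chosen only for readability; all function names here are hypothetical). The scheduler can only spread out work that the program itself has already split into threads. A loop where every step needs the previous result stays one thread of work no matter how many CPUs are present.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_chain(n):
    """Each step needs the previous result, so this is one thread of work."""
    x = 1
    for i in range(n):
        x = (x * 3 + i) % 1_000_003
    return x


def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))


def threaded_sum(n, workers=2):
    """The programmer, not the OS, splits the work into independent pieces."""
    step = n // workers
    chunks = [(i * step, n if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))


print(serial_chain(1000))
print(threaded_sum(1_000_000) == sum(range(1_000_000)))
```

Note that CPython's global interpreter lock keeps these particular threads from truly running in parallel; the sketch only shows who has to do the splitting.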

= The views stated herein are my personal views, and not necessarily the views of my wife. =
February 1, 2005 8:14:28 AM

Excuse me for interupting but
Quote:
Why? HT fools the OS into thinking there are 2 processors; both virtual processors work away on one thread. Vanderpool does that on a grand scale, allowing multiple instances of an OS.

Do you have a clue what you are talking about?
Here's a thread: 1 + 0. So what? Are you saying that 1 virtual chip will get 0, while the other gets 1? Even the newest newb here knows two chips can't divide a single thread.
February 1, 2005 12:24:18 PM

Quote:
I guess my way of being humble is by not making firm statements on things I know nothing about, and not resorting to name calling when I am corrected by someone.

Ya you should shut up you don’t know what you are talking about.

Quote:
How many times do I have to say: "NO!" before you believe me or look it up ? HT allows the CPU to work on 2 threads more or less simultaneously (one per virtual CPU), but there is no way 2 CPUs (virtual or physical) can speed up the execution of a single thread. THAT IS THE WHOLE FRIGGING PROBLEM WITH HT (and SMP and dual core): most consumer code is single threaded.

It's OS-level thread manipulation; it's still coming off the OS's main thread(s). Otherwise my 8606 handles, 424 threads and 37 processes would really bog that CPU down, would you not think?

If the code is specifically spun for it, it will make the necessary calls and run accordingly on alternate threads. I don't know what your argument is; this is how the technology works. It's quite OS dependent, hence the performance hit on 2K: 2K thinks it's a real second processor and that's not what it is, just better resource balancing.

As for the second processor not speeding up a single thread, if it's OS-level manipulation, why would the OS not be able to split the load of the software up? *Most* software and their various internal cycles are not system tick dependent.

Just like SMP code is not 2 independent threads of code; it's one core thread that, if a second, third or whatever CPU is detected, can utilize those additional resources.

Xeon

February 1, 2005 4:41:31 PM

Slicing a single thread in two or more pieces and executing those pieces
simultaneously on an HT-capable CPU (single or dual core) or in SMP is a
no-go. A thread is the "smallest fragment" of written/compiled code and
is not even meant to be sliced in pieces to be processed in parallel.

Think of a thread 1000 machine instructions long. Now share it with 1000 cpu's and process them in parallel.


Quote:

In reply to:
--------------------------------------------------------------------------------

You're a young snot I presume; if you hadn't been, you'd have known Intel memory controllers have pretty much always led the pack when it comes to implementation. Yes, at one point VIA had a theoretically faster solution when they offered 133 MHz memory, and Intel kept BX at 100, but even then BX was usually faster. And no, PX400 didn't smoke the 845, they were roughly equal for a few months until Intel released the 865/875. Sorry, there is just not much historical evidence of Intel being outperformed by anyone else when it comes to memory controllers, and there is tons of evidence to the contrary. I don't see what nVidia could bring to the table in this respect, but I guess time will prove me right or wrong.



--------------------------------------------------------------------------------

Right....


Right!!!
February 1, 2005 6:21:36 PM

Quote:
Even the newest newb here knows two chips can't divide a single thread.

Any <i>existing</i> chips, no. Of course not. That'd be silly. But what about in the fanciful dream world of 'the future'?

Imagine a multi-cored CPU with a processing-arbiter that in a manner similar to pre-fetching can adequately 'guess' how registers will be utilized. It could re-distribute pieces of a workload to other cores with relabelled registers. So long as the pre-processing guessed right and split the workload in a way that the registers didn't need to be shared across cores during execution it should be possible. Any data that needs to be shared or moved after execution could then be copied over by the processing-arbiter in preparation for the next cycle.

In fact if the processing-arbiter were double-clocked like the integer units of a P4 are then it could possibly even copy/move data between registers of the multiple cores during the half-clock to expedite shared integer operations in CPUs of that style.

Or if one went one step further and virtualized an entire range of registers shared by all cores instead of splitting them into individual sets of registers separate on each core, then the copying/moving would be completely unnecessary as all cores would have access to the other registers. It would also eliminate nearly every penalty for a pre-processing 'guess' miss other than the possible lack of 100% utilization.

The real key to anything like this ever working would of course be that the processing-arbiter would have to be capable of breaking down the code and recreating the same execution using different code with 100% accuracy. The processing-arbiter would have to run very efficiently, possibly even wasting the first cycle of any new execution to pre-load the code and rewrite it so that its feed can remain one cycle ahead of the actual execution.

And to take it a step further, if the processing-arbiter were keeping track of threads and their usage of the cores, it is possible that the processing-arbiter could not only decide when best to split a single thread and when to run multiple threads, but also might even be smart enough to jam pieces of code from more than one thread into a single core for execution that cycle like a similar (if not improved) version of Intel's HT. And if capable of that, it might even be capable of optimizing badly written code to utilize more registers and f/iops per cycle than was originally programmed by recognizing any independent operations that could be executed this cycle instead of in the next.

Of course no such animal as this wondrous processing-arbiter exists and I highly doubt that Intel (or AMD, or anyone else) will manage to create such a concept any time soon. (Except for maybe Sony's Cell.) But if multiple cores could be united by such a device then there would be definite advantages of dual-core over dual-CPU. Though with the extreme utilization possible, it would also bring heat levels to their maximums quite easily. And if the technology weren't completely free of bugs then some people's software might suddenly become bug-ridden because of the processing-arbiter and not because of their actual code.

Ah, 'the future', how I long to see thee. I don't think that this would be possible above 4 cores per die however because after that the distances between the processing-arbiter and the dies would either become asymmetrical or there would be a lot of die space wasted between cores. It likely would work best with just two cores. Well ... unless you went 3D and stacked cores upon cores instead of laying them side by side. :\

I just want to say I wuv you.
And I mean it fwom the bottom of my hawt.
February 1, 2005 6:53:12 PM

Quote:
Imagine a multi-cored CPU with a processing-arbiter that in a manner similar to pre-fetching can adequately 'guess' how registers will be utilized.

Well that is a little strange. A thread is the equivalent of a prime number. Dividing it could only result in an incomplete answer.
On the other hand, if pre-fetch were capable of foretelling the future, the gambling establishment would get pissed. Even Intel wouldn't want to fool with those guys. Then of course concepts of self determination would get kind of quashed.
February 1, 2005 9:54:19 PM

Try to make an OS run the following single-threaded code on two processors...

MOV AX,5
MOV BX,6
ADD AX,BX

Remember, both CPU's have their own AX & BX registers.

...won't work!

You either have to recompile the code to optimize it for both CPUs (which is stupid as you will probably need the third core just for this) or resort to the other poster's Futuristic Imaginary CPU which sounds easier.
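
A toy model of why this fails, sketched in Python. The `Cpu` class is invented for this example and only models two private registers; it is not how any real scheduler or CPU is implemented. If the three instructions are dealt out naively to two CPUs, the ADD lands on a CPU that never saw one of the MOVs.

```python
class Cpu:
    """Hypothetical CPU model: each instance has its own private AX and BX."""

    def __init__(self):
        self.regs = {"AX": 0, "BX": 0}

    def execute(self, op, dst, src):
        # src is either a register name or an immediate value
        value = self.regs[src] if isinstance(src, str) else src
        if op == "MOV":
            self.regs[dst] = value
        elif op == "ADD":
            self.regs[dst] += value
        else:
            raise ValueError(f"unknown op {op}")


program = [("MOV", "AX", 5), ("MOV", "BX", 6), ("ADD", "AX", "BX")]

# One CPU runs the whole thread.
single = Cpu()
for instr in program:
    single.execute(*instr)

# Naive split: alternate instructions between two CPUs.
cpus = [Cpu(), Cpu()]
for i, instr in enumerate(program):
    cpus[i % 2].execute(*instr)

print("one CPU:  AX =", single.regs["AX"])
print("two CPUs: AX =", cpus[0].regs["AX"], "(BX=6 is stuck on the other CPU)")
```

The single CPU ends with the sum in AX, while the first CPU of the split pair adds its own untouched BX instead.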
February 2, 2005 9:17:58 AM

> I don't know what your argument is; this is how the
> technology works.

sigh..

= The views stated herein are my personal views, and not necessarily the views of my wife. =
February 2, 2005 11:58:58 AM

Quote:
A thread is the "smallest fragment" of written/compiled code and
is not even meant to be sliced in pieces to be processed in parallel.

Let’s refine the above statement like:

… to be processed in parallel <b>by two processors</b>.

Parallel processing is nothing new and has been implemented on x86 CPU’s since early Pentiums (as far as I remember). Even single core CPU’s have multiple ALU/FPU units to process multiple instructions at once, whenever they can be run in parallel, i.e. when the second instruction doesn’t depend on the first’s result.

This doesn’t work on multiple CPU’s/Cores for two reasons.

One, registers can’t be shared.

Two, a CPU can look ahead only by a few instructions to identify other instructions which can be run in parallel (explanations available upon request).
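
As a rough illustration of the "doesn't depend on the first's result" test mentioned above, here is a small Python sketch. The `(dst, srcs)` instruction encoding and the function name are assumptions made for this example; real CPUs do this in hardware, and use register renaming to get around some of these conflicts.

```python
def can_pair(first, second):
    """Return True if two instructions could issue in the same cycle.

    Each instruction is (dst, srcs): the register it writes and the
    registers it reads.
    """
    dst1, srcs1 = first
    dst2, srcs2 = second
    read_after_write = dst1 in srcs2    # second reads what first writes
    write_after_write = dst1 == dst2    # both write the same register
    write_after_read = dst2 in srcs1    # second overwrites what first reads
    return not (read_after_write or write_after_write or write_after_read)


mov_ax = ("AX", [])               # MOV AX,5
mov_bx = ("BX", [])               # MOV BX,6
add_ax_bx = ("AX", ["AX", "BX"])  # ADD AX,BX

print(can_pair(mov_ax, mov_bx))     # the two MOVs are independent
print(can_pair(mov_bx, add_ax_bx))  # the ADD has to wait for BX
```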
February 2, 2005 12:17:23 PM

Finally someone puts up a valid point. I was ready to walk away, thinking no one would mention that.

Xeon

Post created with being a dickhead in mind.
For all emotional and slanderous statements contact THG for all law suits.
February 2, 2005 5:40:01 PM

Quote:
Well that is a little strange. A thread is the equivalent of a prime number. Dividing it could only result in an incomplete answer.

You're right that dividing it results in incomplete answers, but the job of the processing-arbiter would be not only to divide the thread, but also to recombine those incomplete answers into a complete answer.

Here's a generic example of assembly instructions for a single thread:
mov a, b
mov d, 0
mov c, 4
mov d, 1
mul a, 4
mov d, 2
add c, 9
mov d, 3
mul c, 2
add a, 2
mov d, 4
mul a, a
mul c, d
mov d, c
mov b, a
mov a, 1
add b, c

Now look at how that can be re-programmed by a processing-arbiter to execute across multiple CPUs:

CPU1:
mov a, b
mul a, 4
add a, 2
mul a, a

CPU2:
mov a, 4
add a, 9
mul a, 2

CPU3:
mov a, 0
mov a, 1
mov a, 2
mov a, 3
mov a, 4

processing-arbiter (handled at tail end of cycle):
mov CPU1-d, CPU3-a
mov CPU1-c, CPU2-a

CPU1:
mul c, d
mov d, c
mov b, a
mov a, 1
add b, c

And if the processor were designed around one big shared pool of registers, with the processing-arbiter handing out virtualized labels so that all cores used the same registers, the processing-arbiter wouldn't even need to do its two moves at the end of the cycle, thus shortening the pipe.

The whole concept of the processing-arbiter is to break apart instructions and reorder them so that they can be run across multiple CPUs without changing the end results. It's in effect still a single thread, just distributed across multiple cores so that the execution units of all cores are utilized as fully as possible.
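
As a sanity check on the example above, here is a small Python sketch that runs the original thread and the split version and compares the final registers. The tiny interpreter and all names are hypothetical, the registers other than b are assumed to start at 0, and the three "CPUs" are simply run one after another, so this only checks that the results match, not that anything gets faster.

```python
def run(program, regs):
    """Execute toy (op, dst, src) instructions on a dict of registers."""
    for op, dst, src in program:
        value = regs[src] if isinstance(src, str) else src
        if op == "mov":
            regs[dst] = value
        elif op == "add":
            regs[dst] += value
        elif op == "mul":
            regs[dst] *= value
    return regs


def fresh(b=0):
    # Assumption: every register except b starts at 0.
    return {"a": 0, "b": b, "c": 0, "d": 0}


SINGLE_THREAD = [
    ("mov", "a", "b"), ("mov", "d", 0), ("mov", "c", 4), ("mov", "d", 1),
    ("mul", "a", 4), ("mov", "d", 2), ("add", "c", 9), ("mov", "d", 3),
    ("mul", "c", 2), ("add", "a", 2), ("mov", "d", 4), ("mul", "a", "a"),
    ("mul", "c", "d"), ("mov", "d", "c"), ("mov", "b", "a"),
    ("mov", "a", 1), ("add", "b", "c"),
]
CPU1_HEAD = [("mov", "a", "b"), ("mul", "a", 4), ("add", "a", 2), ("mul", "a", "a")]
CPU2 = [("mov", "a", 4), ("add", "a", 9), ("mul", "a", 2)]
CPU3 = [("mov", "a", 0), ("mov", "a", 1), ("mov", "a", 2), ("mov", "a", 3), ("mov", "a", 4)]
CPU1_TAIL = [("mul", "c", "d"), ("mov", "d", "c"), ("mov", "b", "a"),
             ("mov", "a", 1), ("add", "b", "c")]


def run_single(b):
    return run(SINGLE_THREAD, fresh(b))


def run_split(b):
    cpu1, cpu2, cpu3 = fresh(b), fresh(), fresh()
    run(CPU1_HEAD, cpu1)
    run(CPU2, cpu2)
    run(CPU3, cpu3)
    # The two moves done by the processing-arbiter.
    cpu1["d"] = cpu3["a"]
    cpu1["c"] = cpu2["a"]
    return run(CPU1_TAIL, cpu1)


print(run_single(3))                  # {'a': 1, 'b': 300, 'c': 104, 'd': 104}
print(run_split(3) == run_single(3))  # True
```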

In fact, the processing-arbiter could possibly even be designed to identify wasted processing such as:
mov a, 0
mov a, 1
mov a, 2
mov a, 3
mov a, 4

And shrink it down into:
mov a, 4
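
Compilers call this kind of cleanup dead store elimination. A minimal Python sketch of it for straight-line code only (no branches, no memory), assuming the same toy `(op, dst, src)` instruction format and that every register is still needed at the end; the function name is hypothetical:

```python
def drop_dead_stores(program):
    """Remove instructions whose result is overwritten before anything reads it."""
    kept = []
    dead = set()  # registers whose current value is never read again
    for op, dst, src in reversed(program):
        if dst in dead:
            continue  # result would be overwritten unread, so skip it
        kept.append((op, dst, src))
        if op == "mov":
            dead.add(dst)      # a mov ignores the old value of dst
        if isinstance(src, str):
            dead.discard(src)  # src is read here, so its value is live
    kept.reverse()
    return kept


wasteful = [("mov", "a", 0), ("mov", "a", 1), ("mov", "a", 2),
            ("mov", "a", 3), ("mov", "a", 4)]
print(drop_dead_stores(wasteful))  # [('mov', 'a', 4)]
```

Walking the code backwards keeps it to a single pass: by the time an instruction is examined, it is already known whether anything later reads its result.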

If a multi-core processor had a processing-arbiter that does this, it would help bridge the gap between single-threaded and multi-threaded apps, so that multi-core systems wouldn't sit as idle when running heavy single threads. Unfortunately, I don't think it could ever be developed effectively for multiple-CPU systems: communication between CPUs would be just too slow, and the processing-arbiter would have to deal with a lot more synchronization issues than it would in a multi-core CPU.

I just want to say I wuv you.
And I mean it fwom the bottom of my hawt.
February 2, 2005 10:26:57 PM

I don’t see any added value in a code-arbiter (I think that’s what you call it) chip the way you demonstrate it.

It sounds funny when you first come up with lines of redundant and meaningless code, then shrink them down to a meaningful level, and then claim that it’s the virtue of a revolutionary (and imaginary) chip.

Try to decorate your code with a few branch instructions and let’s see what happens…
February 2, 2005 10:47:01 PM

It might work... for a Pac-Man type game. In terms of real computing, with non-linear instructions, the thread would break.
February 3, 2005 7:21:52 AM

Intel has a patent on micro-threads. It allows the CPU to break threads into multiple very small threads to run on an SMP CPU.

The paper I read had an SMT CPU in mind, not a dual core. Alpha was also working on this. It lets the CPU have a very wide execution stage, with 8 or 16 ALUs and 4 to 8 FPUs, allowing maximum flexibility.

I need to change my user name.
February 3, 2005 1:49:56 PM

Quote:
Try to decorate your code with a few branch instructions and let’s see what happens…

What happens is that it becomes even more powerful because then, like in EPIC, it can start processing several branches simultaneously before it even knows which branch to use. Then it just ditches the unused branches when it knows which one to go with.
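
A toy Python sketch of the "run both, ditch the loser" idea, assuming each side of the branch only touches registers and can work on its own private copy. The `speculate` helper and the two arm functions are hypothetical names, and the two arms are run one after the other here rather than on separate cores.

```python
def speculate(regs, condition, if_taken, if_not_taken):
    """Run both arms on private register copies, then keep only the arm
    that the condition selects."""
    taken_result = if_taken(dict(regs))          # could run on one core
    not_taken_result = if_not_taken(dict(regs))  # could run on another core
    return taken_result if condition(regs) else not_taken_result


def double_a(regs):
    regs["a"] *= 2
    return regs


def negate_a(regs):
    regs["a"] = -regs["a"]
    return regs


start = {"a": 5}
result = speculate(start, lambda r: r["a"] > 3, double_a, negate_a)
print(result)  # {'a': 10}
print(start)   # {'a': 5}
```

The catch is the cost: half the work is always thrown away, and each extra unresolved branch doubles the number of paths to follow.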

I just want to say I wuv you.
And I mean it fwom the bottom of my hawt.