
What's happening to CPU prices?!

May 3, 2004 1:54:28 AM

This is getting stranger every time I visit my local store. Last month a Pentium 4 3.0 GHz was affordable, but a 3.2 GHz cost half again as much. Now the 3.2 GHz has dropped and the 3.4 GHz is twice as expensive. Soon the 3.4 GHz will be affordable, but the EE version will still cost four times more!

But what's beyond the 3.4 GHz? There haven't been notable technological advances for more than a year, and Prescott isn't really mind-blowing. Soon all processors will be as cheap as bread and you'll pay a fortune (literally) for something 10% faster than mid-range.

So where is this heading? Should we all start buying dual CPU systems, or go with the mighty Itanium? I need a really fast number-cruncher, but I don't want to sell my house for a few hundred MHz more...


May 3, 2004 2:56:05 AM

I agree, the prices on CPUs have been very crazy lately. I hope this is just a local thing that will dissipate, 'cause the tendency to price the fastest processor above, say, $700 is just pathetic. I mean, that's almost in the Xeon/Opteron price range...

<i><font color=red>You never change the existing reality by fighting it. Instead, create a new model that makes the old one obsolete</font color=red> - Buckminster Fuller </i>
May 3, 2004 1:08:49 PM

That's the market!

There are people who pay a lot for TOP CPUs, and there always will be. Why do you think you can buy a DVD player anywhere from $40 to $500? Because there are crazy people out there who think the price is what matters.

We enthusiasts all know that the difference in performance between a P4 3.2 and 3.4 GHz is not worth the extra money. But people with lots of money are willing to pay that extra cash, because they have it and they want to burn it.

--
Lookin' to fill that <font color=blue>GOD</font color=blue> shaped hole!
May 3, 2004 4:49:22 PM

Agreed. Top-end prices have been too high for too long. How many do they sell at $800 compared to $400 on first release?
May 3, 2004 5:13:15 PM

I love being on the trailing edge of leading edge technology. That's where all the bargains are. Bragging rights (while a powerful force in the universe) are for kids and in the end only make the greedy corporations fat and happy.

After the new i915/925 platforms are introduced with the new P5 Prescott cores at 3.6 GHz+, pick up that previously mentioned 3.4 GHz Northwood at, say, $235 (or whatever), slap it on an Abit IC7 Max3 with the Thermalright or Swiftech heatsink of your choice (did somebody say watercooling?), and overclock that bad boy well over 4 GHz at a fraction of what it would cost to buy an i925 mobo, a P5 CPU, DDR2 memory, and let's not forget that new BTX case you'd need to house it.

"Nuke em till they glow, shoot em after dark"
May 3, 2004 8:14:03 PM

I agree wholeheartedly. I can't wait to upgrade my GF3 to a 9800 Pro sometime soon when they're dirt cheap.

Athlon XP 1900 (11x200) 42C (Load w/AX-7 & 8cm Tornado) - MSI K7N2 Delta - Corsair Value PC3200 - Gainward GF3 @ 250/550 - 80Gb WD 8Mb Cache -
May 3, 2004 11:51:46 PM

I understand it's the market, but what's taking technology so long? We always had 'exponential' prices, but never this steep. I can buy a 3.2 GHz for 300 €, but if I really have 600 € to spend, I only get 200 MHz extra! Things used to be very different.

I also understand that dual-core processors are the future, but they won't appear before late 2005. So what will happen in the meantime? 3.6 GHz at 100 € and absolutely nothing beyond that? OK, maybe a 3.8 GHz EE for 1000 €. That's ridiculous!

Is it the right time to switch to dual-processor systems? 2.4 GHz Xeons are quite cheap and give a combined performance of 4.8 GHz. That's something I could start to use! Or is there a better way to get a system that performs equally?
May 4, 2004 2:13:51 AM

For your own PC, it's OK to grab the trailing technology, especially if the budget is limiting. But for office use, insist on the best setup your company is willing to provide. That's how I ended up with an AMD64 3000+ at work and a PIII 667 at home.
May 4, 2004 7:55:03 AM

> I can buy a 3.2 GHz for 300 €, but if I really have 600 €
>to spend, I only get 200 MHz extra! Things used to
>be very different.

Not really. The only thing that has changed is that in the past those €600 would have gotten you 7 or 33 MHz extra, instead of 200 ;) 

Seriously, throughout the years I've been watching this market, it has pretty much always been like this. Mainstream CPUs are priced relatively close to each other, and the fastest two or so speed grades nearly double the price for no more than a few percent better performance.

>I also understand that dual-core processors are the future,
>but they won't appear before late 2005. So what will happen
>in the meantime?

The usual: higher clocks, faster busses and memory, ...

>3.6 GHz at 100 € and absolutely nothing beyond that?

The top-end part will *never* be €100. The cheapest it has ever been from Intel was ~$400 or so, and that was when Intel was caught with its pants down, with the Athlon FX taking the performance crown and the P4EE not shipping yet.

>Ok maybe a 3.8 GHz EE for 1000 €. That's ridiculous!

Why? How is that different from what we have now? Fast forward a year or so and you'll see the same thing, only with dual-core chips costing exponentially more than single-core chips while only offering a decent speedup in some apps (probably not gaming).

>Is it the right time to switch to dual-processor systems?
>2.4 GHz Xeons are quite cheap and give a combined
>performance of 4.8 GHz.

Depends entirely on the apps you run, but even in the best case, a dual-CPU system will not be 2x faster than a single CPU. Real-world speedups range from 0% to 40%. For most uses, it's really a waste of money.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
May 4, 2004 11:52:23 AM

I always said: "Don't buy the top notch; buy the ones a bit slower and you will save a lot of money." And I have been saying this since 2001, when I got my own PC. I've been watching prices ever since, and it's never been different. The only problem now is the A64 FX. There's only one on the production line at a time, so I don't see how its price can go down.
May 4, 2004 7:47:09 PM

Quote:
Seriously, throughout the years I've been watching this market, it has pretty much always been like this. Mainstream CPUs are priced relatively close to each other, and the fastest two or so speed grades nearly double the price for no more than a few percent better performance.

True. Except for the fact that 2.8 to 3.2 GHz, which I consider high-end (consumer), now sells for prices well below 300 €. Those used to be mid-range prices! I mean, these processors are really affordable. But paying twice as much doesn't give me anything extra, while one year ago there really was a significant performance difference for that price difference.
Quote:
Why? How is that different from what we have now? Fast forward a year or so and you'll see the same thing, only with dual-core chips costing exponentially more than single-core chips while only offering a decent speedup in some apps (probably not gaming).

I cannot agree. Dual-processor systems offer twice the performance of single-processor systems in many applications. I don't care about games, and besides, they should learn how to use multi-threading. I don't expect dual-core processors to be much different. If a low-end dual-core 2 x 3.0 GHz cost twice as much as a high-end single-core 4.0 GHz, I would buy it! There's just no alternative like that these days. EE processors are ridiculously slow once you've seen the price tag.

I'm not expecting a linear performance / price curve, not at all. But the current curve is more exponential than ever. What's even worse is that I don't see any changes for at least a year. Moore's law is no longer true!
May 4, 2004 8:41:28 PM

>True. Except for the fact that 2.8 to 3.2 GHz, which I
>consider high-end (consumer) .. <snip> .. I mean, these
>processors are really affordable

Well, if they are so affordable, I guess they are no longer that high end, are they? "Really high end" would be the 3.4 (3400+), and the 3.2EE and 3.4EE (AFX) versions then. Really no different from, say, ten years ago, when a (once high-end) Pentium 90 and 100 MHz would be affordable (well below 300 value-adjusted euros) and the 120 and 133 MHz parts would not.

>while one year ago there was really a significant
>performance difference, for this price difference.

Not more "significant" than between a P4 3.2C and a 3.4EE, or an A64 3200+ versus an FX53.

>Dual-processor systems offer twice the performance of
>single-processor systems in many applications

Name me one (more or less common) real-world app that speeds up by a factor of 2x on a 2-way system. 50% would already be very "SMP friendly"; 25% is more typical.

>and besides, they should learn how to use multi-threading.

LOL. Yeah, it's so easy.

Some problems just don't lend themselves to SMT, because they are sequential by their very nature. Think of it this way: if I can look up a word in the dictionary in 10 seconds on average, and I could work (read, think and turn the pages) twice as fast, I could do it in 5 seconds. How long do you think it would take 2 people (as fast/slow as me)? Or even 10?

What you are expecting from developers is the equivalent of: find me a way to look up a word in the dictionary twice as fast using twice as many resources (people). Good luck!

A better analogy would be software development itself; if you've ever worked on a big project, you'd know that doubling the resources (the number of developers) *never* halves the development time, not even anywhere near it. Beyond a certain point it will even *increase* it.
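
The "sequential by nature" point can be sketched as a dependency chain: step k needs the result of step k-1, so extra workers can't shorten it. A toy Python sketch (the chain data is made up):

```python
# Toy illustration: a chain of lookups where each step depends on the
# previous result. Adding workers cannot shorten the chain.
chain = {"start": "alpha", "alpha": "beta", "beta": "gamma", "gamma": "done"}

def follow(word):
    steps = 0
    while word in chain:      # each lookup needs the previous answer
        word = chain[word]
        steps += 1
    return word, steps

result, steps = follow("start")
print(result, steps)          # done 4 -- four sequential steps, no matter what
```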

> If a low-end dual core 2 x 3.0 GHz costs twice as much as
>a high-end single-core 4.0 GHz I would buy it!

Then go ahead and buy a dual 1.5 GHz Xeon ($69 on Pricewatch); it's definitely cheaper than a single-CPU high-end system. However, it won't be (nearly) as fast either.

>I'm not expecting a linear performance / price curve, not
>at all. But the current curve is more exponential than
> ever.

You mean like when a 166 MHz Pentium cost $300, a Pentium 200 ~$600, a 180 MHz Pentium Pro ~$1,000 and a 200 MHz one ~$2,000?

>What's even worse is that I don't see any changes for at
>least a year.

That's funny, I haven't seen any significant changes over the last 15 years either... Hmmm... coincidence?

>Moore's law is no longer true!

Moore has *NOTHING* to do with it. It's about transistor densities doubling every 18 or so months, and as such it's still as true as ever, with no change in the foreseeable future either, with multicore chips and huge caches on their way. I'd even WAG that GPUs are outperforming Moore's "law" by a big margin.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
May 5, 2004 7:10:01 AM

Quote:
Really no different from, say, ten years ago, when a (once high-end) Pentium 90 and 100 MHz would be affordable (well below 300 value-adjusted euros) and the 120 and 133 MHz parts would not.

That was a 20 to 50% clock increase. In other words, you would be comparing 2.8 and 3.2 GHz with 3.8 and 4.2 GHz! Really not different?
Quote:
Not more "significant" than between a P4 3.2C and a 3.4EE or a A64 3200+ versus a FX53.

Please explain. I really fail to see the solid 50% performance increase there. Besides, you are comparing the consumer market with the 'extreme' and server markets here. 3.2 GHz is the top of the consumer market now, and it's affordable. Nothing above it, not even 10%, although I'm really still looking for that 50%. That's what I've been trying to say here.
Quote:
Name me one (more or less common) real-world app that speeds up by a factor of 2x on a 2-way system. 50% would already be very "SMP friendly"; 25% is more typical.

Maya. MATLAB. Visual C++. PhotoShop. Oops, you only asked for one example...
Quote:
Some problems just don't lend themselves to SMT, because they are sequential by their very nature. Think of it this way: if I can look up a word in the dictionary in 10 seconds on average, and I could work (read, think and turn the pages) twice as fast, I could do it in 5 seconds. How long do you think it would take 2 people (as fast/slow as me)? Or even 10?

You should learn how to use multi-threading as well. I'm not talking about one 'dictionary lookup'; I'm talking about several dozen. And you -can- do those twice as fast with twice as many people. Would you rather run the Google search engine on single- or dual-processor systems?
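
The "several dozen independent lookups" case is exactly the workload that fans out cleanly across workers. A toy Python sketch (dictionary contents and queries are made up; note that in CPython, CPU-bound work would need processes rather than threads to run truly in parallel, because of the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mini-dictionary; every query is independent of the others.
dictionary = {
    "cpu": "central processing unit",
    "smp": "symmetric multiprocessing",
    "fsb": "front-side bus",
}

def lookup(word):
    # Each lookup touches shared read-only data and nothing else,
    # so the whole batch parallelizes without coordination.
    return dictionary.get(word, "<not found>")

queries = ["cpu", "fsb", "smp", "ht"]

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lookup, queries))  # results keep query order

print(results)
```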
Quote:
What you are expecting from developers is the equivalent of: find me a way to look up a word in the dictionary twice as fast using twice as many resources (people).

No. I'm expecting them to identify threadable operations: things that can be computed independently. Even in a very 'linear' algorithm, instruction-level parallelism is around 20, so current (single) processors absolutely don't execute it optimally. I'm not saying it will be easy, but programmers will have to take advantage of multi-threading sooner or later. It's inevitable that future processors will be multi-core, multi-HT. It costs too many transistors to get another 5% of linear performance increase.
Quote:
A better analogy would be software development itself; if you've ever worked on a big project, you'd know that doubling the resources (the number of developers) *never* halves the development time, not even anywhere near it. Beyond a certain point it will even *increase* it.

If you've ever worked on a big project, you'd also know that making everybody work twice as fast (or as efficiently) has its limits too. At some point, sooner rather than later, you have to double the number of employees. Current processors are still one-man companies. I really do know that doubling resources isn't a silver bullet, but right now they're not even trying!
Quote:
Then go ahead and buy a dual 1.5 GHz Xeon ($69 on Pricewatch); it's definitely cheaper than a single-CPU high-end system. However, it won't be (nearly) as fast either.

I know there's overhead involved; that's why I would only consider buying dual 2.0 GHz or higher. 2.4 GHz Xeons are still nicely priced and come with Hyper-Threading. On optimized software they would run faster than a 4.0 GHz.
Quote:
That's funny, I haven't seen any significant changes over the last 15 years either... Hmmm... coincidence?

I was talking about the future, not the past. It's clear to me that single-core technology has hit a wall. Sure, they'll still have 4.0 GHz by 2005, but nothing really spectacular. Speaking of spectacular, the newest GPUs are -twice- as fast as their predecessors, doubling the number of, yes, threads. I realize those are ideally parallelizable algorithms, but many applications have bottlenecks that are just as computationally intensive and parallel.

It simply won't be effortless any more for programmers to take advantage of new processor capabilities...
May 5, 2004 8:59:25 AM

> That was a 20 to 50% clock increase. In other words,
>you would be comparing 2.8 and 3.2 GHz
>with 3.8 and 4.2 GHz! Really not different?

No, not that different. Sure, the jumps in clockspeed have shrunk as we got more models. I mean, how many actual models do we have? 2, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4 GHz, most in 400, 533 and 800 MHz FSB versions, as Celerons, P4s and P4EEs, with and without HT, SSE3, ... there have got to be over 30 different variants, if not more. In the old days you'd just have the Pentium 75 (low end, like a Celeron), the 90/100 mainstream, and the 120/133 high end, so obviously the difference between mainstream and a one-step-faster CPU would be bigger. But if you look at the product range and its pricing curve, you wouldn't see much if any difference. I would say 2.8 is mainstream now, which is 20% slower (in clock) than the "normal" high end, the 3.4C/E, and the uber/ultra EE adds another 10% performance (L3 cache). That is pretty much identical to the Pentium 100 being 20% slower than the 120, with the 133 adding a bit on top. You do realize the Pentium 133 was more expensive at launch than the Pentium 4 3.4 EE? $935, 10 years ago.

Here is a good link for you:
<A HREF="http://www-ra.phys.utas.edu.au/~dr_legge/chip_prc.html" target="_new">http://www-ra.phys.utas.edu.au/~dr_legge/chip_prc.html</A>

>Please explain. I really fail to see the solid 50%
>performance increase there

I fail to see a solid 50% performance increase between a Pentium 100 and a 133 as well. And 30% would be more or less the difference between a 2 GHz A64 3000+ (mainstream) and a 2.4 GHz Athlon 64 FX, or between a 2.8 GHz P4C and a 3.4 GHz EE.

>Maya. MATLAB. Visual C++. PhotoShop. Oops, you only asked
>one example...

Found no benchies for MATLAB, but I really can't consider it a typical app either, even if it somehow reached anywhere near a 100% speedup.

<A HREF="http://www.gamepc.com/labs/view_content.asp?id=wsapps&p..." target="_new">Maya</A>: a <b>39%</b> increase going from one to two (otherwise identical) Athlon 2100+'s.

<A HREF="http://www.gamepc.com/labs/view_content.asp?id=wsapps&p..." target="_new">C++</A>: a solid <b>7%</b> improvement.

<A HREF="http://www.gamepc.com/labs/view_content.asp?id=thunderk..." target="_new">Photoshop 7</A>: an impressive <b>14%</b> speedup.

Oops, I did only ask for one. None of your examples even seems to reach my optimistic 50% number.

> And you -can- do that twice as fast with twice as many
>people.

Not if you don't have twice as many dictionaries and twice as many queries to run as well. And not if one of the searches depends on the result of another.

> I'm not saying it will be easy, but programmers will have
>to take advantage of multi-threading sooner or later. It's
>inevitable that future processors will be multi-core,
>multi-HT. It costs too many transistors to get another 5%
>of linear performance increase.

I don't actually disagree with you; it's just not a silver bullet, and horizontal scaling (SMT, CMP, ...) just can't replace vertical (clock/IPC) scaling for every problem. It also shifts the burden from hardware design/manufacturing to software design, something that rarely pays off. They should go hand in hand. A dual-core 1.7 GHz Willamette would be a poor substitute (performance-wise, for desktop workloads) for a 3.4+ GHz P4.

>If you've ever worked on a big project, you'd also know
>that making everybody work twice as fast (or efficient) has
>its limits too. At one point, better soon, you have to
>double the number of employees.

Doubling the timetable is a much more efficient solution :) 

>Currently processors are still one-man companies

Not really; they have multiple execution units, and it's already hard to write code that uses those efficiently.

> 2.4 GHz Xeons still have a nice price and come with
>Hyper-Threading. On optimized software it will run faster
>than a 4.0 GHz.

Precious few apps, if any, would. You might get twice the throughput running two (or four) instances of the SETI client, but for anything more complex, or anything speed- rather than throughput-oriented, it won't be better. Don't forget you are not doubling memory bandwidth (in fact, in your example you would decrease it by 75% per CPU!), and you are not reducing memory or cache latency either. The Opteron is an exception, where memory bandwidth scales with the number of CPUs, but only with a NUMA-aware OS and apps, and even then you are increasing latency.

>I was talking about the future, not the past. It's clear to
>me that single-core technology has reached a wall. Sure,
>they'll still have 4.0 GHz by 2005, but nothing really
>spectacular.

Spectacular like what? Like going from 100 to 200 MHz in two years? Or 1.5 to 3 GHz in two years? Or 3 to 4 GHz in one year?

If there is any wall we are hitting, it's not potential single-threaded performance increases but a thermal wall, and guess what, a dual-core chip will consume roughly twice as much as a single-core chip.

Obviously we will see multicore chips in the future, but for somewhat different reasons than you imply: smaller processes finally give us cores small enough to economically produce several of them on one chip (<200mm²). That used to be impossible. Just like HT, it will especially help when running several programs or processes simultaneously, but it won't speed up most things by more than 30%.

A very interesting quote and link in this context:

Quote:
For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit co-operative solution...The nature of this overhead (in parallelism) appears to be sequential so that it is unlikely to be amenable to parallel processing techniques.


This is a quote by Gene Amdahl from <b>1967</b>.
<A HREF="http://home.wlu.edu/~whaleyt/classes/parallel/topics/am..." target="_new">Amdahl's law </A>
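
The law in that link fits in two lines of Python; a quick sketch of why doubling CPUs rarely comes near doubling speed (p is the parallel fraction of the work, n the CPU count):

```python
# Amdahl's law: speedup on n CPUs when a fraction p of the work can run
# in parallel and the remaining (1 - p) stays sequential.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Two CPUs, various parallel fractions:
for p in (0.25, 0.50, 0.75):
    print(f"p={p}: {amdahl_speedup(p, 2):.2f}x")
# Even a 75%-parallel job gains only 1.6x on 2 CPUs, and a 50%-parallel
# job gains just 1.33x -- right in the 25-50% range discussed above.
```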

= The views stated herein are my personal views, and not necessarily the views of my wife. =
May 5, 2004 2:50:55 PM

In order to double the performance using 2 CPUs, you need everything doubled overall: twice the MHz of the single CPU, twice the bandwidth, half the latencies, etc. The memory bandwidth condition has been met, but what about HDD bandwidth? You're right. It will still be a long time before that law can finally be contradicted.
May 5, 2004 10:22:38 PM


Thanks for that link, because it proves part of what I'm saying. Look at the Pentium III prices of August 2000. The 1000 MHz cost 669, and the 700 MHz cost 193. Translating that to today's situation, a 3.0 GHz could cost 669 and a 2.1 GHz 193. Yeah, right. A 3.0 GHz is affordable now, while nobody would pay 193 for a 2.1 GHz! And while paying 400 extra in those days gave you 300 MHz extra (42%), it's still only ~300 MHz (10%) you get when paying 400 extra for today's processors!
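
The two percentages are easy to check (a two-line sketch using the MHz figures quoted above; the first works out to ~43%, close to the 42% mentioned):

```python
# Rough check of the relative clock gains quoted above.
then = 300 / 700    # 300 MHz extra on top of a 700 MHz part (Aug 2000)
now = 300 / 3000    # ~300 MHz extra on top of a ~3.0 GHz part today
print(f"{then:.0%} {now:.0%}")   # 43% 10%
```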

I do realize these numbers can be pushed and pulled a bit, but I can't shake this feeling I have about today's market situation. Maybe I'm the only one who sees it, but high-end processors are really affordable, and for one notch higher you pay a fortune. And I blame it on the technology wall.
Quote:
I fail to see a solid 50% performance increase between a Pentium 100 and a 133 as well.

You mentioned 90 MHz as well...

Anyway, this might be an eye opener for you (and many other): <A HREF="http://images.digitalmedianet.com/2003/03_mar/editorial..." target="_new">Multiprocessing Performance</A>

What we see here is three similar applications (image processing) running on single and dual 1 GHz G4s. Although they all ran the same type of test, the best software did the job nearly twice as fast, while the worst showed no significant benefit at all.

This teaches us that for multi-processor (or multi-core) systems you really can't say it's 25% faster, or any other fixed percentage. It -all- depends on the software. And this example clearly shows that a lot can be gained just by properly optimizing the application for multi-threading.

Note that we would be talking about 5.5 GHz in Pentium 4 terms here! It will take some effort from the developers, but isn't that worth it? It's a fact that developers have become lazy because current processors take care of everything, like scheduling and prefetching. But I firmly believe that if all those transistors were used for execution units, and the programmer (and compiler) really made use of that architecture, we'd be far beyond current performance levels. Note that there's a parallel with VLIW processors like the Itanium: they use parallelism explicitly, and even at low clock frequencies their performance blows everything away -if- the software is optimized for it.

Once the bigger part of processors are multi-core, developers will really start to learn how to use them, and a performance increase close to the number of cores won't be an exception. Nearly all current developers work on single-processor systems, so they simply don't see the benefits of optimizing for multi-threading.
May 6, 2004 8:50:59 AM

>Thanks for that link, because it proves part of what I'm
>saying. Look at the Pentium III prices of August 2000.

That's a terrible data point. The 1 GHz Pentium III was a phantom chip, mostly a PR launch to counter AMD's 1 GHz and 1.1/1.2 GHz Athlons; for practical purposes, let alone revenue, it just didn't exist. Out of ten years of CPU pricing history, you pick the one (arguably only) moment where Intel was lagging AMD in the clock race and could not play its normal pricing game. It really doesn't prove that what we are seeing now is a break in the trend.

>Translating that to today's situation, a 3.0 GHz could cost
>669, and a 2.1 GHz costs 193

No; translating that to today, it would mean a 3.4 (which is the top banana now, not the 3 GHz) would cost $669 and the 2.4 GHz (700/1000 x 3.4) would cost $193. They are both slightly cheaper than that, but close enough to match the pattern of even those days. And as I said, you could not have picked a less representative data point.

> Maybe I'm the only one who sees it but high-end processors
>are really affordable and for one notch higher you pay a
>fortune.

Like always. And you insist on calling the 3 GHz part "high end", while really, even if it's a fast chip, it trails the bleeding edge by a fair amount. 3 GHz parts have been out for over 18 months. The 3.4 EE has a 12% higher clock, and with the extra cache it performs roughly like a 3.6 would, giving it a 20% lead over the 3.0. The FX53 would be even further ahead. If you look at that link I provided, at most points in time, if you pick the fastest CPU, look at its price, and compare it to the mainstream ones (being ~20% slower), you'll see a very similar pattern. I'm sorry, this is nothing new. No offense, but how long have you been watching this industry? Something tells me you're a bright fellow, but not into this market for very long. I've been watching it for 15+ years, and the more things change, the more they stay the same. If anything has changed, it's that overall prices have dropped, and you only need stiffer competition and much higher volumes to explain that.

>Anyway, this might be an eye opener for you (and many
>other): Multiprocessing Performance

LOL! Is that the best you could come up with? They compare a single-CPU *laptop*, most likely using 100 MHz RAM, with a dual-CPU desktop using PC133. And even there, the "best" multithreaded software, performing a type of workload that is probably close to the best-case scenario for SMP, doesn't come anywhere near your 100% speedup claim.

I understand your point about some software being written to take better advantage of SMP or CMT, but what really matters is overall performance.

It's nice to see a 70% speedup on dual processors (using faster RAM, and God knows what else is different between the systems, so say maybe 50-60% on identical CPUs), but what good does that do when that same software is 2x to >5x slower than competing packages? Here, this may be an eye opener for you:

<A HREF="http://www.emedialive.com/Articles/ReadArticle.aspx?Art..." target="_new">http://www.emedialive.com/Articles/ReadArticle.aspx?Art...</A>
<A HREF="http://www.emedialive.com/Articles/ReadArticle.aspx?Art..." target="_new">http://www.emedialive.com/Articles/ReadArticle.aspx?Art...</A>

So is Final Cut Pro really such a fine example of good coding ?

>This teaches us that for multi-processor (or multi-core)
>systems you really can't say it's 25% faster, or any other
>fixed percentage. It -all- depends on the software

I just barked at your 100% claim, and pretty much proved it's incorrect. I said it ranges from 0-30% on average, up to around 50% for SMP-friendly jobs (like rendering). It's ironic that your own links seem to support this claim, and you have failed to give me even one counter-example of a 100% speedup.

You can blame developers all you want, but reality is what it is. A dual-CPU machine isn't anywhere near twice as fast as a single-CPU machine, and no amount of clever coding is ever going to make it so. As it is, using the most clever compilers and coders, it's already nearly impossible to make optimal use of *one* CPU's execution resources. If you doubt that, run whatever CPU-intensive app you use and look at the processor temperature. Now compare it to the temps you see running something like BurnK7, which was designed specifically to max out all execution resources on a K7. HUGE difference.

Or another example: Hyper-Threading. All HT really does is try to make better use of the P4's execution units and fill pipeline bubbles. If your code were somehow magically close to 100% optimal, HT would not change a damn thing.

OTOH, an SMP machine can have (nearly) twice the throughput (running several processes), which is one of the reasons you have been seeing SMP in servers for decades. But typical server and desktop workloads are really different worlds.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
May 6, 2004 11:20:50 AM

It's hard to imagine a 100% speedup. You need everything doubled in terms of performance. As I previously stated, you need twice the frequency (OK), twice the memory bandwidth (OK for Opterons with a proper OS), twice the HDD bandwidth (not OK) and half the latencies for memory and cache (not OK), and all this just in hardware terms.

In software terms it comes down to this: the program you run must be able to use both CPUs equally (if it uses, let's say, 97% of a single CPU, then it must use 97% of each CPU), and it has to transfer data from one CPU to the other without any latency, which is not how multi-CPU systems work; the bus between the CPUs also has limited bandwidth. So applications might benefit from multi-threading, but they will never reach that 100% speedup. Multi-threading is more useful when multitasking, but you're talking about single-tasking. Sorry.
May 6, 2004 12:03:39 PM

Hmmm... your logic is slightly flawed. If you double the frequency, halve the latencies, double the bandwidth and double the I/O, you'd get exactly twice the performance with just one CPU :) 

= The views stated herein are my personal views, and not necessarily the views of my wife. =
May 6, 2004 11:57:29 PM

Quote:
Thats a terrible datapoint. The 1 GHz Pentium III was a phantom chip, and mostly a PR launch to counter AMD's 1GHz and 1.1,1.2 GHz Athlons; but for practical reasons let alone revenue, it just didnt exist.

Very well, then take Q3 1997 and compare 3.4 GHz vs. 2.4 GHz with 233 MHz vs. 166 MHz. Or Q4 1998, where you can compare it with 450 MHz vs. 333 MHz. Can you show me a date where the situation was 'worse' than today? That would really show me that it's all just a temporary thing.

Either way, I never said the current market situation is totally abnormal. I just see a small trend, and if it continues like this I do believe there is reason to worry about current technology. In one of the other threads here I read about oxide layers being only 6 atoms thick! I can imagine this is very close to what is physically possible while still having a production process that is, well, productive.

That's why I also believe that new architectures will soon be required, and that to make more efficient use of the available transistors, much of the effort will shift to the programmer.
Quote:
Something tells me you're a bright fellow, but not into this market for very long. I've been watching it for 15+ years, and the more things change, the more they stay the same.

Well, my first CPU was a Z80, which was one of the first affordable consumer products. I was very young then, but I've certainly seen the whole Pentium evolution in detail. It's only about half a year ago, though, that I started to notice the performance-to-price ratios behaving strangely. In the last few months my local store kept dropping the prices of first the 2.4 GHz, then the 2.6 GHz, the 2.8 GHz, and now the 3.0 GHz to a really affordable level. In the meantime, no new processor had appeared, and the 3.2 GHz remained almost steady at a three times higher price. The picture is a bit different when you add the Prescott and EE models, but I still found it very remarkable how fast those prices dropped while the others stayed really high. I can clearly see the wall they're hitting around 3.2 GHz.
Quote:
They compare a single-CPU *laptop*, most likely using 100 MHz RAM, with a dual-CPU desktop using PC133. Even there, the "best" multithreaded software, performing a type of workload that is probably close to the best case for SMP, doesn't even come near your 100% speedup claim.

Good point. The notebook has a slower memory interface. But I think it's fair to assume this accounts for a difference smaller than 10%, because that's what the After Effects software loses. So I also wouldn't be terribly wrong if Final Cut Pro still performed more than 55% faster in a dual-processor configuration with PC100. There is no way such a gain could be achieved by doubling the transistor count of a single processor. And memory bandwidth is cheap, so 70% is still realistic. Granted, it isn't 100% either, but a lot of people would gladly pay more than double for a 55-70% performance increase. Besides, 70% is actually very close to 100%: the difference in execution time is only 17%!

Anyway, it's an illustration that programming technology and processor technology should cooperate to reach new performance levels. We've seen the same thing with SIMD technology (MMX and SSE). At first they didn't offer any real-life speedup at all, because no programmer used them (or used them badly, with AoS data layouts), and it took years before consumers saw the benefits. But gradually and silently more applications made use of them, and now some algorithms run up to four times faster! Multimedia players with low CPU utilization would be unthinkable without MMX, and all 3D games would be processor-limited without SSE. What I'm trying to say is: processor architecture really matters, and programmers will have to make an effort to benefit from it.
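To make the AoS remark concrete, here is a hypothetical sketch of the two data layouts (the struct and function names are made up). With a structure-of-arrays layout, consecutive x values are contiguous in memory, so four of them map straight onto one 4-wide SSE register; with array-of-structures the components are interleaved and have to be shuffled first:

```cpp
#include <cstddef>

// Array-of-Structures: x, y, z interleaved. A 4-wide SIMD load
// pulls in mixed components that need shuffling before use.
struct PointAoS { float x, y, z; };

// Structure-of-Arrays: each component contiguous, SIMD-friendly.
struct PointsSoA { float x[256], y[256], z[256]; };

// Unit-stride loop over one component: exactly the shape that a
// vectorizer (or hand-written SSE) handles well.
void translate_x(PointsSoA& p, std::size_t n, float dx) {
    for (std::size_t i = 0; i < n; ++i)
        p.x[i] += dx;
}
```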

Last year I wrote an instruction scheduler for a compiler back-end. The results were totally disappointing, and it wasn't my fault: the processor already does instruction reordering, register renaming, memory prefetching, etc. Now imagine all the transistors for that machinery were used to double the execution units, and my software scheduler did the same work the processor does. That could easily give the same 55-70% performance increase, but without extra transistors! The only drawback is that unoptimized software would run slower. Like I've said before, this is the approach taken by VLIW processors like the Itanium, and it works. I do realize it would take many years to migrate all legacy x86 applications to such a new architecture, but it's inevitable...
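To make the scheduling point concrete (my own toy example, not code from the thread): writing a dot product with two accumulators is exactly the kind of transform a software scheduler performs. It breaks the serial chain of additions so independent multiplies can overlap, the same job an out-of-order core does in hardware with its reorder buffer:

```cpp
// Two independent accumulators break the loop's add chain, letting
// the multiplies of iteration i+1 start before iteration i's add
// retires: static scheduling expressed in source form.
float dot_scheduled(const float* a, const float* b, int n) {
    float s0 = 0.0f, s1 = 0.0f;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)               // odd-length tail
        s0 += a[i] * b[i];
    return s0 + s1;
}
```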

I don't know if anyone realizes this, but more than a dozen Katmai cores fit into a Prescott. And on 0.09 micron technology they could be clocked at around 2 GHz (comparing with Coppermine and Tualatin). That's 96 GFLOPS using SSE! The next Xbox will take a similar approach, with three cores at 3.5 GHz. Game programmers will have to learn SMP soon, and others will follow...
May 7, 2004 8:41:45 AM

>Either way, I never said the current market situation is
>totally abnormal. I just see a small trend

Very small :) 

>and if it
>continues like this I do believe there is reason to worry >about current technology.

That is a bit far-fetched. There are more obvious explanations, like competitive pressure from AMD, and the fact Intel indeed has problems scaling Netburst. You could maybe conclude Netburst (or more specifically Prescott, since I believe a 90nm Northwood would still have quite a bit of headroom) is hitting a wall, but I would not generalize this to conclude single-threaded performance is hitting a brick wall. IBM and AMD seem to have no big problems (yet?) scaling up the ST performance of their chips.

Sure, it's getting more difficult and, mostly, more expensive, but the same doom scenario has been predicted with each new process shrink, including the 0.8 and 0.5 µm shrinks in 1993. Pardon me for not holding my breath just yet.

>That's why I also believe that new architectures will soon
>be required, and to make more efficient use of the
>available transistor

What makes you think multicore is the only way to use those ever increasing transistor budgets ? God knows how many 486 cores you could fit on a 200mm² 130nm die, yet no one is doing it.

> There is no way such a gain could be achieved by doubling
> the transistor count of a single processor.

I disagree. Seeing that something as "trivial" as SSE2, which accounts for maybe 2% of the die size, can achieve comparable speedups in cherry-picked apps... I also think these kinds of highly parallelizable, FPU-intensive workloads (rendering, video encoding, ...) will more and more be offloaded to the GPU anyway, which is far better at that kind of thing.

>Anyway it's an illustration that programming technology and
>processor technology should cooperate to reach new
>performance levels. We've seen the same thing with SIMD
>technology (MMX and SSE). First they didn't offer any
>real-life speedup at all, because no programmer used it (or
>badly, like AoS) and it took years before consumers saw the
>benefits

There is one major difference: MMX/SSE code can be generated automatically by the compiler; the developer hardly has to do anything special. And if not, it's a rather trivial change to your code, really not in the same league as creating efficient multithreaded code. And if you think MMX or SSE is a bad example because of the "limited" scope of apps that can benefit, then consider AMD64, which costs <5% of the die size and can increase performance anywhere from 0 to 20% on average, with spikes up to 100%.

>and now some algorithms run up to four times faster with >it!

LOL, exactly. Using something that is hardly a few percent of the die size (as opposed to doubling it), and which requires nearly no effort from the developer besides setting the -mfpmath=sse or -mmmx switches of their compiler. I thought you were the one claiming single-threaded performance scaling was dead?

>I don't know if anyone realizes this but more than a dozen
>Katmai cores fit into a Prescott

That's just not true. According to sandpile, Katmai was a 128 mm² part on 0.25 µm. That would mean roughly 80 mm² on 0.18, 50 mm² on 0.13 and 30 mm² on 0.09. That is ~2.5x smaller than Prescott's die, which is 112 mm², a big part of which is used for L2 cache (don't forget, Katmai had no on-die L2!). You could fit at most 3 Katmai cores on Prescott if you want to keep the cache, and we all know Prescott contains countless apparently unused, or at least unexplained, "phantom" transistors. Compare it to Northwood, and you could maybe fit in 2 cores at most. Never "more than a dozen". How did you get that number? Just transistor count? If so, that is obviously bogus, since cache transistors are MUCH more dense, and much cheaper, than logic transistors.

> Like I've said before this is the approach taken by VLIW
>processors like the Itanium, and it works.

Well, I'd dispute the "it works" part, because IMHO it doesn't work except for FP, which has very little to do with this discussion (since FP is "easy" to achieve). For integer performance (and even for FP), if you look at performance/mm² or performance/transistor, it trails x86 by a rather huge margin. You do realize Madison 6M is more than 3x as big as a Xeon (Prestonia) on the same process, and has nearly <b>10x</b> as many transistors? So, are you still sure it's a good idea to ditch all that reordering logic to save a couple of hundred thousand transistors? And if you look at the floorplan of Madison 9M and even Montecito, you may notice the core occupies only ~15-20% of the die area and a tiny fraction of the transistor count, so even with the EPIC approach, for a CPU designed only for the server market, it seems Intel thinks there are still better ways to spend transistors than doubling the number of cores...

= The views stated herein are my personal views, and not necessarily the views of my wife. =
May 7, 2004 2:59:21 PM

Quote:
Very small :) 

I must agree. But did you find that date yet in the table that showed a worse scenario than today?
Quote:
Sure, it's getting more difficult and, mostly, more expensive, but the same doom scenario has been predicted with each new process shrink, including the 0.8 and 0.5 µm shrinks in 1993. Pardon me for not holding my breath just yet.

Sure, I understand you think it will all just magically work out, again. Oxide layers can be six atoms, then three, then one, then sprinkled here and there... I'm sure you get my point. Current technology is hitting -physical- limitations, not just manufacturing issues. I think it's a bit naïve to assume that if a CPU technology has grown steadily for the past fifteen years, it will grow forever. There will undeniably still be progress, so you might be right that we don't have to hold our breath yet, but when do you expect to hit the wall then? One year, five years, another fifteen years?
Quote:
What makes you think multicore is the only way to use those ever increasing transistor budgets ? God knows how many 486 cores you could fit on a 200mm² 130nm die, yet no one is doing it.

I believe so, because many algorithms are highly parallelizable. Name one application that would never run faster on multiple processors.

No one is doing it because, up until now, it was possible to use smaller processes and spend the extra transistors solely on increasing the clock frequency. AMD's less radical approach shows that this is not the only way to increase performance. IPC matters, and soon they'll realize TLP matters as well. Intel has constantly strived to make unoptimized x86 code run faster, with a lot of success. But they can't keep optimizing hardware forever, wasting more and more transistors, just to make sure the programmer never has to make a new effort to make code run faster.

Processor performance is now up to a level where 'office' applications programmers don't have to worry about optimizations, but other programmers are -willing- to take advantage of new architectures.
Quote:
I disagree. Seeing something as "trivial" as SSE2, which accounts for maybe 2% of the diesize can achieve comparable speedups in cherrypicked apps...

Thanks for agreeing. :smile: SSE is part of the new architectural changes I'm talking about. They require few extra transistors, but a lot of effort from the programmer to take advantage of.
Quote:
MMX/SSE code can be generated automatically by the compiler; the developer hardly has to do anything special.

ROFLMAO. Give me a minute to get back onto my chair ok... Sorry, don't take it as an insult please.

Compilers, even the ones that claim support for vectorization, are lousy at using SIMD (without "the developer doing anything special"). It's not that these compilers are badly written; it's that SIMD technology, the way Intel implemented it, isn't suited to automatic compilation. You have to redesign algorithms to make use of it. I can even show examples of C code where Microsoft's compiler produced faster code using the FPU than Intel's compiler using SSE.
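A minimal illustration of the difference (my own toy example, not code from the thread): the first loop has independent iterations and vectorizes trivially; the second carries a dependency from one iteration to the next, so no compiler switch can turn it into 4-wide SSE without the algorithm itself being redesigned (for instance as a parallel prefix sum):

```cpp
// Independent iterations: trivially vectorizable.
void scale(float* a, int n, float k) {
    for (int i = 0; i < n; ++i)
        a[i] *= k;
}

// Loop-carried dependency: a[i] needs the already-updated a[i-1],
// so the iterations cannot simply run four at a time.
void prefix_sum(float* a, int n) {
    for (int i = 1; i < n; ++i)
        a[i] += a[i - 1];
}
```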
Quote:
And if not, it's a rather trivial change to your code, and it's really not in the same league as creating efficient multithreaded code.

Well, it won't surprise you now that you're wrong again. To really make efficient use of it, one has to write the assembly code manually. There are lots of tools to help, like intrinsics and inlining, but either way I consider that "special effort". And writing multi-threaded code really isn't -that- hard. I'm currently working on a camera surveillance system where several cameras are shown on one monitor, using motion detection to pick the most interesting ones to display. It's really 'natural' to use a thread per network connection, per detection routine and per image panel. In this case it's not really for performance reasons; I'm just saying that the multi-threading isn't harder than the MMX-optimized detection routine. It's just on a different (higher) level. I'm very confident it can be used relatively easily for optimization purposes on multi-processor systems.
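The thread-per-camera decomposition might be sketched like this (using C++11's std::thread for brevity, which obviously postdates this discussion; camera_worker is a made-up stand-in for the real capture and detection loop):

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> frames_processed{0};

// Stand-in for one camera's capture + motion-detection loop.
void camera_worker(int /*camera_id*/, int frames) {
    for (int i = 0; i < frames; ++i)
        frames_processed.fetch_add(1);
}

// One thread per camera: each runs independently, as "natural" a
// decomposition as one thread per network connection.
void run_cameras(int cameras, int frames_each) {
    std::vector<std::thread> workers;
    for (int c = 0; c < cameras; ++c)
        workers.emplace_back(camera_worker, c, frames_each);
    for (auto& t : workers)
        t.join();
}
```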
Quote:
That's just not true. According to sandpile, Katmai was a 128 mm² part on 0.25 µm. That would mean roughly 80 mm² on 0.18, 50 mm² on 0.13 and 30 mm² on 0.09. That is ~2.5x smaller than Prescott's die, which is 112 mm², a big part of which is used for L2 cache (don't forget, Katmai had no on-die L2!).

The size of a process is indicated by the gate length; it can't be used directly to tell how many transistors fit on a die. Katmai has 9.5 million transistors; Prescott has 125 million (75 million excluding the cache). In all fairness, it shows that Prescott is very wasteful with transistors for nearly the same instruction set; with a simpler architecture and a slightly lower clock it could have had many more execution units. Granted, the Katmai cores would carry some redundant duplicated logic.
Quote:
...and we all know Prescott contains countless apparently unused or at least unexplained 'phantom' transistors.

And these phantom transistors are going to do what, double performance? If it's 64-bit support then it certainly won't, and if it's Full Scan technology then they could have omitted it when using a more predictable simple multi-core architecture.
Quote:
How did you get that number? Just transistor count? If so, that is obviously bogus, since cache transistors are MUCH more dense, and much cheaper, than logic transistors.

40% of Prescott's transistors (the cache) sit on 30% of its die space. That tells me cache fits in 75% of the space logic needs. If that's your definition of "MUCH more dense", then I can still fit in nearly a dozen Katmai cores. And it would only take a little extra space to add a serious L2 cache. It wouldn't have to be huge, because a thread can be halted for a while to fetch data while the other threads continue to fill the pipelines. Anyway, I'm not saying performance would be a dozen times higher, but it would still very much be worth it.

The last few articles at Ace's Hardware are closely related to this discussion: <A HREF="http://www.aceshardware.com/list.jsp?id=4" target="_new">CPU Architecture & Technology</A>. And if The Inquirer is correct then also Intel is looking at other architectures: <A HREF="http://www.theinquirer.net/?article=15768" target="_new">Intel’s Potomac team gets dissolved</A>.

Are you still sure that current single-processor technology is not in trouble?
May 7, 2004 7:57:02 PM

Well, Intel isn't looking at 'new technologies'; it's still based on an old core, the P6. It seems they have decided to concentrate efforts on their P-M chips to bring them to desktops. I think it's a good idea; the P-M line has a lot of promise, so hopefully they can pull that off.

This discussion has drifted pretty far from the original topic, but that's ok lol.

I don't think the high-end prices of CPUs have any bearing on the market going up in price as a whole. There have always been high prices for new chips; it's not new. If you want to see the condition of the market, look at the mid and low-end segments. To me, they look about the same as they always have; there are no upturns in cost for the consumer relative to past products.

As for the future of CPUs, I don't think anyone here can say where it will head. Although I must say, it seems AMD at least will push for dual cores, so it would seem they are embracing the multithreaded environment as a way of pumping up performance. Intel just seems to be in a transition phase with no real plan as of yet, still trying to figure itself out. We hear a bunch of rumours about Intel right now; they just seem busy atm lol.

I don't think it's a question of whether single or multi-threaded architecture is the answer, because in the end, if the software doesn't support it or take advantage, then it's useless. Why would the normal consumer buy a multithreaded CPU if all they do is office tasks, maybe a little gaming, and using the internet? It won't speed things up, just allow them to have more windows open at once, but most people don't do 10 things at once. It's a supply and demand thing. When there is a need for SMP, it will come, but atm there is no such need in the mainstream.

The future will bring some rough times, as you have said; it's inescapable. The physical wall for normal transistor-type CPU increases will be reached sooner or later. That's why advances like carbon nanotubes and other discoveries will form a new architecture on a sub-atomic scale.

The point is, even with the end in sight, normal consumers won't be hurt by this for, I'd say, easily 10-15 years; the trickle-down will be very slow. And by the time it does affect the mainstream, a new architecture will be ready, because everyone is aware of this. They aren't blind to it; they are preparing.
May 7, 2004 8:02:05 PM

Re: carbon nanotubes

I read something on that stuff: stronger than steel, 100 times lighter. Interesting stuff for sure. Are there future plans to use this stuff in CPU designs?

If I glanced at a spilt box of toothpicks on the floor, could I tell you how many are in the pile? Not a chance. But then again, I don't have to buy my underwear at Kmart.
May 7, 2004 8:18:59 PM

Yeah, since they can be made much smaller than today's transistors, and scientists have even gotten electrons to follow paths through the tubes to perform simple tasks. But the tubes are still not stable enough to be used on a large scale and tend to break down. Also, the process to create them is still a bit on the pricey side, so it'll be a while.
May 7, 2004 8:28:11 PM

I don't know much about this stuff. I didn't know it was conductive; do you know how conductive it is? As for breaking down, I heard the half-life on this stuff is something like "infinity": very strong and stable. I read that in Japan they want to use this material to build a pyramid that would hold multiple skyscrapers and something like 700,000 people. Truly a breakthrough product for the future. Just imagine what this stuff could do for airplanes, or for rockets getting material into outer space, where weight is everything.

May 7, 2004 10:55:56 PM

Thanks for your thoughts trooper11!

I only have one small remark:
Quote:
It won't speed things up, just allow them to have more windows open at once, but most people don't do 10 things at once.

This is a common misconception. There can be multiple execution threads even for one 'window', one application. Every set of independent operations can be executed in parallel.

It can be applied to games as well. Things that I can immediately identify as independent to execute are:

- Game logic
- Artificial intelligence (per entity)
- Physics calculations (per entity)
- Sound processing
- Visibility culling
- Graphics processing (transform, lighting, rasterization)
- Human interface
- Etc...

Back in the days of Quake I, the processor did all these tasks sequentially. Nowadays, sound processing is done on the sound card by an EAX processor, and graphics processing is totally controlled by the GPU. Yes, that's exactly the same as having multiple processors! The only difference is that these processors are specialized for a specific task. The GPU is even an extreme example of multi-threading itself, since every possible operation is done in parallel. The newest graphics cards can process 16 pixels in parallel while at the same time transforming and lighting several vertices, performing clipping, rasterization, etc.

But still, artificial intelligence and physics take a lot of processing power and are executed sequentially on single-processor systems. With a bit of effort from game developers, it could all be split into separate threads. On a dual-processor system all these threads could theoretically execute two times faster. There's some overhead for synchronizing operations and sharing data, but there will be a clear overall speedup! With three cores, like the next Xbox, even more things can be executed in parallel.

And I've only touched the tip of the iceberg. The game logic probably does things that can be computed independently as well. Even something as simple as a 4x4 matrix calculation could in principle be split into 16 separate threads. That would be extreme, but it illustrates that even small algorithms can be parallelized. The same can be done with every loop that has independent iterations: every iteration can be a separate thread, and even within iterations things can be independent. In fact, superscalar processors already do this to a small extent, because they can look about 40 micro-instructions ahead for independent operations.
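That loop-splitting idea can be sketched directly (again with std::thread, anachronistic for 2004, but the decomposition is the same): give each thread a contiguous slice of the independent iterations.

```cpp
#include <thread>
#include <vector>

// Each thread adds its own slice of b into a. The iterations are
// independent, so the slices never touch the same element.
void add_parallel(float* a, const float* b, int n, int nthreads) {
    std::vector<std::thread> ts;
    int chunk = n / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        int lo = t * chunk;
        int hi = (t == nthreads - 1) ? n : lo + chunk;  // last thread takes the remainder
        ts.emplace_back([a, b, lo, hi] {
            for (int i = lo; i < hi; ++i)
                a[i] += b[i];
        });
    }
    for (auto& th : ts)
        th.join();
}
```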

There's another easy way to conceptualize threading. Every professional application has several thousand functions (including the external libraries it uses). If every function were a thread and we had a lot of processors, we could just run all independent functions in parallel. I remember my computer architecture professor saying that most applications have around 20 independent functions in flight on average at any given time, often more.

...

Did I say "small remark"? :wink:
May 7, 2004 11:27:30 PM

Whisper = c0d1f1ed

Sorry for any confusion... I'm schizo. :wink:
Actually I just forgot to change user name on my laptop.
May 8, 2004 12:51:19 AM

Oh, I know. Of course almost any app can in theory run in a multi-threaded environment if it performs more than one task at a time, but my point still stands: do mainstream users need it? Will they need it in 10 years? Maybe, but I doubt it. Of course in the very high end there will be a need earlier, but that is not the largest segment.

Like, when will there ever be a need for multi-threading in things like Word or Internet Explorer lol. Of course you could use it, but no one NEEDS it.
May 8, 2004 1:11:51 AM

In 10 years we expect to have computers that are 30x faster. Does that seem possible on a single core? Not today.
May 8, 2004 3:11:24 AM

Yeah, sure, that's ten years. Multi-threading won't save the CPU either; it'll just delay the inevitable, that is, the physical parts hitting their limit. It's possible there could be yet another single-core design using another process.

But even today, most people that buy PCs, from like Dell or HP, don't even need a 3 GHz P4; most get by with 2 GHz just fine. I'm just saying progress gets dragged down by the economics of it.
May 8, 2004 3:39:28 AM

Sure, we could have diamond chips, with multiple cores. We will have something. Speculation at this point is looking at multi-core, but who knows? It's the journey, not the destination that I like.
May 8, 2004 4:40:31 AM

Yeah I have to fully agree it has always been that way.

The only thing that changed is that for a while the top CPUs were being released cheaper than they used to be: a P4 3.4 for $500 compared to a PII 450 for $1000. But stepping down one notch from the top gave a huge savings then, just like now. Looking at the Pentium 4 C chips, the best Intel deals have stayed at almost the same price but you keep getting a higher-clocked chip: the 2.4C was $175, the 2.6C dropped to about that, the 2.8C dropped to about that, and the 3.0 will soon drop to that. The place where it stops is near the top of what platforms can use, as those chips keep their value for a while after being discontinued.

Intel, in a desperate attempt to match the A64 in games, put out the EEs and in effect raised their top consumer-level chip back up from the $500 range to the $1000 range it occupied in the Pentium II days. Ugh, too rich and stupid for me.

Anyway, as you know, 2-4 steps down from the top chips has always saved a ton of money and been the better buy, which helps us do-it-yourselfers put our money to better use than buying a Dell or OEM system. Buying the 3.4s now just doesn't make sense. 2.8-3.2 is where it's at, both for A64 and P4.

ABIT IS7, P4 2.6C, 512MB Corsair TwinX PC3200LL, Radeon 9800 Pro, Santa Cruz, TruePower 430watt
May 8, 2004 12:44:58 PM

Quote:
Like, when will there ever be a need for multi-threading in things like Word or Internet Explorer lol. Of course you could use it, but no one NEEDS it.

I need it. For faster code compilation. For faster multimedia encoding. For faster scientific simulation. (Others need it. For faster games.)

I agree many tasks don't need that much processing power, but some tasks can never be fast enough. Code will always get bigger, multimedia formats will always get bigger, and scientific simulations will always get bigger. The faster processors get, the more demanding the next generation of applications becomes.

Besides, this is an argument pro multi-core. Unoptimized applications like Word and Internet Explorer could just run a single thread on one core. Applications that do need the extra performance can be optimized to take advantage of the extra cores.

Going from Northwood to Prescott, the number of core transistors nearly tripled, while performance remained the same. That really makes you think why they need all these extra transistors for applications that don't even make use of it, doesn't it?
May 8, 2004 1:44:30 PM

>Going from Northwood to Prescott, the number of core
>transistors nearly tripled, while performance remained the
>same. That really makes you think why they need all these
>extra transistors for applications that don't even make use
>of it, doesn't it?

No, it doesn't... it makes one wonder how badly they screwed up Prescott, or how many unused transistors are in there. Netburst is a bust, Prescott a trainwreck; don't generalize too much from that.

For comparison, an Athlon FX uses only 105M transistors; running a native (64-bit) OS and apps, it will perform roughly 50 to 100% faster than a Banias with 77M transistors, and still significantly faster than a Dothan with 120+M(?) transistors.

Or as another example, S939 Athlon 64s will have a significantly smaller transistor count than either the current ones or even Banias, while offering better to much better performance.

Really, there are still other, and sometimes better, ways to extract performance besides multicore. Multicore doubles not only the (core) transistor count, but the (core) die size and power consumption as well. That really isn't always a good tradeoff, especially since it often brings zero performance increase (single-threaded performance).

Now don't get me wrong, I see the trend, and I look forward to multicore just as well, but in addition to increased single-threaded performance, not as a replacement for it. If I believed in the wider-only approach, I'd put my money on Niagara (8-way multicore, but highly primitive "slow" cores), which I definitely don't for anything but specialized apps or server workloads (and even there I wonder); I surely wouldn't want that near my desktop, might as well run a Beowulf cluster of 386s :) 

May 8, 2004 3:10:18 PM

>But did you find that date yet in the table that showed a
>worse scenario than today?

How hard can it be? Oh well, here you go:
Q2 1997
Pentium II 233: $637
Pentium II 266: $775
Pentium II 300: $1981

I'm sure that proved the end of innovation and the semi industry hitting a brick wall :) 

> I think it's a bit naïve to assume that if a CPU
>technology has continued to grow steadily for the past
>fifteen years, it will grow forever.

I would use "shrink" instead of "grow" in this context :p 
Anyway, it's not 15 years, it's roughly half a century. And during all this time, each year the end was near according to some. You did read that Amdahl quote of mine, didn't you? Well, since I'm no Jehovah's Witness, I'll just see it when it happens. Not any time soon, though.

>Name one application that would never run faster on
>multiple processors.

Duke Nukem Forever?
Leisure Suit Larry II?
Heck, even "Quake 3 -r_smp 1"
:) 

Now can you name me one application that will never benefit from faster clocks, or higher single threaded IPC ?

> AMD's less radical approach shows that this is not the
>only way to increase performance. IPC matters, and soon
>they'll realize TLP matters as well.

"Soon"? LOL, K8 was designed from the ground up to support multicore, and I learned of their 2-way core plans over 6 years ago. The P4 was supposed to go multicore (and may still, for Xeon), as is IPF obviously, but 2-way CMT Dothans are a contingency plan that started no more than a year ago, and I assure you the venerable Pentium Pro core did not have provisions for it.

> I'm currently working on a camera surveillance system
>where several cameras are shown on one monitor, using
>motion detection to pick the most interesting ones to
>display. It's really 'natural' to use a thread per network
>connection

Of course it is. Now consider how you'd speed up the motion detection algorithm of just ONE camera using multi-threading... Tell me it is as easy as implementing MMX or SSE.

> Katmai has 9.5 million transistors, Prescott has 125
>million (75 excluding the cache)

Katmai had 0 KB of on-die level 2 cache. You should count at least the off-die cache chips as well, unless you really think even a dozen Celeron 300 (not A) cores bundled in one chip would be faster at ANYTHING than a 3+ GHz Prescott.

Even more fair: compare a .13 512 KB P3 Tualatin to a .13 512 KB Northwood. 44M transistors for Tualatin, 55M for Northwood (according to sandpile). How do you think a 1.4 GHz Tualatin fares against a 3 GHz Northwood? Both use the same process, same cache size, comparable transistor count.

>And these phantom transistors are going to do what, double
>performance?

My guess is those phantom transistors are there for some form of DMT, which might indeed double performance in some cases (if it worked, and didn't send power consumption through the roof), just like dual core might (nearly) double it in some. But like I said, try to ignore Prescott; it's a wreck as it is, we know that.

>40% of Prescott transistors (cache) is on 30% of its die
>space. This tells me cache fits on 75% of logic space. Well
>if that's your definition of "MUCH more dense" then I can
>still fit in nearly a dozen Katmai cores.

Prescott uses 23 mm² for its 1 MB L2 out of 109 mm², so 21%. If that indeed constitutes 40% of its transistor count (I don't have a reference here, got a link?), it means the cache is ~2.5x as dense. MUCH denser indeed. I think you got your math wrong here:

Density = transistor count / mm²
Prescott is 125M transistors.
Cache density = 40% × 125M / 23 mm² ≈ 2.2M/mm²
Core density = 60% × 125M / 86 mm² ≈ 0.9M/mm²
Divide the exact numbers and the ratio is nearly spot-on 2.5.

>Are you still sure that current single-processor technology
>is not in trouble?

Yes, I just think Intel is in temporary trouble with their extreme speed-racer design. The writing has been on the wall for a long time, though.

May 8, 2004 3:59:10 PM

Quote:
And if The Inquirer is correct then also Intel is looking at other architectures: Intel’s Potomac team gets dissolved

Are you still sure that current single-processor technology is not in trouble?

You can't be too careful when quoting the Inquirer. :evil: 
<A HREF="http://www.theinquirer.net/?article=15780" target="_new">Intel Denies Potomac technology canned</A>

He that but looketh on a plate of ham and eggs to lust after it, hath already committed breakfast with it in his heart. -C.S. Lewis
May 8, 2004 5:14:29 PM

And you missed my point again.

You are not a mainstream average-joe user that buys from Dell, are you? No, you're not. I'd say at least 70% of sales go to average-joe users who just want to get on the internet, listen to some music, type, maybe watch videos and maybe game a little. Now you tell me why they need multi-core now? See, this is where the industry tries to force users to upgrade by making them think they need something they don't.

Yes maybe you need it, i said that all ready, in the hihg end segment it would be needed, but not where im talking about. A person could be happy with a p4 3ghz or ahtlon 64 3000+ for many years and never have problems. For all they need ot od its more then enouhg power. besides that, evne if they needed soemthing more, there are other componentst hey would upgrade first before the cpu thats for sure.
May 9, 2004 11:28:16 PM

Quote:
How hard can it be ? Oh well, here you go:
Q2 1997
Pentium II 233: $637
Pentium II 266: $775
Pentium II 300: $1981

Nice try, but those first two are not even close to affordable. And it's not because at that time processors were overall more expensive. Just look at the prices for Pentium I at the same date. Half a year later they did become affordable. Or look one year later and again you see a nice price ramp, where the affordable CPUs are nowhere near the performance of the fastest. So still, I don't see anything close to today's situation.
Quote:
I would use "shrink" instead of "grow" in this context :p 
Anyway, it's not 15 years, it's roughly half a century. And during all this time, each year the end was near according to some. You did read that Amdahl quote of mine, didn't you? Well, since I am no Jehovah's Witness, I'll just see it when it happens. Not any time soon though.

Well I don't want to make any predictions either. I just observe the strange prices, and the physical limits they're hitting with current technology. There are several ways to circumvent these limits using current technology, and one is multi-core.
Quote:
Now can you name me one application that will never benefit from faster clocks, or higher single threaded IPC ?

None of course. The real question is: Will applications benefit more from multi-core processors than from the clock increase / IPC increase made possible by doubling the transistor count? Looking at Willamette, it's clear that on the same process there was almost no performance increase from using longer pipelines and lower IPC. Most performance increase was from increased memory bandwidth. And looking at Prescott, now it's even getting hard to increase clock frequency on newer processes. So unless some physical miracle happens that will enable Prescott to reach the promised 6 GHz in a couple of months, what way would you go?
Quote:
Of course it is. Now consider how you'd speed up the motion detection algorithm of just ONE camera using multi threading... Tell me it is as easy as implementing MMX or SSE.

I'll tell you: it is. Split the image in two and process them separately to detect motion. It's so simple I could implement it in ten minutes. And yes, this is a perfect example of a pure 2x speedup on a dual-core processor. Oh and it doesn't have to stop with two threads and two cores...
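
To make that concrete, here is a minimal sketch of the split described above, on a made-up 4x4 frame: two threads, each summing absolute pixel differences over half the rows. (Python threads won't actually run these halves in parallel because of the interpreter lock; a C/C++ version would. This only illustrates the decomposition, not real speedup.)

```python
import threading

def motion_amount(prev, curr, start, stop, out, idx):
    # Sum of absolute pixel differences over rows [start, stop)
    out[idx] = sum(abs(a - b)
                   for row_p, row_c in zip(prev[start:stop], curr[start:stop])
                   for a, b in zip(row_p, row_c))

prev = [[0, 0, 0, 0] for _ in range(4)]   # previous frame (all black)
curr = [[0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 2, 0],
        [0, 0, 0, 0]]                     # new frame with some "motion"

results = [0, 0]
mid = len(curr) // 2
t1 = threading.Thread(target=motion_amount, args=(prev, curr, 0, mid, results, 0))
t2 = threading.Thread(target=motion_amount, args=(prev, curr, mid, len(curr), results, 1))
t1.start(); t2.start()
t1.join(); t2.join()

print(sum(results))  # 3 -- identical to a single-threaded pass over the whole frame
```

The whole frame still gets processed; the two threads just each take half the rows, and the per-thread sums are combined at the end.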

You're right about Prescott cache density though. I must have pulled some numbers together too quickly. I know it's 40% of the transistors, since 1 MB at 6 transistors per bit is about 50 million.

Anyway, thanks for the instructive discussion. I hope you or anyone else learned something as well. I'd like to end it here because we could go on forever without leading anywhere. So let's just agree to agree on what we agree and agree to disagree on what we disagree. Feel free to answer all my unanswered questions though. :wink:
May 10, 2004 12:06:17 AM

Quote:
you are not a minastream average joe user that buys from dell are you?

Actually I have a Dell laptop. :wink:

The most important thing is that performance is needed, when it is needed. What I mean is that simple things like surfing the internet doesn't take much processor power at all, but that doesn't mean everybody is content with a Pentium 200 MHz. We have more powerful systems because some tasks can never be fast enough. Even my mother, a complete computer newbie, complains about the reaction time of some applications. To have a very responsive system, it has to be fast.

Besides, it would be a little simplistic to assume that 70% of computer users don't do anything that uses lots of processing power. They don't buy these relatively expensive machines solely for surfing the net one hour in the weekend. Every amateur has demanding software and they all want higher performance at a lower price. I remember when everybody said 1 GHz was more than anyone would ever need...
May 10, 2004 1:16:45 AM

So you're saying everyone needs more than a 3GHz PC now? How is that? Sure, there is always better, but you still don't get my point.

People need whatever these companies tell them they need. You're actually saying that most average users need more than a 3GHz CPU? Why not 2GB of RAM too? And let's see, 1TB of HD space? lol, sure it's nice to have, but what do people really need? I'd like to have it, but my parents certainly don't. Yeah, they will complain when something hangs, but you know as well as I do that it's not the CPU's fault most of the time for hang-ups like that. Unless they are editing video or something, I could see that being a factor, but seriously, I know a lot of novice users and believe me, they are happy with their PCs. My aunt and uncle have a 2GHz Intel machine and they couldn't be happier; they edit photos from a digital camera, burn CDs, use the internet. My mom and dad use even lower-powered machines and they do things like CD burning, video transfer from VHS to DVD, and internet use. Of course it's possible to speed those things up, but they are happy with it, and they won't pay for a new system because it cuts 30 seconds off an encode or writes CDs faster.

You just want it for you lol, that's cool, so do I. But I couldn't recommend my parents invest in multi-core CPUs just because they could encode to DVDs faster, come on.
May 10, 2004 5:21:20 AM

It's not so much what people need, as what they want. Faster computers will give more options.
For some the hook will be video+VOIP, for others TVOIP, and movie rentals OIP. Some want a computer to run household functions. Have we even begun to scratch the surface?
May 10, 2004 6:43:29 PM

>Nice try, but those first two are not even close to
>affordable.

Neither are the P4 3.4EE and the 3.2EE, and the 3.4E is $475. I think the situation is rather similar; if anything, the price premium for the "ultra fasts" was even bigger back then, and overall prices have dropped. You call $637 'not even close to affordable', but it was a perfectly normal price for a high-end CPU back then. Sub-$100 CPUs just didn't exist AFAIR.

>The real question is: Will applications benefit more from
>multi-core processors than from the clock increase / IPC
>increase made possible by doubling the transistor count?

Some will, others won't. Simple really.

>Looking at Willamette, it's clear that on the same process
>there was almost no performance increase from using longer
>pipelines and lower IPC.

Who said longer pipelines à la NetBurst were the only way?

>Looking at Willamette, it's clear that on the same process
>there was almost no performance increase from using longer
>pipelines and lower IPC.

Even for willamette, you'd have to be fair and admit that a 2 GHz willamette (even 2.2's have been made) was considerably faster than a 1 GHz P3.

>Most performance increase was from increased memory
>bandwidth

What makes you so sure a P3 would have benefited as much from that much bandwidth? Surely you remember similarly clocked (and similarly performing) Athlons gained close to nothing from faster FSBs and DDR RAM; it wasn't until ~1.4 GHz that it really started to matter.

>And looking at Prescott, now it's even getting hard to
>increase clock frequency on newer processes

You keep looking at Prescott, and I keep telling you to ignore it. You can't take one screwed-up design and conclude a trend from that. You think a 90nm Northwood would have had the same issues? I think not.

>So unless some physical miracle happens that will enable
>Prescott to reach the promised 6 GHz in a couple of months,
>what way would you go?

No one promised 6 GHz Prescotts, but since you're asking: I'd have scrapped Prescott a long time ago and shrunk Northwood instead. Increase the cache to 1MB, and I think you'd have a nice CPU that should easily reach 4+ GHz without being excessively hot. Either that, or license the K8 design :) 

>I'll tell you: it is. Split the image in two and process
>them separately to detect motion.

LOL! You're not multithreading your algorithm (which is what I asked), you're splitting up the workload! It doesn't work that way either: if you can detect motion accurately enough using just half the image, why bother processing the entire image in the first place? I could just as well claim I could speed up the algorithm by a factor of 2 using just one core: just decrease the resolution (or rather the number of pixels to be processed) by a factor of 2. There, done. See? Easy as pie. Single core is twice as fast now. Thing is, your "multithreaded" version won't behave the same as the original one; tracking motion from one part of the image to the other won't work, for instance. Try again, and honestly, tell me it would be easy to multithread the *algorithm*.

>So let's just agree to agree on what we agree and agree to
>disagree on what we disagree. Feel free to answer all my
>unanswered questions though.

Same here ;) 

= The views stated herein are my personal views, and not necessarily the views of my wife. =
May 10, 2004 10:48:33 PM

Quote:
LOL !

LOL! He who laughs last...
Quote:
You're not multithreading your algorithm (which is what I asked), you're splitting up the workload !

I am multi-threading it. And yes I'm splitting up the workload over multiple threads (processed by multiple cores) to finish the job quicker. That is what you asked for.
Quote:
It doesnt work that way either, if you can detect motion accurately enough using just half the image, why bother processing the entire image in the first place ?

Like I just said, I would still process the whole image. But in less time.
Quote:
I could just as well claim I could speed up the algorithm by a factor 2 using just one core: just decrease the resolution (or rather number of pixels to be processed) by a factor 2, there done. See ? Easy as pie. Single core is twice as fast now.

Cheating like that still works on a multi-core processor as well.
Quote:
thing is, your "multithreaded" version won't behave the same as the original one, tracking motion from one part of the image to the other won't work for instance.

And what makes you think that? Tracking motion happens per frame. And my multi-threaded version still processes frame by frame.
Quote:
try again, and honestly, tell me it would be easy to multithread the *algorithm*.

It's as easy as 1 + 1 = 2. Honestly.

Look, all the required operations are done in loops. Loops of which each iteration is independent. Therefore, it's easy to let one thread process the first half of the loop, and let the next thread process the second half of the loop. Only when both threads are done, results are combined and we can continue processing the next loop in parallel. There's actually a resemblance with SIMD here. Instead of processing sequential data in parallel, I would be processing data from a totally different location in parallel. And each thread can still use SIMD as well. The possibilities are really endless...
May 10, 2004 11:17:41 PM

Okay, without having more precise information on the algorithm and its bottlenecks and dependencies, I can't usefully comment. From what you posted I understood you'd actually have one thread process half an image and search for motion there, and another process the other half. As if you'd be detecting motion independently on two separate cameras, which obviously isn't the same thing. If you don't see why, extrapolate to the extreme: you'd have a thread per pixel, and you really can't determine motion with a 1-pixel image. But I may have misunderstood.

But consider game engines. One can say it should be possible to create threads for AI, physics, rendering. There shouldn't be too many dependencies there; I agree that ought to be doable, even though it will create overhead, meaning the same game will most likely run slower on single-threaded CPUs. If the threads can independently process significant amounts of data, the overhead won't be that big. But if they can, it also means it's likely that at some point the AI will consume a lot of CPU power, and the next moment it's the physics. Which means that on a dual-core chip, very often one thread (therefore core) will not be doing a whole lot while the other is working full speed, meaning you're not getting anywhere near a 100% speedup. As the workloads per thread that don't need synchronizing and data exchange between the threads decrease, the overhead grows, meaning overall performance will drop, especially on single-core systems (still by far the majority for the foreseeable future). See the problem?

Now, a "better" solution would be if you could actually multithread the physics or AI engines themselves, but good luck doing that. The physics engine would be nothing but dependencies, and I assume the same would apply to the AI. For the rendering it should be quite feasible to chop the workload into smaller independent chunks, but most of that is already done by the GPU (and in parallel) anyway, so not much to be gained there. Same for audio processing, which is usually also offloaded to a dedicated audio chip. So if you're seeing a <10% speedup with Quake 3 running on an SMP machine, I can't say I'm really surprised. If anyone ever achieves >30-40% using SMP with a real game (using a null renderer or something to take the GPU out of the equation), I would be quite impressed and surprised.


= The views stated herein are my personal views, and not necessarily the views of my wife. =
May 11, 2004 1:58:02 AM

Quote:
Okay, without having more precise information on the algorithm and its bottlenecks and dependancies, I can't usefully comment.

That's a much wiser answer than saying it isn't possible to use multi-threading here. Thank you.
Quote:
As if you'd be detecting motion independently on two separate cameras, which obviously isn't the same thing. If you don't see why, extrapolate to the extreme: you'd have a thread per pixel, and you really can't determine motion with a 1-pixel image.

The algorithm consists of several parts. One just determines the amount of motion. It simply subtracts the new frame from the previous one and adds all these (absolute) differences. Theoretically, this could use a 'processor' per pixel (I wouldn't be surprised if hardware motion detection had logic for this per pixel). The next step is to determine the centroid of the movement. This is done by multiplying every difference with the pixel's position. Again an extremely parallelizable operation. Then a rectangle is constructed which contains most of the motion. This is done by adding up the motion per line (horizontally and vertically), to locate the regions with ~80% of the movement (again horizontally and vertically). The intersection of these regions is the rectangle where movement is concentrated. Once more this is a lot of loops with repetitive operations. One thread per line would theoretically be possible.
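
A sketch of the first two steps just described (frame difference, then the difference-weighted centroid), on invented 3x3 test data, shows just how loop-heavy and per-pixel-independent this is:

```python
# prev/curr are invented 3x3 test frames; all "motion" is at the centre pixel.
prev = [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]
curr = [[0, 0, 0],
        [0, 4, 0],
        [0, 0, 0]]

total = 0           # amount of motion: sum of absolute differences
cx = cy = 0.0       # centroid accumulators

for y, (row_p, row_c) in enumerate(zip(prev, curr)):
    for x, (p, c) in enumerate(zip(row_p, row_c)):
        d = abs(c - p)
        total += d
        cx += d * x     # weight each pixel position by its difference
        cy += d * y

if total:               # avoid dividing by zero on a static scene
    cx /= total
    cy /= total

print(total, cx, cy)    # 4 1.0 1.0 -- all motion at the centre pixel
```

Every iteration of the inner loop touches only its own pixel, so splitting the rows between threads (or cores) and summing the partial `total`/`cx`/`cy` at the end changes nothing about the result.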

Anyway, I hope you're convinced that two processors would make these calculations really close to two times faster with relatively simple multi-threading. I hope it also sounds a bit more believable now that nearly any algorithm can be multi-threaded efficiently.
Quote:
As the workloads per thread without needing synchronizing and data exchange between the threads decrease, the overhead grows, meaning overall performance will drop, especially on single cored systems (still by far the majority for the foreseeable future). See the problem ?

There's not much to worry about. First of all, threads, contrary to processes, are really efficient. They use the same memory space and everything else. In fact for the processor they are nothing more than multiple instruction pointers, and switching between them is nearly instantaneous. With Hyper-Threading the processor even decides where to read instructions from every clock cycle. Synchronisation means locking a thread with a mutex to wait till other data arrives. However, on a single-core processor, everything is processed sequentially and thus the next thread starts when the previous ends, anyway. If you're still worried about performance on single-core processors; I can assure you that it's easy to create two versions of the software. A simple command line option can be used to decide to use multi-threading or not.
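
As a tiny illustration of that point (threads sharing one memory space, with a mutex as the only synchronisation needed), here is a hypothetical sketch; the counter and thread count are made up:

```python
import threading

counter = 0
lock = threading.Lock()        # the mutex

def work(n):
    global counter
    for _ in range(n):
        with lock:             # serialise access to the shared variable
            counter += 1

threads = [threading.Thread(target=work, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 2000 -- both threads updated the same memory, safely
```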
Quote:
The physics engine would be nothing but dependencies, and I assume the same would apply to the AI.

The physics engine has to test for collisions by checking many polygon-polygon intersections, and the ragdoll calculations use big matrices. Both can use some parallel processing power. I'm no expert at A.I. but I assume it uses graph algorithms and genetic algorithms that need to test a lot of situations. Two threads can do this faster than one. The general rule is that every bottleneck of the application is a loop. And most of the time it does very repetitive, independent operations. This is exactly where threading can really help. So I wouldn't assume too quickly that it's full of dependencies. Maybe outside of these loops there are many dependencies, but the real processing power is needed inside of them.
Quote:
So if you're seeing a <10% speedup with Quake3 running on a SMP machine, I can't say I'm really surprised.

I'm not surprised either. Like I said before, and if I recall correctly, only the A.I. code of Quake 3 is multi-threaded. And it certainly isn't using top technology for it. Besides, I don't think Quake 3 was ever really very CPU-limited. Anyway, I'll stop using Prescott as an example of wasted transistors if you stop using Quake 3 as an example of multi-threading performance. :wink: Let's indeed hope that when multi-core processors finally arrive, developers will learn quickly how to use them optimally...
May 11, 2004 2:35:36 AM

Quote:
The algorithm consists of several parts. One just determines the amount of motion. It simply subtracts the new frame from the previous one and adds all these (absolute) differences. Theoretically, this could use a 'processor' per pixel (I wouldn't be surprised if hardware motion detection had logic for this per pixel). The next step is to determine the centroid of the movement. This is done by multiplying every difference with the pixel's position. Again an extremely parallelizable operation. Then a rectangle is constructed which contains most of the motion. This is done by adding up the motion per line (horizontally and vertically), to locate the regions with ~80% of the movement (again horizontally and vertically). The intersection of these regions is the rectangle where movement is concentrated. Once more this is a lot of loops with repetitive operations. One thread per line would theoretically be possible.

I think that splitting the calculations for a single frame is a bad idea. I think the overhead of syncing the data would take up enough processor time that it wouldn't be that much faster, or maybe even slower. I think the best applications for multi-threading are larger scale, like having a thread for each camera, complete with its own motion detection, analysis, and tracking algorithms. They could all run independently and write to disk as needed. Heheheh, who do you work for? I'd like to apply for a job.

He that but looketh on a plate of ham and eggs to lust after it, hath already committed breakfast with it in his heart. -C.S. Lewis
May 11, 2004 2:43:35 AM

Personally I agree with what you are saying. Multiple CPU cores offer much potential. But making it a reality would require enormous effort from software engineers, just like Itanium would. This is why Itanium will fail overall: too much work, and people don't wish to buy all-new software. But if multi-core becomes the norm I could see software being written to take advantage of it. Just that it would be slow and gradual, at least that is how I see it. And dual core would never be 200% the performance of single core unless running an out-of-reality benchmark to make dual core look good.

Thats how I see it anyway.

If I glanced at a spilt box of tooth picks on the floor, could I tell you how many are in the pile. Not a chance, But then again I don't have to buy my underware at Kmart.
!