AMD AIMS FOR FOUR-CORE OPTERONS BY 2007

November 16, 2005 3:56:56 AM

Well, it's still at least a year away. By then, Intel's 65nm technology might have saved them, not by performance, but by lowering production costs a lot, and then the market will be flooded with friggin' $800 "top end" Intel PCs.


edited for spelling
November 16, 2005 5:37:42 AM

Since they say the quad-core will be "similar" to the current Opterons and Athlon 64s, this is the K8L. I wonder if they plan to make any internal changes to the processor, such as in the execution units? The changes the article mentions are external interface changes with bus upgrades and L3 cache. I suppose the current architecture is efficient enough.

AMD itself isn't that far behind with its own 65nm process. The planned M2 sockets appear to be 90nm, and since Fab36 isn't fully operational for 65nm yet, it's likely AMD will ship in volume in mid-H2 2006 to meet Conroe. By the time AMD's quad core is released in 2007, though, Intel would have already switched to a 45nm process.
November 16, 2005 6:04:54 AM

I suspect that "internal changes" are part of the reason AMD has not gone to 65 nanos already. The new fab was designed, built, and has already been certified for that gate length. While the talk is of Fab36 staying with 90 nanos till the end of the year, it may pass that by another quarter or more.
A lot is dependent on Presler. Some say that Presler may be held up till mid '06, and that Conroe may be held up as a result. That would force AMD to hold back once again.
November 16, 2005 5:16:01 PM

Quote:
Since they say the quad-core will be "similar" to the current Opterons and Athlon 64s, this is the K8L. I wonder if they plan to make any internal changes to the processor, such as in the execution units? The changes the article mentions are external interface changes with bus upgrades and L3 cache.


Indeed, AMD has another quad-core processor for 2008. This processor will be an all-new architecture with all the features anyone can dream of. I can bet you that it will be pin-compatible with the K8L, since a new platform in such a short period (2007->2008) is not what many IT buyers are looking for.

The current quad core that is supposed to be released in 2007 will have improvements such as HT 3.0, independent memory controllers, DDR3, new multimedia instructions, level 3 cache, and extensions to the AMD64 instruction set.

If these are the features of K8L, I don't know what K10 would look like, so I'm clueless about this one. :?:

Here's another article about AMD's future quad core solutions:

http://www.eweek.com/article2/0,1895,1887287,00.asp
November 16, 2005 5:23:20 PM

The register has another article about AMD's upcoming quad core solutions. :wink:
November 17, 2005 1:50:07 AM

Quote:
Sun will have its 8-core CPUs soon


That's right. Sun will be targeting Itanium in the high-end market, while AMD will be focused on the low to medium end markets with Opterons.

Once AMD releases their quad core processor with shared L3 cache, AMD can also aim at the high-end market, leaving Itanium with nowhere to compete (don't even mention Xeon). :wink:
November 17, 2005 4:22:04 AM

I wonder if anyone has plans to release a quad-core for the desktop? The quad-core Opteron is targeted at high-end 4-way and higher servers. Intel's quad-core Tigerton is similarly targeted. Intel also has Cloverton, the quad-core relative of Woodcrest, which is for uni-processor workstations and 2-way servers, set for Q4 2006. It couldn't be that hard to introduce Cloverton to the desktop. How useful it'd be, I'm not quite sure, but with the surge toward multi-tasking and multi-threading it would sure be interesting. I suppose the Intel 955EE will have to do for a 4-threaded processor for now. Too bad Conroe doesn't support HT for quad-threaded operation.
November 17, 2005 2:07:09 PM

Quote:
I wonder if anyone has plans to release a quad-core for the desktop? The quad-core Opteron is targeted at high-end 4-way and higher servers. Intel's quad-core Tigerton is similarly targeted. Intel also has Cloverton, the quad-core relative of Woodcrest, which is for uni-processor workstations and 2-way servers, set for Q4 2006. It couldn't be that hard to introduce Cloverton to the desktop. How useful it'd be, I'm not quite sure, but with the surge toward multi-tasking and multi-threading it would sure be interesting. I suppose the Intel 955EE will have to do for a 4-threaded processor for now. Too bad Conroe doesn't support HT for quad-threaded operation.
Quad cores will be coming in the next 2 or 3 years. Then 8-core. Yes, for desktops. :mrgreen:
November 17, 2005 3:23:20 PM

The real question is: will the average person actually benefit from anything more than dual-core?
November 17, 2005 3:29:43 PM

If the apps become multithreaded, yes.

OffTopic: This multicore stuff is slowing down architectural advancements which I'm not too pleased about.
November 17, 2005 3:38:35 PM

Quote:
If the apps become multithreaded, yes.
That's a really big if, though. I don't know many software developers who like the idea personally, and even fewer software houses that can afford the increased costs in development and maintenance.

The next big if is whether the apps actually gain noticeably from multithreading. After all, not all will achieve ideal gains.

And then the next big if after that is if all of the different processes and threads will even consume 100% of the 4 cores, or if they actually would all fit (or mostly fit) into the resources of 2 cores. Distributing the computing across more cores doesn't mean that you actually use more total resources.

And then the next big if is if the timing code doesn't detract more from the performance than that gained by multithreading.

I'm sure there are other ifs after that too.

Quote:
OffTopic: This multicore stuff is slowing down architectural advancements which I'm not too pleased about.
Yeah, I agree. It's like dualcore/quadcore has become the new marketspeak buzzword and AMD and Intel are using multicore as an excuse to not bother fixing anything else. (Though I could be wrong. If anyone has any good articles...)
November 17, 2005 11:06:31 PM

It's not that Intel and AMD are using multicore as an excuse not to fix everything else; it's just that anything that's left to fix isn't worth it in terms of performance improvement.

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=259...

"Sure, there have been other "little tricks" that have steadily improved performance, but nothing spectacular. The CPU engineers still have a few tricks upon their sleeves that can improve IPC somewhat, but are limited to those that do not increase leakage and dynamic power loss. The focus is no longer on IPC or Instruction Level Parallelism (ILP). It is on Thread Level Parallelism (TLP)."

Essentially, superscalar technology has reached the limit of feasible performance benefits, which is why the focus is switching to multi-threaded applications instead.

The other route instead of multicore would be something like the Itanium. It's rather interesting that the article shows how much more efficient the Itanium architecture is compared to the current processors from AMD and Intel. Of course, the Itanium route hasn't caught on so the only route left is multicore.

In regards to processor resource utilization in quad-core processors, I would expect that utilization would be poor, especially in the beginning until code is optimized. I know people dislike Hyperthreading, mostly because it was associated with the Pentium 4 and Prescott specifically, but I really think that HT support is critical. Its current implementation on Prescott is generally disappointing, as it yields poor performance increases, probably less than 10%. However, if implemented properly the results could be quite spectacular. The Itaniums themselves actually use a less advanced form of HT called Coarse Multi-threading, yet can yield performance increases of 30%. With the proper architecture, HT can maximize a dual-core's resources and deliver very acceptable performance in a 4-threaded environment. Just another reason why I'm disappointed that Conroe won't include HT.
November 17, 2005 11:24:54 PM

Just when your posts were starting to show a glimmer of thought.
1- HT was introed on the P4C. On that platform its effectiveness was closer to 20%.
2- On Prescotts you are better off disabling HT because the performance gains are not worth the extra heat.
3- Itanic is a huge dog, both in size and desktop usefulness; don't even think about going there.
4- The big one. If 4 cores aren't being used, what possible good would adding virtual cores do, aside from maybe adding heat, and so making all the cores throttle?
You can think, so try to do it before you post. Always remember, this is hostile territory for fanboys.
November 18, 2005 12:02:56 AM

No need to get all flustered and start using insults as arguments.

For the Itanium, I was just pointing out how much more efficient its architecture is. This isn't a lie or marketing.

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=259...

"But compared to the best x86 design today - the AMD Opteron -, the Itanium does about 60% more work per clock cycle in integer, and about 115% more work per cycle in floating point."

How much this actually translates into real-world performance is of course another matter, but I'm just using this to comment on the "slow down" of architectural enhancement due to multicore. With the current superscalar architecture unlikely to yield further major performance improvements, the options are an architecture similar to Itanium or going multicore. I'm not trying to say "Go Itanium"; I'm just pointing out that that was an alternative which has clearly not caught on, so both AMD and Intel are concentrating on multicores and multi-threading for the future.

In regards to your Item 4, I'm not even sure what you're complaining about. First you say 4 cores aren't being used, then you ask what the benefits of virtual cores are. The answer is already evident from that sentence. Yes, in general 4 cores would never be fully utilized and therefore are unnecessary. However, HT will allow a dual-core processor to achieve performance benefits in the event that a 4-threaded application is used. This benefit would come cheaply, as you only have 2 physical cores instead of 4. If the program is only dual-threaded, then it'll run fine on a dual-core processor. Clearly HT will do some good, as it enables support for 4-threaded applications while only using 2 cores.

Besides, I'm not talking about HT for the Pentium 4 architecture; I'm talking about HT for a Conroe derivative. This will only increase efficiency and value, as it ensures that the existing dual-core resources are maximized. As well, adding HT will not affect production costs much. The additional resources for HT only required 5% die space on the 90nm process. If HT were to be introduced, it would be on Conroe's successor, which is on a 45nm process. This means the die area that HT requires is insignificant. As well, adding HT support to a Conroe derivative should not cause any heat problem. Conroe on 65nm is already designed to run significantly cooler than Prescott. When Conroe transitions to 45nm with leakage problems solved, heat production will drop further, allowing HT to be added without creating unreasonable temperatures. With the lower temperatures, throttling is not a concern. As well, with the addition of sleep transistors on the 65nm process, HT can easily be disabled when 3- or 4-threaded operation is not in use to save power and heat. HT would be a valuable addition to a Conroe derivative and would offer the benefits of 4-threaded computing, if required, without a significant price in die area, power, or heat.
November 18, 2005 1:33:23 AM

I said it once and I'll say it again:

Hyper Threading (some call it HyperThreadshit) is only a patch to cover a flaw in a flawed architecture like Netburst.

Current Intel processors need HT because of their ridiculously long pipeline, which translates into less work per clock cycle.

AMD, on the other hand, doesn't need and doesn't benefit from HT because its pipelines are short enough to execute more work per clock cycle (sorry for the redundancy). The same goes for the Pentium M or Yonah (or whatever you call it), since these processors are designed to execute more IPC than current P4s. Including HT on these processors would cripple performance, since the pipelines are busy the whole time (that's not the case with Netburst, since its pipelines are idle most of the time).

That's why a dual-core Athlon 64 4800+ can beat an Intel 840EE in almost all benchmarks that are multithreaded, and even in multitasking (and please don't post Tommy's f*cked up "stress test" here as a reference, because I've lost all my faith in them since the last review of the Athlon Thunderbird processor). :wink:

In conclusion, I don't believe in HyperThreadshit; I do believe in DSMT (Dynamic Symmetric Multi-Threading).

If you want links or reviews, just say the magic word. 8)
November 18, 2005 2:01:09 AM

I know that HT was enabled on Prescott to make up for its longer pipeline. However, I just don't think that its association with the Pentium 4 should automatically taint its usefulness.

I'm actually not sure how much HT is associated with longer pipelines. The 31-stage Prescott needed HT, but it's interesting to note that the Itanium has a form of HT as well. The Itanium can achieve up to a 30% performance boost in threaded applications from CMT, and CMT isn't even as advanced as HT. What's most critical is that the Itanium achieves that 30% with its 8-stage pipeline. If CMT runs fine on an 8-stage Itanium, HT should be able to perform well on the 14-stage Conroe.

I have to agree, though, that the results of the 840EE are disappointing, but I don't think that has anything to do with HT. It's mainly due to the fact that it is limited by its FSB bandwidth. While a single core before had 800MHz to play with, the dual cores must make do with the same bandwidth. Smithfield just can't get enough bandwidth when both cores are running at full tilt. The other reason the 840EE performs so poorly is its low clock speed of 3.2GHz. Single-core AMDs at 2.4GHz are rated 4000+ to compete with Intel chips near 4GHz. The 3.2GHz is just too slow.

Luckily, the 955X looks very promising. The expansion to a 1066MHz FSB should relieve most bandwidth issues, as both cores don't run at full tilt most of the time anyway. The increase in clock speed to 3.46GHz and the extra 1MB of L2 cache are also critical. With the 3.46GHz Dempsey highly competitive with the 2.4GHz Opteron 280, it is reasonable to assume that the 3.46GHz 955X will likewise be highly competitive with the 2.4GHz X2 4800+.

The 955X is set for launch on January 15. The rest of the Presler family is set to launch on New Year's Day.

http://www.theinquirer.net/?article=27756

Of course, AMD has already prepared a response to the 955X. However, it's interesting to note that this time it will be AMD that is overcharging for its high-end chips.

"With the price set at $999, Intel is heavily under pricing AMD's FX-60 and 5000+, which will sell for a daunting 20% more."

http://www.theinquirer.net/?article=27743
November 18, 2005 2:07:14 AM

Plus 64 bit is not out in full swing yet!!!! :mrgreen:
November 18, 2005 4:07:23 AM

Quote:
No need to get all flustered and start using insults as arguments.

Flustered? I think not. Insults, only to my intelligence, coming from your post.
Quote:
This isn't a lie or marketing

See, wrong again. Itanic's "efficiency" comes from its huge die size and its lack of "legacy" support.
Quote:
With the current superscalar architecture unlikely to yield further major performance improvement

Why? Because Intel ran into a problem? From what I have seen, the A64s still seem to be capable of higher speeds.
About HT. You still don't understand it, do you? HT was brought out for the P4C, or Northwood C. It was a trick to get the Windows scheduler to better utilize system cycles.
The Prescott, with its longer pipes, just doesn't have the spare cycles. On top of that, HT is a load, so the Scotties run hotter when HT is enabled.
Anyone who knows how to use the Windows scheduler can get as much of a benefit without HT on a Prescott.
If M$ were to design a better scheduler, HT would be a total waste (cross your fingers for Vista).
I see you're thinking about a good way to use HT: 4 cores are useless, so just use two cores with HT, 'cause HT is already useless.
The one thing I found informative about the THG stress test, using the 840 with HT, was that primary tasks were slower, while tertiary and lower priorities were faster. To my way of thinking, that is not a benefit, that is a flaw.
November 18, 2005 4:58:18 PM

Quote:
1- HT was introed on the P4C. On that platform its effectiveness was closer to 20%.


Dude, I hate to say it, but you're wrong.

HT was actually present in the Northwood B... I'm not sure about the A, but it was definitely in the B. Remember the 3.06GHz Northwood B? That had HT, and it was enabled.
November 18, 2005 6:16:53 PM

Quote:


Luckily, the 955X looks very promising. The expansion to a 1066MHz FSB should relieve most bandwidth issues, as both cores don't run at full tilt most of the time anyway. The increase in clock speed to 3.46GHz and the extra 1MB of L2 cache are also critical. With the 3.46GHz Dempsey highly competitive with the 2.4GHz Opteron 280, it is reasonable to assume that the 3.46GHz 955X will likewise be highly competitive with the 2.4GHz X2 4800+.



I disagree. Remember those benchmarks of a P4 3.46GHz with an 800MHz FSB vs. the 3.7GHz with a 1066MHz FSB? The 266MHz of extra bus speed made NO difference at all; in fact, the P4 3.46GHz outscored it in most benchmarks. Sure, Intel can add all the L2 cache they want; that only means more latency. As I have said, Intel just needs a whole new architecture to stand a chance against AMD.
November 18, 2005 11:31:57 PM

Warning: Another long post. lol

Quote:
Itanic's "efficiency" comes from its huge die size


It's interesting that you point out Itanium's large die size. If you were to read the entire article from AnandTech that I posted earlier, they actually talk about this issue. While the die size might be large, 432mm^2, the majority of that is just the L3 cache. If you were to look at the core size, that is, the transistors that actually process data plus the L1 cache, it is actually relatively small, only 80mm^2, and 15mm^2 of that is for x86 compatibility. The 130nm Opteron is 190mm^2, of which more than half is cache. So essentially, the fundamental core size of the processing units is the same. Only the extra cache makes the Itanium larger.

Now you'll point out that regardless of comparable core sizes, larger overall die size is still larger. This of course leads to concerns on heat. On this issue Anandtech notes:

"It is clear that the Itanium core has a big advantage in the area of threading and power dissipation constraints. If you are not convinced, the dual core Itanium Montecito (90 nm process) has no less than 1.72 billion transistors, but it is still able to consume less than 130 W. Compare this with the 300 million transistor Power 5+, which consumes about 170 W on a 90 nm SOI process."

The other issue of increased die size and the need for more cache in Itanium is of course cost. Anandtech notes:

"Time is on the side of the Itanium. As new process technology was introduced, cache sizes have been growing very quickly during the past years, without introducing extra cost or high latency. No competitor has the advantages that Itanium has:
1. As caches get bigger, Itanium benefits more than the x86 competition. X86 CPUs target higher clock speeds and, as such, it is more difficult to use large low latency caches.
2. Intel has mastered as no other the skill to produce very dense and fast cache structures. "

Quote:
From what I have seen, the A64s still seem to be capable of higher speeds.


The point is that clock speeds will only go so far in increasing performance. It's doubtful that the A64 will be able to reach 4GHz clock speeds in a production environment. Even at 4GHz, the performance increase would not be linear, so it's unlikely you would even receive a 20% increase in performance. As well, the power consumption and temperature levels would be unacceptably high now that the focus is on lowering them, regardless of whether a 65nm process is used. This means that we cannot expect large performance increases from clock speed increases.
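
To see why the gain isn't linear, here's a toy Amdahl-style model (my own illustration with made-up stall fractions and a hypothetical 2.8GHz-to-4GHz jump, not numbers from the thread): assume some fraction of runtime is memory stalls that don't shrink as the core clock rises.

```python
# Toy model of non-linear clock scaling. The stall fractions and the
# hypothetical 2.8GHz -> 4GHz jump are illustrative numbers only.

def clock_speedup(clock_ratio, stall_fraction):
    """Overall speedup when only (1 - stall_fraction) of runtime
    scales with the clock; the stall portion stays fixed."""
    new_time = (1 - stall_fraction) / clock_ratio + stall_fraction
    return 1 / new_time

ratio = 4.0 / 2.8  # a ~43% clock increase
for stall in (0.0, 0.3, 0.5):
    print(f"stall fraction {stall:.1f}: {clock_speedup(ratio, stall):.2f}x")
```

With no stalls the speedup equals the clock ratio, but at a 50% stall fraction that ~43% clock bump yields under 18% more performance, which is the shape of the argument above.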

Other than clock speeds, there just aren't many other options for the superscalar architecture to produce large performance increases. Multiple integrated memory controllers don't offer much benefit to single processors, as they just aren't bandwidth-limited, especially once DDR2 is introduced. Adding additional SSE or AMD64 extensions would help, but the performance benefit from extensions is generally minor. Similarly, prefetch, branch, and loop detection routines are just about as efficient as they are going to get. The branch prediction on the Pentium 4 is 97% accurate, so a 1% or 2% increase won't show major benefits.

That leaves drastic measures for the superscalar architecture to continue. For instance, you could add more execution units. However, the x86 architecture generally already has more than enough. Including more co-processors or cell processors is nice, but that would increase die size and generate more heat, so it wouldn't be very efficient from a cost or power consumption perspective.

I'm hoping that I'm not being misunderstood. I'm not saying these points to support Itanium. I'm saying these points to explain why multicore and multi-threading are necessary since performance increases in a single-core plain superscalar environment are now hard to achieve.

Quote:
If M$ were to design a better scheduler, HT would be a total waste (cross your fingers for vista).


The disconnect that we have is that you are talking about HT in the Pentium 4, so naturally the perspective is coloured by the negative impression of Prescott. However, what I'm talking about is HT in Conroe's successor, which is on a 45nm process. Whether the Prescott runs hotter with HT enabled is irrelevant. Conroe on 65nm already looks cool enough, while its successor on 45nm with leakage solved will be even cooler. In such a case there is plenty of temperature room for HT. Yes, it will be hotter, but the additional heat won't matter, since the original temperature would be low to begin with.

Now about the scheduler. While an improved scheduler will increase performance, it isn't the same as Hyperthreading. Even with better scheduling, the processor can only process 1 thread at a time. What HT allows is for 2 threads which don't need the same execution units to be processed at the same time. Even if Vista has a better scheduler, it would not make HT obsolete. In fact, with a better scheduler the threads that HT receives would be further sorted and organized, yielding even greater HT benefits.

The fact that the 840EE performs primary tasks slower while increasing the speed of tertiary priorities is mostly a scheduler problem. The scheduler occasionally lets two threads through that need some of the same execution units, which causes slowdowns in the primary tasks. A better scheduler, like what Vista may have, would be smarter and pair the tertiary task with something that doesn't have execution unit conflicts, ensuring the primary thread achieves full performance.

It's quite possible that HT will actually run better on a Conroe derivative than on Prescott. Both the Pentium 4 and K8 are 3-issue designs. However, Conroe will be a 4-issue design. Now, people argue that x86 code rarely saturates a 3-issue design, and so Conroe will not benefit from having a wider issue rate. While this may be true for a standard x86 processor, it makes Conroe ideal for HT. If most of the time only 2 issues are used, then with HT enabled you would get a fully parallel issue rate with 2 x 2 issues through a 4-issue design.

A wider issue rate, 6-wide, is why CMT (a less advanced version of HT) is so effective on Itanium, with a 30% performance benefit. Conroe with HT could see similar performance increases, since the EPIC architecture generally issues more instructions at a time than x86: probably something like 4 issues through a 6-wide design on Itanium, while x86 generally issues around 2 instructions. A 4-wide design is quite sufficient for HT in an x86 environment.
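
The "2 x 2 issues through a 4-issue design" claim can be put as a toy utilization calculation (my own sketch; the 4-wide width and the 2-issues-per-thread average are the assumptions from the posts above, and real cores face hazards this ignores):

```python
# Idealized issue-slot utilization for an SMT core. Ignores structural
# hazards, cache contention, and dependent-instruction stalls.

ISSUE_WIDTH = 4            # assumed Conroe-like 4-wide core
AVG_ISSUES_PER_THREAD = 2  # assumed average issues/cycle for x86 code

def utilization(threads):
    # Each hardware thread contributes its average issue rate,
    # capped by the core's total issue width.
    issued = min(threads * AVG_ISSUES_PER_THREAD, ISSUE_WIDTH)
    return issued / ISSUE_WIDTH

print(f"1 thread:  {utilization(1):.0%} of issue slots")
print(f"2 threads: {utilization(2):.0%} of issue slots")
```

Under these idealized assumptions a second thread doubles slot utilization from 50% to 100%; in practice, shared caches and execution ports claw much of that back.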

Now, a wider design also requires more execution units to process those instructions. In this regard Conroe also looks promising. The Pentium 4 had 2 FPUs, but they were both specific in their functions. One handled floating-point and SSE addition, subtraction, multiplication, and division, as well as MMX. The other only handled floating-point and SSE moves and stores. Conroe looks to have 2 or 3 full FPUs, each of which can perform all these functions.

The Pentium 4 also had 1 slow ALU to handle complex calculations like shift and rotate, and 2 fast ALUs. While the fast ALUs operate at twice the clock speed, they were limited in what they could process. One could do addition, subtraction, logical operations, evaluate branch conditionals, and execute store-data ops. The other fast ALU was even more limited and could only do addition and subtraction. Conroe looks to have at least 3 full ALUs, each of which can do all these functions.

Even though Conroe may have a similar number of execution units to the Pentium 4, each of Conroe's execution units can process the complete range of instructions. This is critical to HT, as it allows a wider variety of threads to be paired together without worrying about whether the available execution units can perform the operations required. While it won't offer the same performance as 2 cores, the probability of 2 threads being executed at the same time through HT increases once operation constraints are removed; pairing is then only constrained by the number of execution units. In general the execution unit count should be fine, as most instructions don't use that many anyway.

The only other concern is the idea that HT needs to be associated with a long pipeline. That doesn't appear to be correct. If CMT can work fine on the 8-stage pipeline of Itanium, HT should work fine on the 14-stage pipeline of Conroe. A wider issue rate and the availability of execution units are far more important than pipeline length.

Overall, I still think HT is an ideal addition to a 45nm Conroe successor. The wider issue rate and the number of full-function execution units ensure HT performance, while the 45nm process and the Conroe architecture itself ensure that heat and power consumption concerns are mitigated.
November 18, 2005 11:49:31 PM

Quote:
Remember those benchmarks of a P4 3.46GHz with an 800MHz FSB vs. the 3.7GHz with a 1066MHz FSB? The 266MHz of extra bus speed made NO difference at all; in fact, the P4 3.46GHz outscored it in most benchmarks.


You're right about those benchmarks, and that's why I'm extremely happy. "The 266MHz extra bus speed made NO difference at all" in single-core processors. That is key. This means that Prescott does not saturate the 800MHz bus; its bus requirements are probably closer to a 667MHz bus under full load. This means that dual-core Smithfields were limited by their 800MHz FSBs. If both cores were under full load, each would receive the equivalent of a 400MHz FSB, assuming equal division, which is clearly not enough. The great thing about the 955EE is that it has a 1066MHz FSB. That means that under full load each core will have 533MHz of bandwidth, assuming an equal split. That goes a long way toward ensuring that the processor operates to its full potential.

Now you may say that 533MHz is still less than the 667MHz that I mentioned a single-core Prescott would need. While this is true, it really isn't a major concern. Both cores would rarely be under full load at the same time. In an uneven distribution based on load, which is more common, the core under full load could receive 667MHz of bandwidth while the lightly loaded one receives the remaining 399MHz. While playing games, 1 core may receive all of the 1066MHz. The extra 1MB of cache also helps; while latency may have increased, there is still a small net benefit in performance and a reduction in FSB load.
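
The bandwidth splits used in the last two paragraphs are simple arithmetic; a small helper makes the equal and uneven cases explicit (the proportional-split assumption is the post's reasoning, not measured bus behaviour):

```python
# Per-core FSB share under a simple proportional-split assumption.

def per_core_fsb(fsb_mhz, cores, demand=None):
    """demand: optional per-core demand weights; defaults to an equal split."""
    if demand is None:
        demand = [1] * cores
    total = sum(demand)
    return [fsb_mhz * w / total for w in demand]

print(per_core_fsb(800, 2))              # Smithfield, both cores loaded
print(per_core_fsb(1066, 2))             # 955EE, both cores loaded
print(per_core_fsb(1066, 2, [667, 399])) # the uneven case from the post
```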

The 1066MHz FSB and the increase in clock speed both go a long way to ensure that the 955EE achieves its maximum potential. That's why both The Inquirer and I are enthusiastic about the 955EE's potential. Certainly the X2 4800+ will have good competition, and the 955EE will make the FX-60 earn its 20% increase in price.

"Because the 955 certainly packs a punch of power. "

" With the price set at $999, Intel is heavily under pricing AMD's FX-60 and 5000+, which will sell for a daunting 20% more."

http://www.theinquirer.net/?article=27743

November 19, 2005 12:45:08 AM

Good job of skirting the issues I put forward.
Compare the 130 nano Itanic to the Opteron: it has a huge die size.
If the cache is negated on both chips, the Itanic is huge.
1 Itanic has more transistors than a couple of dual-core Opterons. That's huge.
You may think 130 watts is acceptable for a chip; I just don't.
You nicely avoided the issue of legacy support, which is where a lot of Itanic's advantage comes from.
How is it that Conroe can add execution units, while the K series can't?
Who says a K8 on 45 nanos can't scale past 4GHz?
While straight clock speed would only give a perf boost of 35% @ 4GHz, more bandwidth at that speed would be needed, so a 400/800MHz mem bus seems a good bet. That would help perf scale more linearly.
You don't understand HT, or schedulers, so come back when you do.
I don't care if you get your Intel line @ THG or Anand, BS is BS, and yes the capitals do mean it's major.
Ask me if I think Conroe has promise. I will tell you straight up, yes, but Intel's SOP means that we won't see it until more than a year after release. Intel likes to keep a bunch of cards in their hands for later.
November 19, 2005 2:20:27 AM

Quote:
If the cache is negated on both chips, the Itanic is huge.
1 Itanic has more transistors than a couple of dual-core Opterons. That's huge.


Well, the Itanium's core has 20 million transistors, while the Opteron has 40 million. If you look at the pure logic core by subtracting the L1 cache, then the Itanium has 18 million transistors while the Opteron has 32 million. So you see, if the cache is negated the Itanium is not huge. In fact, the Itanium's core would fit inside the Opteron's, not the other way around. Not to worry though; inverting numbers is a rather common mistake when you're only concentrating on attacking something.

Quote:
You nicely avoided the issue of legacy support, which is where a lot of Itanic's advantage comes from.


Exactly. This goes back to the various paths for the future of technology that I was trying to deal with. Maximum performance can be achieved through a complete break with current technology. Now, I'm not saying that Itanium achieves maximum performance; I'm just saying that's a possibly efficient choice. I myself don't support Itanium, not because the technology isn't good, but because without good compatibility or some sort of transitional mechanism it isn't beneficial from a consumer perspective. Other options include continuing to push superscalar technology, and multi-threading, which is the path that has currently been chosen.

Quote:
How is it that Conroe can add execution units, while the K series can't?


I'm not saying that the K series can't add more. In fact, it doesn't need to, as it already has 3 full FPUs and 3 full ALUs. I'm just saying that Conroe has at least 2 full FPUs and 3 full ALUs, which is an improvement over the Pentium 4, and a large improvement over the Pentium M, which only had 2 ALUs, 1 FPU, and 1 vector unit.

Even though the K series can add more execution units, it's unlikely that they will. At 6 execution units (9 including the memory stores, etc.) the K8 already has more than enough for most circumstances. Adding more would just use up die space and increase heat for little benefit.

Quote:
You dont understand HT, or schedulers, so come back when you do.


"To the end user, it appears as if the processor is "running" more than one program at the same time, and indeed, there actually are multiple programs loaded into memory. But the CPU can execute only one of these programs at a time. The OS maintains the illusion of concurrency by rapidly switching between running programs at a fixed interval, called a time slice. The time slice has to be small enough that the user doesn't notice any degradation in the usability and performance of the running programs, and it has to be large enough that each program has a sufficient amount of CPU time in which to get useful work done."

http://arstechnica.com/articles/paedia/cpu/hyperthreadi...

That is what I understand a scheduler to do: decide which processor a thread is sent to, and decide how much processing time a thread gets. A scheduler doesn't really order the threads, as the processor executes out-of-order anyway.

"Hyper-Threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a Hyper-Threading equipped processor to pretend to be two "logical" processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously."

http://en.wikipedia.org/wiki/Hyperthreading

That is an accurate summary of what HT is. The Arstechnica article above seems particularly good in its breakdown of SMP, SMT, and HT.

Based on those understandings, what I've said appears to be logical. Now if I or those websites are so incorrect, feel free to correct me. Where in my analysis of HT's potential in Conroe am I wrong? How can an OS-based scheduler make up for hardware-based HT support? I'm not unreasonable, you just need to be a little more descriptive than "come back when you do."
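The time-slicing the Ars Technica quote describes can be sketched as a toy round-robin scheduler. This is purely illustrative: the program names, work units, and slice length are invented, and a real OS scheduler is of course far more involved.

```python
from collections import deque

# Toy round-robin scheduler: each "program" gets a fixed time slice on
# one CPU in turn, producing the illusion of concurrency described in
# the quote above. Work is measured in abstract units.
def run(programs, slice_units=2):
    order = []
    queue = deque(programs.items())        # name -> remaining work units
    while queue:
        name, remaining = queue.popleft()
        used = min(slice_units, remaining)
        order.append((name, used))
        if remaining - used:               # unfinished? back of the line
            queue.append((name, remaining - used))
    return order

print(run({"A": 3, "B": 2}))
# -> [('A', 2), ('B', 2), ('A', 1)]
```

The point of the sketch is that only one program runs at any instant; the interleaving just happens too fast for the user to notice.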

Quote:
I dont care if you get your Intel line @ TGH or Anand, BS is BS


Well, I use not only TGH and Anandtech, but also X-Bit Labs, Arstechnica, Digital Life, The Inquirer, The Register, and Game PC among others. But, if you view any site reporting the facts or drawing even neutral conclusions on Intel as BS and blasphemy then there's not much I can argue with.
November 19, 2005 4:36:51 AM

Quote:
Well, the Itanium's core has 20 million transistors, while the Opteron has 40 million.

Not even close. Guess again.
Quote:
Maximum performance can be achieved through a complete break with current technology

Yes, all current technology on an ongoing basis. Throw away everything, and buy more, again and again, if you want to keep that advantage. Great marketing! But it is what gives Itanic its kick. Sounds like Jobs to me. (and I do mean Steve)
Quote:
the K8 already has more than enough for most circumstances. Adding more would just use up die space and increase heat for little benefit.

So adding dedicated SSE2 units wouldn't help in encoding?
When the A64s were brought out, that was the most common recommendation, but I'm sure you will tell us otherwise.
Quote:
"To the end user, it appears as if the processor is "running" more than one program at the same time, and indeed, there actually are multiple programs loaded into memory. But the CPU can execute only one of these programs at a time. The OS maintains the illusion of concurrency by rapidly switching
[red] and so on[/red]
So, you can use google. Not much of a start. What do you think you have to do to "understand" HT?
Quote:
But, if you view any site reporting the facts or drawing even neutral conclusions on Intel as BS and blasphemy then there's not much I can argue with.

If you are saying you cannot tell the difference between a piece written by an unbiased reviewer, and the AMD/Intel marketing teams, there really is no point in talking to you.
Personally, I believe that you are bright enough, aside from your prejudice.
November 20, 2005 1:07:10 AM

It's funny when people can talk wonders about a feature when it's proven that it doesn't work.

This proves what I've been saying about HT. :wink:
November 21, 2005 2:53:00 AM

Hey commander, are you still optimistic about Intel's offerings for next year?

I guess The Inquirer is not. :wink:

Just a little quote:

Quote:
SOURCES WHO attended the Supercomputer show in Seattle last week were shown a number of boards from AMD behind the scene which indicate to us that 2006 may well be even tougher for Intel than 2005 on the server front.
November 21, 2005 11:01:57 PM

Quote:
Not even close. Guess again.


Well, the figures I got were from the chart on this page. Note that I am not comparing the total transistors per chip, in which case of course the Opteron would be smaller. I am comparing the transistors that actually process data rather than store it, or the "core" transistors. Further removing the L1 cache from the "core" yields the "pure logic core". In both cases the Itanium uses fewer transistors to actually process data.

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=259...

I'm just going to trust Anandtech with the transistor count since I don't have time to look through AMD or Intel's technical documents. However, just looking at the Itanium die images you can already see that the majority of the transistors are due to the L3 cache, the L3 tags, the L2 cache, the L2 tags, and the bus logic. It isn't hard to see that the number of transistors on the Itanium actually processing data is small compared to the entire die.

Quote:
Great marketing! But it is what gives itanic it's kick.


That's exactly what I'm saying. I don't support it for its lack of compatibility, but it's still interesting to see what potential is out there in other architecture paths.

Quote:
So adding dedicated SSE2 units wouldn't help in encoding?

AMD's K8 architecture is focused on general-purpose execution cores, which is why there aren't dedicated SSE2 units. Intel's P4 architecture, on the other hand, is more specific in that the execution cores are divided into simple (fast) and complex (slow). With Merom, Intel is moving in AMD's direction by incorporating general-purpose execution cores. Certainly dedicated SSE2 units would help, but unless AMD wants to be seen to be moving toward Intel's old architecture, they would probably just increase the number of general FPUs, which can do SSE2 calculations.

Quote:
So, you can use google. Not much of a start. What do you think you have to do to "understand" HT?


I've asked for your exalted knowledge on the subject, but of course your only response has been "come back when you do." Well, if my own understanding is flawed, the websites I look at are BS, and I'm not allowed to Google, it becomes increasingly difficult for me to understand HT.
November 21, 2005 11:22:27 PM

It's always been known that HT doesn't benefit some functions so this is nothing new. I personally do quite a bit of video encoding and HT is usually quite beneficial there.

In any case, this problem with HT is due to the Prescott-type architecture. What I'm thinking about is HT for a 45nm Conroe-derivative. Conroe itself will have 4MB of cache, which is double or quadruple what the processors in their test have. That will in itself help limit thrashing, since more storage space is available. As well, since this is a shared architecture between 2 cores + 2 virtual cores, and there is more space available, it is more likely that what is needed by one core is already in the cache from another core. This will also reduce thrashing. A major reason why thrashing causes such a significant reduction in performance is that the cache latency in Prescott is so high. This will not be a problem in Conroe, which uses the low-latency cache design of Dothan and Yonah. Low latency means that even if thrashing occurs, which it doesn't in many cases, data can be sent back to the cache faster, reducing the performance hit. The FSB will also be increased from the 800MHz in most Prescotts to 1066MHz, similarly reducing the performance hit. Conroe will also feature more advanced prefetch routines to help.
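The "bigger cache means less thrashing" intuition can be shown with a toy LRU cache model. Everything here is made up for illustration (the line addresses, the two interleaved working sets, the capacities); it is not a model of Conroe's actual cache, just of why doubling capacity can take a workload from constant eviction to near-perfect hit rates.

```python
from collections import OrderedDict

# Toy LRU cache: replay an access pattern and report the hit rate.
def hit_rate(accesses, capacity):
    cache, hits = OrderedDict(), 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)

# Two "threads" cycling over disjoint working sets of 4 lines each,
# interleaved -- 8 distinct lines touched over and over.
pattern = [t * 10 + i for _ in range(100) for t in (0, 1) for i in range(4)]
print(hit_rate(pattern, capacity=4))   # -> 0.0  (every access evicts)
print(hit_rate(pattern, capacity=8))   # -> 0.99 (both working sets fit)
```

With capacity 4, LRU evicts each line just before it is needed again; doubling the capacity lets both working sets live in the cache at once.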

As well, earlier endyen had concerns about the 840EE assigning lower performance to primary threads, while increasing performance of tertiary and lower threads. As it turns out, this isn't a flaw in HT itself. It is a flaw in the scheduler, since it treats the 2 real cores and the 2 virtual cores the same. It then puts two high-demand tasks on the same physical core, resulting in a performance decrease. However, if either your program manages the affinity of its threads or you do it yourself, you will be able to receive a performance boost by having HT enabled on a dual-core system.

"To sum up, at the moment of the release of dual core CPUs with HT, the behaviour of muti-threaded applications will have to be carefully studied, regarding the problems of Windows XP scheduler to manage four logical CPUs with efficiency. If Microsoft does not update Windows XP scheduler in order to fix this (that is very unlikely, remember that Windows 2000 was never fixed to correctly handle HT), applications developers will (one more time) have to take that in charge in their application."

http://www.x86-secret.com/index.php?option=newsd&nid=84...

Windows Vista will likely fix this problem in its scheduler to make it fully compatible with 2 cores + HT. In the meantime, application developers will have to pick up the slack. While this may seem tedious, it is really in their best interests, since 2 cores + HT gives a performance boost that's worthwhile since it's free.
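For what it's worth, "managing affinity yourself" looks something like this from a modern viewpoint. This is a minimal sketch using Linux's affinity API (on Windows you would reach for SetThreadAffinityMask instead), and which logical CPU numbers map to which physical core is an assumption that varies by machine.

```python
import os

# Minimal sketch: pin the calling process to a chosen set of logical
# CPUs so the scheduler cannot place two heavy tasks on sibling logical
# cores of one physical core. The logical-CPU numbering used here is an
# assumption; check the machine's topology before relying on it.
if hasattr(os, "sched_setaffinity"):       # Linux-only API
    os.sched_setaffinity(0, {0})           # pid 0 = the calling process
    print(os.sched_getaffinity(0))         # the mask we just set
```

An application would do this per worker thread, keeping each hot thread on its own physical core and leaving the HT siblings for lighter work.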
November 21, 2005 11:46:03 PM

Yeah actually I am. I should first mention that I'm mostly referring to the single-processor workstation and 2-way markets. I have to agree that Intel's 4-way offerings pale decidedly against AMD's at least in the near future.

First of all, the article only vaguely mentions optimism on AMD's motherboards. These I presume are the new Socket F variety. These will probably be shipping with Socket M2 sometime in May-June. In such a case, Intel will be able to remain competitive in Q1. As no compatibility problems with Dempsey have been mentioned, unlike the lower-end Preslers, it will likely ship in January with Yonah and Presler. Dempsey has been shown to be highly competitive with AMD's highest 2-way Opteron, the 280. Granted, AMD will probably release a speed bump, but Dempsey will still be competitive at least until Socket F arrives. That is most of the first half of 2006.

Now for Socket F. While it will definitely increase the Opteron's performance, I doubt it'll be drastic. Since Socket F is new, the initial processors will only be there to test out the socket and introduce it to market. As such, they will be produced at 90nm. This means they are essentially the same as AMD's current Opterons, meaning no multiple memory controllers or integrated PCIe controller. The major difference will be the bandwidth increase from DDR2 667 support. I have no doubt that this will push the Opteron decidedly ahead of Dempsey. Multiple integrated memory controllers and PCIe controllers won't likely come until K8L, which is in 2007. PCIe controllers may not come until K10, since I haven't heard anything from nVidia about their new chipsets dropping PCIe controllers.

However, it is important to note that while Socket F will be introduced in May and be faster than Dempsey, Intel will be releasing Woodcrest in H2 2006. This gives AMD only a few months of unchallenged time at the top. Of course, I'm not sure of Woodcrest's performance, but I'm pretty sure it will be competitive with the initial Socket F Opterons. Past 2006, AMD will have the K8L, and Intel will have a 45nm shrink of Woodcrest. That far into the future, it's anyone's guess who's better.

Generally, with Dempsey given a few months before Socket F, and Woodcrest coming on Socket F's heels, I really don't think AMD will be getting a free ride in 2006.

Interestingly, Merom originally missed its tape out by a month in July. However, its 64-bit motherboards were already stable at that time.

http://www.theinquirer.net/?article=24788

Now Merom is 1 month ahead of schedule and looks to be launched Q3 2006 like the July article predicted. It's going to have 4MB of L2 cache and 64-bit support.

http://theinquirer.net/?article=27812

Woodcrest probably isn't far behind since it was already mentioned in the Intel price lists from the article you posted.

This only makes me wonder if I should get a Yonah laptop, which itself is stable and ready to launch, or wait for Merom which is ahead of schedule and coming along nicely.
November 22, 2005 1:18:34 AM

Interesting perception of Itanic's core size.
First off, since the L2 cache in the Itanics is more of an ALU than a cache, it is usually included as part of the core.
Now there may be something wrong with my eyes, or that "scaled" image, but it sure looks like the core takes up more than 1/30th.
I was originally referring to Montecito, which is expected to have a core (including bisc cache, but not L3) of 252M transistors, while a single Opteron core is generally considered to have ~60M transistors (this includes trace and L1 cache).
Quote:
That's exactly what I'm saying. I don't support it for its lack of compatibility, but its still interesting to see what potential is out there in other architecture paths.

You are saying that having everyone change every part (including all software) of their computer every two years is somehow a reasonable option? That is what it would take to keep a lack of legacy support a viable option. Now Bill may like it, and Paul may say it's the future, but most people won't buy it.
Quote:
but unless AMD wants to be scene to be moving toward Intel's old architecture, they would probably just increase the number of general FPUs which can do SSE2 calculations.

No. Adding a dedicated SSE2 unit would effectively enable A64s to do encoding tasks as well as the P4s. If catching Intel at the only thing they still do well is "catching Intel's old architecture", let it be so.
Quote:
Well, if my own understanding is flawed, the websites I look at are BS, and I'm not allowed to Google, it becomes increasingly difficult for me to understand HT.

Try collecting data, and using a little scientific formula.
Statements like
Quote:
As well, earlier endyen had concerns about the 840EE assigning lower performance to primary threads, while increasing performance of tertiary and lower threads. As it turns out, this isn't a flaw in HT itself. It is a flaw in the scheduler
suggest you have a long way to go.
If you don't put any effort into it, the information won't be allowed to sink in. Let's face it, you are an Intel fanboi. You would not accept anything I say, so the only way for you to understand what, why and where is to do your own work. Here's a thought though: if HT worked on high-IPC chips, don't you think AMD would have adopted it? After all, they already use SSE3, and that's mostly useless for them.
November 22, 2005 4:34:58 PM

Tsk tsk tsk... I said you were wrong, and no response?

I'm disappointed.

:cry: 
November 22, 2005 7:21:17 PM

What can I say, you were absolutely correct. The worst of it is that HT worked better on the P4B, though it was only used @ 3.06.
I don't know why I forgot about it.