<A HREF="http://www.xbitlabs.com/news/cpu/display/20031027151409.html" target="_new">http://www.xbitlabs.com/news/cpu/display/20031027151409.html</A>
----------------
<b><A HREF="http://geocities.com/spitfire_x86" target="_new">My Website</A></b>
<b><A HREF="http://geocities.com/spitfire_x86/myrig.html" target="_new">My Rig & 3DMark score</A></b>
ooo... shocking.
<font color=red><b>M</b></font color=red>ephistopheles
:\
Shocking if you've been ignoring all of the rumors for the past year or more. Nothing new at all if you haven't been ignoring them.
Personally I still think that these extensions are just an emulation of IA64. The P4 probably has enough out of order execution elements to handle emulating IA64's EPIC architecture if Intel just made the logic units 64-bit instead of 32-bit. (Much like AMD did to turn a K7 into a K8.)
<pre><b><font color=red>I've always wondered why people liken the taste of blood to copper.
It tastes much more like iron to me.
<-
-
-
-
-
-></font color=red></b></pre><p>
So it would imply they could run off Windows Server 2003 or Windows IA64 since the 64-bit code is IA64 code?
Or would they have emulators to change it into x86?
--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A>
| Quote : So it would imply they could run off Windows Server 2003 or Windows IA64 since the 64-bit code is IA64 code? |
Exactly. That would be the idea. Think about how many more software developers would be writing programs for IA64 that way.
Everyone thinks that if Intel releases a hybrid 32/64-bit P4 that it will weaken their Itanium market. But if that hybrid uses IA64 as its 64-bit extensions and that hybrid doesn't outperform an Itanium in FP (and what P4 could) then it'll actually strengthen the Itanium market by giving more incentive to port software to it without actually replacing the purpose of an Itanium.
It's a bit nuts considering the architectural differences between x86 and EPIC, but I think that with the extra caches, extended tables, etc. Scotty might just be able to pull it off. And if not Scotty...
So my wild conspiracy theory is that Yamhill hasn't come out yet not because Intel doesn't want to release a hybrid processor, but because Intel doesn't want to release a <i>weak</i> hybrid processor. So Intel has just been waiting for a P4 with enough on-die storage and out of order execution excellence to handle emulating EPIC.
Yeah. I know. I'm crazy.
| Quote : Or would they have emulators to change it into x86? |
That's the beauty of microcode. The P4 doesn't even directly execute x86 anymore. It's all translated into micro ops. So if IA64 was also translated into the same micro ops (with 64-bit extensions built into the micro-ops obviously) then the 'emulation' is really just a translation layer in the exact same way that x86 is already handled by the P4.
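A rough way to picture the "translation layer" idea from this post is two decoders feeding one shared back end. The sketch below is a toy model only: the instruction syntax, the micro-op tuple format and the function names are all invented for illustration and do not reflect any real Intel encoding or the actual P4 design.

```python
# Toy model only: the instruction formats and micro-op layout are invented
# for illustration and do not match any real Intel encoding.
from typing import Dict, List, Tuple

MicroOp = Tuple[str, str, str, str]  # (operation, destination, source_a, source_b)


def decode_toy_x86(instr: str) -> List[MicroOp]:
    """Front end A: two-operand form, e.g. 'add eax, ebx' means eax = eax + ebx."""
    op, operands = instr.split(maxsplit=1)
    dst, src = [part.strip() for part in operands.split(",")]
    return [(op, dst, dst, src)]


def decode_toy_ia64(instr: str) -> List[MicroOp]:
    """Front end B: three-operand form, e.g. 'add r1 = r2, r3'."""
    op, rest = instr.split(maxsplit=1)
    dst, sources = [part.strip() for part in rest.split("=")]
    src_a, src_b = [part.strip() for part in sources.split(",")]
    return [(op, dst, src_a, src_b)]


def run_backend(micro_ops: List[MicroOp], regs: Dict[str, int]) -> Dict[str, int]:
    """Shared back end: it only ever sees micro-ops, never the original ISA."""
    alu = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}
    for op, dst, src_a, src_b in micro_ops:
        regs[dst] = alu[op](regs[src_a], regs[src_b])
    return regs


x86_result = run_backend(decode_toy_x86("add eax, ebx"), {"eax": 2, "ebx": 3})
ia64_result = run_backend(decode_toy_ia64("add r1 = r2, r3"), {"r1": 0, "r2": 2, "r3": 3})
print(x86_result["eax"], ia64_result["r1"])
```

Both front ends end up driving the same `run_backend` loop, which is the whole point of the argument. What the toy leaves out is exactly what the rest of the thread argues about: register counts, predication, bundling and clock speed.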
<pre><b><font color=red>I've always wondered why people liken the taste of blood to copper.
It tastes much more like iron to me.
<-
-
-
-
-
-></font color=red></b></pre><p>
I don't buy this for a single second. IA64 is so radically different from x86 that creating a hybrid CPU would be a daunting task, if not nearly impossible. Just consider a few simple, obvious things, like the 128 (?) 64-bit registers that IA64 supports, the predicate registers, etc. Implementing those in a NetBurst architecture would either be incredibly slow or, more likely, pretty much impossible. Either way, the minor core revision that Prescott is leaves no room for such a miracle; it just ain't gonna happen. If Prescott has some 64-bit ISA hidden inside it, it's either AMD64 or a different x86-64 implementation, but not IA64. I am willing to bet my soul on that.
= The views stated herein are my personal views, and not necessarily the views of my wife. =
I'm willing to bet it's a dumbed-down IA64. Eventually 32-bit is going to die, especially once 64-bit becomes prevalent. Once we get far enough out that there is no functional need for 32-bit backwards compatibility they will remove it... and it's only logical to reuse what you already have as far as possible with Itanium. So it will be compatible with IA64... I won't go quite as far as slvr, but I will say Intel won't waste the opportunity to get more software for Itanium (or, at worst, software that is more easily ported).
Shadus
You mean a subset of IA64? It could be a possibility, though I still don't believe it. Even if it is, there will be no software whatsoever for it. A CPU that only supports a subset of IA64 won't run IA64 software; there will be no OS, no apps, nothing. It will take ages to get OS and application support.
Here is my guess: we will be seeing chipsets and motherboards that support either an x86 or an IA64 CPU. You buy a Xeon server, and when you feel the need for 64 bit, you simply replace the cpu with an Itanium. You will be able to run the "old" x86 software through software emulation until you can migrate to IA64 code. Either that, and/or a 64-bit extension to x86, but no IA64/x86 hybrid CPU. I'll really eat my hat and coat and shoes if that happens within the next few years.
= The views stated herein are my personal views, and not necessarily the views of my wife. =
What is the scope of a subset? Does a subset mean the processor is capable of only some 64-bit instructions? Then why couldn't software support it? Emulation would take care of all the 64-bit instructions that don't have native support in the processor. And as long as the subset covers the essential instructions, it would run at a decent speed too.
| Quote : I don't buy this for a single second. IA64 is so radically different from x86 |
That's my point and is the very statement that proves you aren't thinking about this very clearly. Look at the specs of a P4. <i>The P4</i> itself is radically different from a traditional pure x86.
The P4 is not however all that different from a stripped-down Itanium. Certainly more registers will need to be added and everything will have to be extended to handle IA64 code, but the heart of the P4 itself is already so <i>unlike</i> an x86 CPU that I think with the additions and extensions it <i>could</i> be made to execute IA64 instructions.
AMD made their additions and extensions to the K7 to make a K8 with an almost identical footprint. So how much die space <i>really</i> would need to be added? Why I bet that Intel could do it and no one would even know the difference so long as it was disabled. If AMD could and already did I don't see why Intel couldn't.
| Quote : Implementing those in a NetBurst architecture would either be incredibly slow or, more likely, pretty much impossible. |
Right. Because it's not like Intel would be putting them onto a smaller die that by its very nature has more space to put things. It's not like AMD expanded their execution units to 64 bits and doubled not only the size but also the capacity of their GPRs without taking up any noticeable amount of additional space <i>with the same lithography</i>. <sarcasm><i>Yeah. You're right. It's just "pretty much impossible".</i></sarcasm>
| Quote : Either way, the minor core revision that Prescott is leaves no room for such a miracle; it just ain't gonna happen. |
I don't see how you figure. The architecture is already designed for extreme OoOE. Scotty is extending that capability even further by increasing several caches and tables. This supposed 'minor core revision' is just several steps closer to everything that a P4 would need to emulate EPIC through a microcode translation layer well enough to not be laughed at. No room for such a miracle? The 0.09 micron etching gives it <i>plenty</i> of ROOM for such a miracle.
| Quote : If Prescott has some 64-bit ISA hidden inside it, it's either AMD64 or a different x86-64 implementation, but not IA64. I am willing to bet my soul on that. |
Cool. Care to sign that in blood? There's always a good market for souls.
Seriously though, I'm not saying I have some insider info and I know it will be. I'm just saying that from both a technical and marketing standpoint it's not only possible but would be the most beneficial route to take.
Then again, since when do companies do things in the most intelligent manner?
That's probably the best argument that I can think of for it to not be true. Heh heh.
<pre><b><font color=red>I've always wondered why people liken the taste of blood to copper.
It tastes much more like iron to me.
<-
-
-
-
-
-></font color=red></b></pre><p>
> The P4 itself is radically different from a traditional
>pure x86.
Not really. its just a highly tweaked/optimized x86 core.
>AMD made their additions and extensions to the K7 to make a
>K8 with an almost identical footprint
x86 versus x86-64 (AMD64) is an order of magnitude simpler than x86 -> IA64. I wouldn't know where to begin, really.
> The architecture is already designed for extreme OoOE
Surely you realize the whole concept of EPIC is to avoid OoE ?
>Cool. Care to sign that in blood? There's always a good
>market for souls.
No, but I'm willing to take 100-1 bets. If there is a Paypal thing that allows this, I'm on.
>. Scotty is extending that capability even further by
>increasing several caches and tables
>This supposed 'minor core revision' is just several steps
>closer to everything that a P4 would need to emulate EPIC
>through a microcode translation layer well enough to not be
>laughed at.
Caches and tables? WTF are you talking about? "Scotty" is to the P4C what Northwood is to Willamette. What you are suggesting here is about as impossible as Scotty executing Power4 binaries at speeds similar to IBM's implementations.
>Seriously though, I'm not saying I have some insider info
>and I know it will be. I'm just saying that from both a
>technical and marketing standpoint it's not only possible
>but would be the most beneficial route to take.
It's technically just impossible. If it were possible, why do you think they designed the Itanium the way they did? Why not create a P4 derivative to handle IA64?
100-1 bet, really, I'm on.
= The views stated herein are my personal views, and not necessarily the views of my wife. =
| Quote : :\ |
Yes, I know... I was just being ironic.
<font color=red><b>M</b></font color=red>ephistopheles
| Quote : You buy a Xeon server, and when you feel the need for 64 bit, you simply replace the cpu with an Itanium. |
What about a dual CPU mobo with 1 Itanium and 1 P4?
***evil grin***
I'm just dreaming a bit... But I don't really want to risk my soul or anything betting on what Intel has or hasn't implemented... I just hope it gets to us, the end users, quickly, runs fast as hell, and isn't overly expensive... (OK, that would be paradise, I know...)
<font color=red><b>M</b></font color=red>ephistopheles
should be possible, running 2 OS instances. I mean, there are already such machines (SGI I think) that contain both x86 and IA64 cpu's. You just can't use them both at once running the same OS/partition.
It really wouldn't be impossible having an Itanium and a P4 running on the same machine, each running their own OS and apps.
= The views stated herein are my personal views, and not necessarily the views of my wife. =
| Quote : It really wouldn't be impossible having an Itanium and a P4 running on the same machine, each running their own OS and apps. |
I was kinda thinking of a system in which you could complement each other's weaknesses... if they're running different apps, then that would be hard...
<font color=red><b>M</b></font color=red>ephistopheles
Impossible. At any rate, it would be much simpler to put both cores in one die than to do anything else. In any case, Intel won't see a decrease in their sales before at least 2006.
I dont like french test
around 2005
Unified FSB socket for IA32 and IA-64.
I dont like french test
Already done by everyone.
They just use Java, so the main cluster can run all the intensive tasks while their older or newer Xeon clusters run the x86 apps. The problem is that total cost increases, since it is simpler to have one software suite and one support team, but if the x86 stuff is already in place there is not much more needed.
I dont like french test
| Quote : > The P4 itself is radically different from a traditional |
Are you kidding me? Only someone who doesn't have a clue what they're talking about could say something so silly.
| Quote : x86 versus x86-64 (AMD64) is an order of magnitude simpler than x86 -> IA64. I wouldn't know where to begin, really. |
I never said that it wasn't. For AMD to do it with their implementation of x86 would be damned difficult. The P4 however is an order of magnitude closer to IA64 than pure x86 is.
And even then, I only said that it <i>could</i> be possible and that <i>if</i> it is possible then it has likely only become feasible thanks to the increased room provided by 0.09 micron etching. It might not even be feasible until something even smaller. I don't know. But I'd bet everything that I own that at least one engineer at Intel has given it thought.
| Quote : Surely you realize the whole concept of EPIC is to avoid OoE ? |
Surely you realize that the whole concept of EPIC is <b>not</b> to avoid it at all but to instead take it to the next level. EPIC processes the logic for both sides of a possibility so that once the answer is known it's already part of the way past that comparison. It then just tosses the computations made for the unused side of the path aside. If processing both sides of a comparison before the result of the comparison is even known isn't executing code out of order then nothing is.
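The "process both sides, then toss the unused one" behaviour described here is what the IA-64 documentation calls predication. A minimal sketch of the idea in plain Python follows; the function names are made up, and real hardware guards individual instructions with predicate registers rather than selecting between Python values.

```python
def branchy(a: int, b: int) -> int:
    """Conventional control flow: only one arm of the branch runs."""
    if a > b:
        return a - b
    return b - a


def predicated(a: int, b: int) -> int:
    """If-converted form: both arms run, and the predicates pick the survivor."""
    p_true = a > b            # a compare sets a predicate...
    p_false = not p_true      # ...and its complement
    then_result = a - b       # computed regardless of the comparison
    else_result = b - a       # computed regardless of the comparison
    # Only the result guarded by a true predicate is committed; the other
    # one is simply thrown away.
    guarded = ((p_true, then_result), (p_false, else_result))
    committed = [result for predicate, result in guarded if predicate]
    return committed[0]


print(branchy(7, 3), predicated(7, 3))
```

Note that nothing here runs out of program order: both arms are issued in the order the compiler laid them out, and the cost is wasted work on the discarded arm rather than any reordering hardware.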
| Quote : No, but I'm willing to take 100-1 bets. If there is a Paypal thing that allows this, I'm on. |
I don't wager for money. That's too mundane to be interesting to me.
| Quote : Caches and tables ? WTF are you talking about ? "Scotty" is to the P4C what Northwood is to Willamette. |
You obviously have no idea what you're talking about. Go do some research and come back when you have a clue.
| Quote : What you are suggesting here is about as impossible as Scotty executing Power4 binaries at speeds similar to IBM's implementations. |
No, what I am suggesting here is that Scotty or Tejas might be able to execute IA64 binaries at speeds that are slower than an Itanium but roughly equivalent to x86. What's so ludicrous about that? If a P4 had the necessary registers, caches, tables, and all of the units had the required bit-size then why couldn't a P4 with its much faster clock be capable of emulating IA64 code to speeds that are roughly equivalent to x86-64 code running on an Opteron? If you wish to debate this any further then you will have to explain to me just exactly why that is technically impossible.
So far I'm giving technical reasons. So far you're giving generalized blanket statements. Step up or give up.
| Quote : It's technically just impossible. If it were possible, why do you think they designed the Itanium the way they did? Why not create a P4 derivative to handle IA64? |
I've already said. Because until now the etching process hasn't allowed for enough to be fit into the die for a P4 to handle it. I shouldn't have to be repeating myself here. You really suck at this debating thing.
And I never even said that Scotty definitely would be able to handle it. It could be Tejas. It could even be after that. But the technical direction that the P4 is going in is definitely that road, so it would be silly for Intel to not at some point bridge the remaining gaps between the P4 and the Itanium into one new core, if for no other reason than to compete with a 32/64-bit hybrid from AMD if the concept of a hybrid proves to be a valuable new market.
<pre><b><font color=red>I've always wondered why people liken the taste of blood to copper.
It tastes much more like iron to me.
<-
-
-
-
-
-></font color=red></b></pre><p>
>Are you kidding me? Only someone who doesn't have a clue
>what they're talking about could say something so silly.
This attitude starts bothering me, honestly. You call me clueless over and over, while I keep proving you wrong over and over. I don't mind being called clueless by someone who knows more about this stuff than I do, Paul Demone may call me this any time of the day, but not you.
Once you stuff the attitude, I may get back to you and explain things like: "If a P4 had the necessary registers, caches, tables, and all of the units had the required bit-size then why couldn't a P4 with its much faster clock be capable of emulating IA64 code to speeds that are roughly equivalent to x86-64 code running on an Opteron?".
= The views stated herein are my personal views, and not necessarily the views of my wife. =
| Quote : Impossible. At any rate, it would be much simpler to put both cores in one die than to do anything else. In any case, Intel won't see a decrease in their sales before at least 2006. |
Yes, it would be much easier to just put a P4 core and an Itanium core both into the same die and, whenever the Itanium receives 32-bit x86 commands, to just route them to the P4 part of the core for computation instead of using Itanium's own slow x86 emulation.
The flaw however is that the end product will just be a multi-cored die that's even larger than a single Itanium with only one Itanium in it. It won't compete with multi-cored dual Itaniums and it'll be a shite load more expensive than any Xeon. It couldn't even hope to compete in the same market as Opteron for price reasons alone and thus would be an almost completely useless concept.
So easy does not necessarily translate into effective.
Much harder but also much more effective would be for Intel to do what I've said. Production costs would be equivalent to Xeons. Performance wouldn't be against an Itanium, but against an Opteron. The Itanium market would remain untouched and Intel would have a product capable of competing in the same market as an Opteron if not an Athlon64.
| Quote : In any case, Intel won't see a decrease in their sales before at least 2006. |
Exactly. Which is why I keep saying that it doesn't even need to be in Scotty. It could be in Tejas. It could even be later than that. Intel still has plenty of time to work out how to make it possible before they have to release <i>any</i> type of hybrid to compete with AMD's hybrids. And the need to work it out still could be exactly why Intel <i>isn't</i> trying to compete with AMD's hybrids right this very second.
<pre><b><font color=red>I've always wondered why people liken the taste of blood to copper.
It tastes much more like iron to me.
<-
-
-
-
-
-></font color=red></b></pre><p>
| Quote : This attitude starts bothering me, honestly. |
Tough. If the shoe fits, wear it.
| Quote : You call me clueless over and over, while I keep proving you wrong over and over. |
Proven me wrong over and over? So far you haven't <i>proven</i> a single thing! So far the <i>only</i> thing that I've been wrong about was that a coworker ran a single process over 2GB, and that was because <i>he</i> miscommunicated to me. <i>And</i> I graciously corrected myself on that once I got the correct facts from him, which I certainly didn't have to do by the way.
So far one of us has been debating intelligently and one of us has been twisting and avoiding everything. Give it up already. You're just not good at this. You just don't know as much as you pretend to. That's no crime. It just means that it's time for you to stop pretending like you have a clue when you clearly don't.
If you could actually prove that you do have a clue, even just once, then I wouldn't be calling you clueless. So go ahead and prove me wrong. I dare you to. And if you can I'll humbly rescind. Crow doesn't really taste so bad.
Otherwise however the attitude stands because it's justly deserved because I've really grown tired of your constant responses that have no value to them. Step up or shut up, bbaeyens. The ball is in your court.
<pre><b><font color=red>I've always wondered why people liken the taste of blood to copper.
It tastes much more like iron to me.
<-
-
-
-
-
-></font color=red></b></pre><p>
| Quote : Surely you realize that the whole concept of EPIC is not to avoid it at all but to instead take it to the next level. EPIC processes the logic for both sides of a possibility so that once the answer is known it's already part of the way past that comparison. It then just tosses the computations made for the unused side of the path aside. If processing both sides of a comparison before the result of the comparison is even known isn't executing code out of order then nothing is. |
I think you're thinking of speculative execution, not OoOE. Speculative execution is when a branch is taken even though the result isn't known. IA-64 attempts to solve this problem by processing both branches and discarding the one that was incorrect.
OoOE is when you're executing multiple instructions that have a specific order and due to one reason or another, one of these instructions can't be executed. You can then execute an instruction that comes after that one while putting this one on "hold" and executing it later.
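The definition above can be made concrete with a toy issue model. This is only a sketch under simplifying assumptions that are not from the thread: one instruction issues per cycle, each register is written once (so no renaming is needed), and the instruction names and latencies are invented.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str
    srcs: Tuple[str, ...]
    latency: int  # cycles until the result can be used


# Hypothetical four-instruction program with made-up latencies.
PROGRAM = [
    Instr("load", "r1", (), 4),       # slow memory load
    Instr("add", "r2", ("r1",), 1),   # must wait for the load
    Instr("mul", "r3", (), 1),        # independent of the load
    Instr("sub", "r4", ("r3",), 1),   # depends only on mul
]


def simulate(program: List[Instr], out_of_order: bool) -> Tuple[List[str], int]:
    """Issue at most one instruction per cycle; return (issue order, finish cycle)."""
    # Results of instructions that have not issued yet are never ready.
    ready_at: Dict[str, float] = {instr.dest: float("inf") for instr in program}
    pending = list(program)
    issue_order: List[str] = []
    cycle = 0
    finish = 0
    while pending:
        # In-order hardware may only look at the oldest pending instruction;
        # out-of-order hardware may pick the oldest one whose inputs are ready.
        window = pending if out_of_order else pending[:1]
        chosen = next(
            (i for i in window if all(ready_at.get(s, 0) <= cycle for s in i.srcs)),
            None,
        )
        if chosen is not None:
            pending.remove(chosen)
            ready_at[chosen.dest] = cycle + chosen.latency
            issue_order.append(chosen.name)
            finish = max(finish, cycle + chosen.latency)
        cycle += 1
    return issue_order, finish


in_order = simulate(PROGRAM, out_of_order=False)
reordered = simulate(PROGRAM, out_of_order=True)
print(in_order)   # (['load', 'add', 'mul', 'sub'], 7)
print(reordered)  # (['load', 'mul', 'sub', 'add'], 5)
```

In the in-order run the `add` sits behind the slow load and blocks everything after it; in the out-of-order run the independent `mul` and `sub` slip past the stalled `add`, which is the "hold and execute later" behaviour the post describes.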
The whole point of IA-64 and EPIC in general is to make OoOE unnecessary. Compilers will analyze code beforehand and attempt to order it in such a way that there will be no instances in which instructions are halted. This saves the processor from having to analyze instructions on the fly, which costs die space, transistors, power, and heat, all of which could be used for other things such as a wider execution engine.
Sure, you could put OoOE on an IA-64 chip... but the whole point of IA-64 was to make it unnecessary. The actual core of Itanium and Itanium 2, even including the dedicated x86 processor that's latched onto there, is still smaller than the core of the P4 or Athlon.
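As a rough illustration of the compile-time side of that argument, the sketch below groups instructions so that nothing in a group depends on anything else in the same group, which is loosely what an EPIC compiler does when it forms instruction groups. The program, register names and the `schedule_groups` helper are hypothetical, and a real compiler would also weigh latencies, bundle templates and available execution units.

```python
from typing import Dict, List, Tuple

# (name, destination register, source registers); a made-up program.
PROGRAM: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("load", "r1", ()),
    ("add", "r2", ("r1",)),
    ("mul", "r3", ()),
    ("sub", "r4", ("r3",)),
    ("xor", "r5", ("r2", "r4")),
]


def schedule_groups(program: List[Tuple[str, str, Tuple[str, ...]]]) -> List[List[str]]:
    """Place each instruction in the earliest group after all of its producers."""
    group_of_reg: Dict[str, int] = {}   # register -> group index of its producer
    groups: List[List[str]] = []
    for name, dest, srcs in program:
        # An instruction must sit one group later than its latest producer;
        # registers not written by the program are treated as already available.
        level = max((group_of_reg[s] + 1 for s in srcs if s in group_of_reg), default=0)
        while len(groups) <= level:
            groups.append([])
        groups[level].append(name)
        group_of_reg[dest] = level
    return groups


groups = schedule_groups(PROGRAM)
print(groups)  # [['load', 'mul'], ['add', 'sub'], ['xor']]
```

The grouping is decided once, ahead of time, so the hardware can issue each group without checking dependencies itself. The catch, as the thread notes, is that the compiler has to guess things like load latency that out-of-order hardware discovers at run time.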
| Quote : It's technically just impossible. If it were possible, why do you think they designed the Itanium the way they did? Why not create a P4 derivative to handle IA64? |
With the 150+ million transistor budget, which keeps growing for future chips (including Tejas), it could be very feasible for the chip to have 2 front-ends: one to decode IA-64 instructions and another to decode x86 instructions.
As for how the instructions are processed in the back-end: again, there's no reason you can't put OoOE logic on an IA-64 chip. It simply isn't necessary and, in Intel's eyes, just wastes die space. However, if you were to fuse an x86 processor with an IA-64 processor, you'd need that OoOE logic there anyway for processing x86 instructions, so why not have it there to give poorly written/compiled IA-64 code that extra edge? They could always improve their compilers and later on take away the x86 component in favor of software emulation. And in the meantime, they'll give programmers time to adjust and transition to EPIC concepts.
"We are Microsoft, resistance is futile." - Bill Gates, 2015.
| Quote : I think you're thinking of speculative execution, not OoOE. Speculative execution is when a branch is taken even though the result isn't known. IA-64 attempts to solve this problem by processing both branches and discarding the one that was incorrect. |
I'm thinking of both. My point is that the line between the hardware needed for them is very thin. It would not take much effort at all to turn the P4's OoOE hardware into speculative execution hardware, especially when the P4's clock speed can be double that of an Itanium, allowing it to split execution and cache the results using the OoOE tables in order to emulate 'true' speculative execution.
| Quote : The whole point of IA-64 and EPIC in general is to make OoOE unnecessary. |
I know. My point is that the OoOE hardware of the P4 can be used to emulate that. Just because the hardware was designed for OoOE doesn't mean that it <i>has</i> to be used specifically for OoOE.
You're right in that they are two different methods of processing data. However they are two methods which are so similar in many ways that they could be implemented using the exact same hardware if that hardware was flexible enough, which is exactly what Intel would have to do to make IA64 code run on a P4.
True, the P4's version wouldn't be 'true' speculative processing. It would be an emulation of it. However the means are unimportant if the end result is all that matters.
| Quote : The actual core of Itanium and Itanium 2, even including the dedicated x86 processor that's latched onto there, is still smaller than the core of the P4 or Athlon. |
You sure about that? I'd love to see your source of information. From what <i>I</i> can find on a pure 0.13 micron comparison basis the Itanium2 has a die size of 374mm2 while the P4 has a die size of 146mm2 and the TBredB has a die size of 84mm2.
It's hard to account for exactly how much of that 374mm2 is used by the L3 cache because of the strange shape of Itanium's core, but even then it's still an extreme stretch to pin Itanium's core without the L3 as being smaller than 146mm2. I don't even see how it could compare to the 101mm2 of a Barton or the 84mm2 of the TBredB.
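To put numbers on that comparison, here is a quick back-of-the-envelope sketch. The die sizes are the ones quoted in this post; the helper name is made up, and the calculation only shows what share of the Itanium 2 die would have to be non-core area (L3 cache and so on) for its core to come in under each of the other chips' full die sizes. It says nothing about what that share actually is.

```python
# Die sizes in mm^2 at 0.13 micron, as quoted in the post above.
DIE_MM2 = {
    "Itanium2": 374,
    "P4": 146,
    "Barton": 101,
    "TBredB": 84,
}


def non_core_share_needed(total_mm2: float, target_core_mm2: float) -> float:
    """Fraction of the big die that must be non-core for its core to fit the target."""
    return (total_mm2 - target_core_mm2) / total_mm2


for rival in ("P4", "Barton", "TBredB"):
    share = non_core_share_needed(DIE_MM2["Itanium2"], DIE_MM2[rival])
    print(f"Core smaller than a whole {rival} die needs over {share:.1%} non-core area")
```

By this arithmetic, roughly 61% of the Itanium 2 die would have to be cache and other non-core area just to match a whole P4 die, and around 78% to match a TBredB.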
| Quote : With the 150+ million transistor budget, which keeps growing for future chips (including Tejas), it could be very feasible for the chip to have 2 front-ends: one to decode IA-64 instructions and another to decode x86 instructions. |
Exactly.
That's the beauty of microcode. You could easily have one chip support two different front ends. In a way Itanium already does.
| Quote : As for how the instructions are processed in the back-end: again, there's no reason you can't put OoOE logic on an IA-64 chip. It simply isn't necessary and, in Intel's eyes, just wastes die space. However, if you were to fuse an x86 processor with an IA-64 processor, you'd need that OoOE logic there anyway for processing x86 instructions, so why not have it there to give poorly written/compiled IA-64 code that extra edge? They could always improve their compilers and later on take away the x86 component in favor of software emulation. And in the meantime, they'll give programmers time to adjust and transition to EPIC concepts. |
That's an interesting look at it. If a true fusion was made instead of a partial fusion completed with IA64 emulation then you'd have both EPIC and OoOE. I wonder if they'd step on each other's toes or if that'd improve bad IA64 code.
<pre><b><font color=red>I've always wondered why people liken the taste of blood to copper.
It tastes much more like iron to me.
<-
-
-
-
-
-></font color=red></b></pre><p>
Silver, pardon the interruption, but...
| Quote :
|
I'm pretty paranoid when I say that this is almost as if you two secretly talked and decided to spoof what me and BB did, by putting BB as me and you as him!
Seriously, I know this is crazy, but damn, the situation is almost too accurate to be coincidental, especially that it was a few days ago that this fight between me and BB stopped.
That first quote by him is especially scary, as this is what I said at some point.
Tell me I'm nuts.
One more thing, what is microcode exactly? Why is it another method aside from lasering, to configure a chip's functions as you two were debating for the Opteron's active HT links?
--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A>
<P ID="edit"><FONT SIZE=-1><EM>Edited by Eden on 10/30/03 03:16 PM.</EM></FONT></P>
| Quote : Tell me I'm nuts. |
Okay, you're nuts.
...
Seriously though, I'd call it a strange coincidence if you believe in coincidence. I'd call it karma if you believe in karma. Or maybe it just falls into the category of "it takes one to know one". **shrug** Beats me.
Anyhow, none of this was scripted. Go ahead and try to PM me. It's been disabled for ages. **ROFL** And as anyone who tries to email me knows, I hardly ever check that. (Sorry to anyone who has tried to email me.) So good luck trying to hold a conversation with me outside of what you see.
There's simply no way.
| Quote : One more thing, what is microcode exactly? Why is it another method aside from lasering, to configure a chip's functions as you two were debating for the Opteron's active HT links? |
I could have this wrong since I've never really studied deeply into this but as I understand it microcode is kind of one word to describe two things. In this case it's basically like a firmware layer (think BIOS but for the CPU instead of the mobo) stored on flash RAM built into the CPU that can contain minor bugfixes / workarounds or in this particular case possibly be used to disable certain features.
<pre><b><font color=red>I've always wondered why people liken the taste of blood to copper.
It tastes much more like iron to me.
<-
-
-
-
-
-></font color=red></b></pre><p>
>You sure about that? I'd love to see your source of
>information. From what I can find on a pure 0.13 micron
>comparison basis the Itanium2 has a die size of 374mm2
>while the P4 has a die size of 146mm2 and the TBredB has a
>die size of 84mm2.
Have a look here: <A HREF="http://www.scd.ucar.edu/dir/CAS2K3/CAS2K3 Presentations/Mon/loft.ppt" target="_new">http://www.scd.ucar.edu/dir/CAS2K3/CAS2K3 Presentations/Mon/loft.ppt</A> Look at the last slides. I'm too lazy and don't have enough time to do the math, but it's probably correct that the Itanium has a smaller core than the P4 or Athlon on the same process node.
> (think BIOS but for the CPU instead of the mobo)
This is a pretty simple but accurate explanation of what microcode is. But how exactly do you think this would help achieve:
>You could easily have one chip support two different front
>ends. In a way Itanium already does.
It doesn't really; it's not related. Supporting two radically different instruction sets is anything but easy. I'm not saying it can't be done, and if anyone can do it, it's probably Intel, but there is no way in hell this is going to happen with Prescott, and I seriously doubt Tejas will support IA64.
>It would not take much effort at all to turn the P4's OoOE
>hardware into speculative execution hardware,
*cough*
>especially
>when the P4's clock speed can be double that of an Itanium
And what makes you think this IA64-P4 hybrid would also achieve double the clock speed of an Itanium? You think it's trivial to have 128 registers and run the core at those speeds? Designing a core to achieve such high clock speeds is an incredibly challenging job, and requires an extremely good balance of all the components. You can't just slap on a VLIW decoder, increase your register count by a factor of 16, double their size, change just about every core component as well, and expect your design to reach anywhere over 100 MHz without a from-the-ground-up redesign.
Itanium was designed from the ground up; if it were easy to support two different ISAs on one chip, both running at full speed, Itanium would have been that chip. Even Intel couldn't manage that with a chip that has an almost limitless die area and power envelope. There are even lots of rumours that the x86 decoder is holding back clock scaling of the Itanium, and will therefore be removed in future versions. If they couldn't do it with Itanium (whereas, btw, Transmeta could, with a VLIW core running x86 at decent speeds), I really don't see how they will do it with Prescott or Tejas.
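For what it's worth, the "factor of 16, double their size" figures check out against the architectural integer register files. The quick sketch below assumes 8 general-purpose 32-bit registers for x86 and 128 general 64-bit registers for IA-64, and deliberately ignores floating-point, predicate and branch registers as well as any physical rename registers inside the P4.

```python
# Architectural integer register files only (an assumption of this sketch).
X86_GPRS, X86_GPR_BITS = 8, 32       # 32-bit x86 general-purpose registers
IA64_GPRS, IA64_GPR_BITS = 128, 64   # IA-64 general registers

count_factor = IA64_GPRS // X86_GPRS
width_factor = IA64_GPR_BITS // X86_GPR_BITS
storage_factor = (IA64_GPRS * IA64_GPR_BITS) // (X86_GPRS * X86_GPR_BITS)

print(f"{count_factor}x as many registers, each {width_factor}x as wide, "
      f"so {storage_factor}x the architectural register state")
```

So the visible integer state alone grows 32-fold (256 bits versus 8192 bits), before counting the extra read and write ports a wider machine would need.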
= The views stated herein are my personal views, and not necessarily the views of my wife. =
The Itanium has a relatively short pipeline and can manage 1.5 GHz with all these units working. (I'll have to check to see how many stages it has. I thought it was 7 stages, but that could've been for the IA64 decoding, because of the compiler removing most stages.)
Why should a P4 with such a deep design (long pipeline of 20 stages) not be able to do even better?
--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A>
Normally I'm not THAT good at digging through Intel PDFs (goddamn weak search engine IMO), but I found it: 8 stages, page 191 of this document: <A HREF="http://ftp://download.intel.com/design/Itanium2/manuals/25111002.pdf" target="_new">http://ftp://download.intel.com/design/Itanium2/manuals/25111002.pdf</A>
Interestingly, that should limit its clock speed even more than the Athlon's pipeline does, yet 1.5 GHz on 0.13 micron is doable.
EDIT: 10 stages for the FPU. Although if we measured scalability by that, then the Athlon's 13-15 FPU stages put it even further ahead of the Itanium. (I don't remember the exact FP figures.)
--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A>
<P ID="edit"><FONT SIZE=-1><EM>Edited by Eden on 10/30/03 08:50 PM.</EM></FONT></P>
I don’t know why this bothers me, but I would be upset to find out that my processor was capable of 64-bit but had it turned off, hidden, or disabled. It would be like buying a DVD player with stereo capabilities, only for the company to turn them off and hide them by default because there are no competitors. So you’re stuck with mono even though you know it’s capable of stereo? Sure, they’ll enable it when they need it to face the competition?
Only with Intel, I guess.
Well, think about HT: it was there all along, just never properly implemented. They needed the HT P4 versions to get it to work. Think also of how many people owned WinXP at the P4 launch in 2000: practically no one. HT on Win2K is literally the death of the OS.
64-bit without any support is awkward, so to me it seems highly unlikely that there are any extensions hidden in your CPU.
--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A>
-----------------------------------------------------------
I'm thinking of both. My point is that the line between the hardware needed to them is very thin. It would not take much effort at all to turn the P4's OoOE hardware into speculative execution hardware, especially when the P4's clock speed can be double that of an Itanium, allowing it to split execution and cache the results using the OoOE tables in order to emulate 'true' speculative execution
-----------------------------------------------------------
As other users have already explained, Itanium deals with branches with much more finesse, as it executes both branches unless another branch will be hit in less than X cycles. In a P4 or any other CPU (except EV8), a prediction is made; before it can continue executing on the speculative branch, it must store the initial values and the IP. In the case of a wrong path, the initial values are restored and execution continues on the new branch. The point is that Itanium uses predicated computing: it can use some rotating registers and execute the "I don't think that's the right branch, but give it a try" path. To do that on a P4, you would need to double every register in the GPR and FPR files. Even so, there would be an increase in performance, as the P4 waits many more cycles for its branch result than an Itanium, so both branches would be executed. With HT enabled, if the second thread hits a branch, there might be 2 threads and 2 nanothreads with the other branches. The P4 uses OoO, so the renaming registers would have to be split; this would end up as 64 or 32 renaming entries per thread, so only 64 instructions of all kinds could be in flight in the whole pipeline. If the first thread already uses 88 (random number) and the second thread 55 (random number), a total of 88*2 + 55*2 renaming entries would be used, so technically 1028 renaming registers of 64 bits would be used, plus tags and so on.
Second point: Itanium has short stages. Discarding the bad branch takes 1 to 3 cycles plus the branch unit latency, for an overall 5 to 7 cycles; on a P4 this may go to 10 to 20. The P4 core is already big, much bigger than Itanium's, even though Itanium has 2x as many resource units. There is also speculation on data values; again, this is not good on the P4.
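To put the penalty figures above in perspective, here is a rough back-of-the-envelope sketch in Python. The 7 and 20 cycle penalties are the upper ends of the ranges quoted in the post; the branch fraction and misprediction rate are made-up illustrative values, not measurements of either chip.

```python
def mispredict_overhead(branch_fraction, mispredict_rate, penalty_cycles):
    """Average extra cycles per instruction lost to mispredicted branches."""
    return branch_fraction * mispredict_rate * penalty_cycles

# Assumed workload (hypothetical): 1 in 5 instructions is a branch,
# and 5% of those branches are mispredicted.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

for name, penalty in [("Itanium, 7-cycle flush", 7), ("P4, 20-cycle flush", 20)]:
    extra = mispredict_overhead(BRANCH_FRACTION, MISPREDICT_RATE, penalty)
    print(f"{name}: {extra:.2f} extra cycles per instruction")
```

Whatever workload numbers are plugged in, the ratio between the two results is just the ratio of the penalties, so the longer flush costs roughly three times as much per instruction.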
-----------------------------------------------------------
You sure about that? I'd love to see your source of information. From what I can find on a pure 0.13 micron comparison basis the Itanium2 has a die size of 374mm2 while the P4 has a die size of 146mm2 and the TBredB has a die size of
Quick math: P4 EE with 2 MB of L3 cache = 160 million transistors.
Itanium 2 1.5 GHz: 6 MB + 0.5 of extra block.
I dont like french test<P ID="edit"><FONT SIZE=-1><EM>Edited by juin on 10/31/03 04:27 PM.</EM></FONT></P>
Juin, just one question: when you say "bolt", do you mean "both"?
If so, you should try to say the latter, as I've always thought you were referring to a company called Bolt, and it's not a word that's very close to "both".
--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A>
| Quote : It would not take much effort at all to turn the P4's OoOE hardware into speculative execution hardware, |
Erm, the P4 does do speculative execution. That's what all modern MPUs do. IA-64 does it a different way. Instead of speculating, it just does both branches.
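A toy illustration of the difference, written in Python and not meant to resemble real hardware. The first function models a conventional branch, where only one arm runs and a predictor has to guess which. The second models predication (what a compiler's if-conversion produces): both arms are computed, and a predicate picks the result. The function names are invented for this example.

```python
def branchy_abs_diff(a, b):
    # Conventional control flow: only one arm executes, so the CPU
    # has to predict which one before the comparison resolves.
    if a > b:
        return a - b
    return b - a

def predicated_abs_diff(a, b):
    # Predicated style: the compare sets a predicate, both arms are
    # computed unconditionally, and the predicate selects the result.
    p = int(a > b)        # predicate: 1 if true, 0 if false
    then_val = a - b      # "taken" arm, always computed
    else_val = b - a      # "not taken" arm, always computed
    return p * then_val + (1 - p) * else_val

print(branchy_abs_diff(7, 3), predicated_abs_diff(7, 3))
print(branchy_abs_diff(3, 7), predicated_abs_diff(3, 7))
```

The predicated version never has to flush anything, but it always pays for both arms, which is why it only makes sense for short branches.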
| Quote : It doesn't really; it's not related. Supporting two radically different instruction sets is anything but easy. I'm not saying it can't be done, and if anyone can do it, it's probably Intel, but there is no way in hell this is going to happen with Prescott, and I seriously doubt Tejas will support IA64. |
Itanium already supports 2 front-ends. In fact, it's 2 processors in one. The x86 processor just happens to be very weak. Now, if you shed the processor and just had x86 decoders on there.....used the IA-64 backend instead, that could be some serious processing power.
| Quote : And what makes you think this IA64-P4 hybrid would also achieve double the clock speed of an Itanium? Do you think it's trivial to have 128 registers and run the core at those speeds? |
About as trivial as the 128 physical registers on the P4 running at 3.2 GHz are... You only see 8 in the ISA, but there are 128 physical ones in there for register renaming, etc.
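For anyone unsure what register renaming means here, a minimal sketch in Python follows. It takes the post's figures (8 ISA-visible registers, 128 physical ones) as given. The class and method names are invented for illustration, and the model never returns registers to the free list at retirement, which real hardware has to do.

```python
class RenameTable:
    """Toy register renamer: few architectural names, many physical registers."""

    def __init__(self, num_arch=8, num_phys=128):
        # Initially architectural register i lives in physical register i.
        self.mapping = list(range(num_arch))
        # Every other physical register is free for renaming.
        self.free = list(range(num_arch, num_phys))

    def rename(self, dest, srcs):
        # Sources read whichever physical register currently holds the value.
        phys_srcs = [self.mapping[s] for s in srcs]
        # Each new write gets a fresh physical register, so a later write to
        # the same architectural name does not have to wait for earlier readers.
        new_phys = self.free.pop(0)
        self.mapping[dest] = new_phys
        return new_phys, phys_srcs

table = RenameTable()
print(table.rename(0, [1, 2]))  # first write to architectural register 0
print(table.rename(0, [0, 3]))  # second write reads the first result, gets a new register
```

The programmer only ever names registers 0 to 7; the two writes to register 0 land in different physical registers, which is what lets the out-of-order core overlap them.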
| Quote : Designing a core to achieve such high clock speeds is an incredibly challenging job, and requires an extremely good balance of all the components. You can't just slap on a VLIW decoder, increase your register count by a factor of 16, double their size, change just about every core component as well, and expect your design to reach anywhere over 100 MHz without a from-the-ground-up redesign. |
I don't think anyone suggested you could just "slap" it on. But pretty much all modern MPU achievements aren't just "slap-on" jobs. Banias should be proof enough that when there's a will, and billions of dollars in R&D, there's a way. Imagine, an x86 processor that's more power-efficient than most RISC processors out there.
| Quote : Itanium was designed from the ground up; if it were easy to support two different ISAs on one chip, both running at full speed, Itanium would have been that chip. Even Intel couldn't manage that with a chip that has almost limitless die real estate and power envelope. |
Couldn't or wouldn't? What would be the advantage of having high-speed x86 processing on a high-end server-level RISC-competitor?
| Quote : There are even lots of rumours that the x86 decoder is holding back the clock scaling of the Itanium, and will therefore be removed in future versions. If they couldn't do it with Itanium (whereas Transmeta could, by the way, with a VLIW core running x86 at decent speeds), I really don't see how they will do it with Prescott or Tejas. |
Comparing the execution resources of the Crusoe vs Itanium 2 is a bit like comparing a Ford Pinto to a Ferrari.
"We are Microsoft, resistance is futile." - Bill Gates, 2015.
About as trivial as the 128 physical registers on the P4 running at 3.2 GHz is......You only see 8 in the ISA but there's 128 physical ones in there for register renaming, etc.
I don't get your point.
I dont like french test
As long as FX32 can make Itanium perform like a Gallatin Xeon of the same speed, having any kind of x86 emulation is pointless.
I dont like french test
| Quote : As long as FX32 can make Itanium perform like a Gallatin Xeon of the same speed, having any kind of x86 emulation is pointless. |
A 1.5 GHz system is more than adequate for average work. It's decent enough to transition your software, no?
Also, going from P2 300-level 32-bit performance to some 1.5 GHz equivalent is quite good, and if Intel improves on it, it can only get better as a boon for transitioning.
--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A>
| Quote : About as trivial as the 128 physical registers on the P4 running at 3.2 GHz are... You only see 8 in the ISA, but there are 128 physical ones in there for register renaming, etc. |
He's basically saying the P4 also has 128 registers even though only 8 appear to programmers, and that these run at such high speeds, basically negating BB's argument about how hard it is to clock high with this many registers.
Spud tells me you're not even using 50 registers so far in Itanium 2 due to limits. I think it was around that with Itanium but they had to lower the usable amount in the 2nd core to make it work better. I'll need to get more info on this one though.
--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A>
That was my point: if Itanium = a P4 core with the same cache size and FSB, a 2 GHz Itanium would perform like a 2 GHz Xeon with a 400 FSB and 6 MB of L3 cache. That is good performance for office use, as the main apps and the OS will run in native mode, and x86 emulation is good enough for 99% of all apps on the market, with the exception of games, but games need bandwidth, not CPU power.
By clearing out the IA-32 emulation, which sits right in a much-needed spot between the cache and the execution units/front-end.
I dont like french test
These registers are renaming logic, not GPRs, so that's why I asked the question. There is a difference.
I dont like french test
(This reply is general and not to anyone in particular)
Now, as slvr said, it is not really too hard to emulate IA-64 on x86, but what's the point of that? Tell me. Compare the performance of a 3.2 GHz P4C to that of a 1.5 GHz Madison, and Madison beats the P4 in every category except integer performance. Madison more than makes up for it with its FP performance, which crushes everything, from K8s to P4s, all the way to Power 4s and Alphas. It just would not make sense to emulate IA-64 in a P4. Intel is rather doing the opposite; they are using the FX32! concept to emulate (in software) x86 on the Itanium. The Alpha engineers are working on this over at Intel and, as many of you already know, have gotten the emulation to 1.5 GHz Gallatin Xeon speeds. They are improving this every day. With this kind of performance, the concept of using x86 to emulate IA-64 is rather absurd. Intel, at most, would make a hybrid with x86 and IA-64 front-ends. This would be mostly an IA-64 chip, with x86 decoders.
The current x86 core, which is basically a slapped-on Pentium core, sits on the Itanium between the cache and the front-end, and Intel is eager to get rid of it. Even on the Itanium, hardware emulation of x86 is a real pain, and software emulation is just so much easier. Getting rid of it will further increase the performance of the Itanium.
Emulating in microcode simply requires too much work (is a very big hassle), and the performance of both IA-64 and x86 will suffer, as opposed to having simply 2 different front-ends.
As it stands, Intel is most likely to simply bolt on an x86 core to an IA-64 core, or just use the x86 decoders, and maybe the front-end, to make a hybrid, which would have full speed x86 processing power, as well as full speed IA-64 power.
- - -
"... In the semiconductor industry, it's good to be paranoid ..." - [Andy Grove]</font color=green>
I disagree. It's not absurd. It's in fact a boon to Intel who owns the IA64 architecture and could very well not need a new OS to run the IA64 P4.
As it stands, x86-64 needs programs written for it just as IA64 does. The exception is that x86-64 also runs x86.
But then again, if Windows 2003 allows x86, then Intel has a major lead already.
If Slvr is right about the core being very close to EPIC properties, then it should not be too hard to run things on it. Granted it will have to deal with weaker register amounts, and yes performance will suffer, but considering it will use a powerful architecture and its clock speed is more than twice the Itanium's speed (and can anytime compensate if the clock speed was lowered, with its longer pipeline advantage), IMO I would say that Intel would be best off with IA64 on P4, if they really are stubborn and do not want AMD64 on their core, which is basically royalty-free.
Basically, we will have to wait and see how stubborn Intel is.
--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A>
He's just pointing out that the P4 CAN work with that many registers, whether renaming or GP registers, at such high clock speeds. (At least that's what I think.)
Whether or not they are visible to the programmer shouldn't make a difference. I mean, the P4 still works with them at full speed, no?
--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A>
I disagree. It's not absurd. It's in fact a boon to Intel who owns the IA64 architecture and could very well not need a new OS to run the IA64 P4
It will still need a new OS and drivers.
-----------------------------------------------------------
If Slvr is right about the core being very close to EPIC properties, then it should not be too hard to run things on it. Granted it will have to deal with weaker register amounts, and yes performance will suffer, but considering it will use a powerful architecture and its clock speed is more than twice the Itanium's speed (and can anytime compensate if the clock speed was lowered, with its longer pipeline advantage), IMO I would say that Intel would be best off with IA64 on P4, if they really are stubborn and do not want AMD64 on their core, which is basically royalty-free.
-----------------------------------------------------------
I don't see where they are similar. Itanium 2 looks much closer to a Banias than to a P4. Clock speed-wise, Power 4 has about the same stage length and the same micron process, and it is still much slower in clock speed. Don't compare a server CPU with a desktop CPU.
I dont like french test
| Quote : I don't see where they are similar. Itanium 2 looks much closer to a Banias than to a P4. |
I have no idea where you see the similarity. Banias is about efficiency. It has limited parallel execution resources, but it utilizes a lot of them. Itanium 2 has *massive* parallel execution resources. 2 parallel FMAC units for crying out loud. A 6-way instruction decoder, dedicated branch units, etc.
| Quote : Clock speed-wise, Power 4 has about the same stage length and the same micron process, and it is still much slower in clock speed. |
The Power 4 has a 16-stage integer pipeline; the P4 has 20. The Power 4 has much stricter guidelines in terms of reliable data output. It also doesn't have the advantage of mass-volume yields, constant tweaking/steppings, etc. that the P4 has. And even so, it's at 1.7 GHz. The PPC970, with the same design, is set to reach 3 GHz.
"We are Microsoft, resistance is futile." - Bill Gates, 2015.
The Power 4 has a 16-stage integer pipeline, the P4, 20. The Power 4 has a much more strict guidelines in terms of reliable data output. It also doesn't have the advantage of mass-volume yields, constant tweaking/steppings, etc. that the P4 has. And even so, it's at 1.7 GHz. The PPC970, with the same design, is set to reach 3 GHz.
Like I said, don't compare server CPUs with desktop ones. Itanium 2 and Power 4 would be over 2 GHz if it were not for the power consumption issue and RAS. If you want, compare Power 4 vs Opteron: more stages on the Power and a lower clock speed. PPC 970 will reach 3 GHz on 90nm, not on 130nm.
I dont like french test
have no idea where you see the similarity. Banias is about efficiency. It has limited parallel execution resources, but it utilizes a lot of them. Itanium 2 has *massive* parallel execution resources. 2 parallel FMAC units for crying out loud. A 6-way instruction decoder, dedicated branch units, etc
The philosophy behind them is the same. Itanium was made for efficiency at a higher level. P4 is made to mix ILP and TLP, and its processing power comes from its clock speed.
I dont like french test
| Quote : The philosophy behind them is the same. Itanium was made for efficiency at a higher level. P4 is made to mix ILP and TLP, and its processing power comes from its clock speed. |
Not really. The decoding scheme in and of itself sacrifices quite a bit of efficiency for maximum performance. It quite often requires the use of no-ops because not all of the templates meet the program's requirements. However, the use of templates facilitates decoding and allows the 6-way decoder to function the way it does. The FMAC units are another example. They only work at peak when you have a multiply-accumulate instruction, which isn't all the time. Efficiency is sacrificed when only a multiply or an addition is done, in favor of higher performance in cases where multiply-accumulate is used.
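The no-op point can be sketched with a toy bundle packer in Python. It assumes a simplified template list (unit types only, no stop bits, MLX left out) and a naive in-order greedy strategy; a real IA-64 compiler reorders independent instructions and does far better, so treat this purely as an illustration of why padding happens. All names are invented for the example.

```python
# Simplified template set: M = memory, I = integer, F = floating point, B = branch.
TEMPLATES = ["MII", "MMI", "MFI", "MMF", "MIB", "MBB", "BBB", "MMB", "MFB"]

def fill(template, queue):
    """Place instructions from the front of the queue into one bundle, in order."""
    slots, used = [], 0
    for unit in template:
        if used < len(queue) and queue[used] == unit:
            slots.append(queue[used])
            used += 1
        else:
            slots.append("nop")  # slot type does not match the next instruction
    return slots, used

def pack(instrs):
    """Greedy packing: pick the template that absorbs the most instructions."""
    queue, bundles = list(instrs), []
    while queue:
        best = max(TEMPLATES, key=lambda t: fill(t, queue)[1])
        slots, used = fill(best, queue)
        if used == 0:
            raise ValueError(f"no template accepts unit type {queue[0]!r}")
        bundles.append((best, slots))
        queue = queue[used:]
    return bundles

for template, slots in pack(["M", "I", "I", "F", "F"]):
    print(template, slots)
```

A stream that happens to match a template fills a bundle completely, while back-to-back F instructions each drag two no-ops along with them under this simple scheme.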
"We are Microsoft, resistance is futile." - Bill Gates, 2015.
The templates in I2 meet 95% of the need; unlike I1, they support many more possibilities, and there are also ways to overcome that with any compiler and a little thinking. FMACs will be used in almost all CPUs as they are faster. Yes, there is a problem when it comes to division, as there is no division or square root instruction. In the case of the ALUs, they support all instructions and are able to run all at once.
Dispersal and decoding are not the same; there is no decoding in I2, only bundle dispersal. Yes, we can say that a 123-bit instruction is decoded into three 41-bit instructions, but this is fixed and therefore it's not a decoder.
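For reference, the fixed split being described looks like this as a Python sketch. An IA-64 bundle is 128 bits: a 5-bit template in the low bits followed by three 41-bit instruction slots (3 x 41 = 123). The function names are invented for the example, and nothing here interprets the slot contents.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41

def pack_bundle(template, slots):
    """Build a 128-bit bundle from a 5-bit template and three 41-bit slots."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle

def split_bundle(bundle):
    """Split a bundle at fixed bit positions; no per-instruction length decoding."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask for i in range(3)
    )
    return template, slots

bundle = pack_bundle(0x10, (1, 2, 3))
print(split_bundle(bundle))
```

Because every field sits at a fixed offset, finding the three instructions is just shifting and masking, unlike x86 where instruction boundaries have to be discovered.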
I dont like french test
IA-64 instructions are still "decoded" in the traditional sense. Internal micro-ops would considerably bloat code (more so than IA-64 already) and wouldn't be suitable for direct-processing. IA-64 opcode, though less complex than x86, still isn't as execution "friendly" as Intel's proprietary micro-ops.
And I'd like to see where you got this "95%" number for the Itanium 2. As far as I'm aware, the templates are built into the ISA. It isn't implementation-specific. So whatever limitations the templates have on Itanium 1, would remain the same on Itanium 2.
As for FMAC, it only works best when you do multiply-accumulate. Now, while this does happen a lot in FP code, it doesn't always. And in such cases, efficiency is sacrificed.
To get back on track, Itanium 2 and Banias have very different design concepts. Itanium, like the P4, was meant to address performance first, even at the expense of granularity and, therefore, efficiency.
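The efficiency loss can be put in numbers with a small Python sketch. It assumes the usual accounting in which a fused multiply-add counts as two floating-point operations and a lone multiply or add, issued to the same unit, counts as one. The function name and the instruction mixes are hypothetical.

```python
def fmac_utilization(fma_ops, mul_ops, add_ops):
    """Fraction of peak FLOPs achieved when every op occupies one FMAC issue slot."""
    issued = fma_ops + mul_ops + add_ops
    if issued == 0:
        return 0.0
    useful_flops = 2 * fma_ops + mul_ops + add_ops  # an FMA does two operations
    peak_flops = 2 * issued                         # peak assumes every slot is an FMA
    return useful_flops / peak_flops

# Hypothetical instruction mixes, not taken from any real benchmark.
print(fmac_utilization(100, 0, 0))    # pure multiply-accumulate code
print(fmac_utilization(50, 25, 25))   # half FMA, half lone multiplies and adds
print(fmac_utilization(0, 50, 50))    # no multiply-accumulate at all
```

Under this accounting, code with no multiply-accumulate at all can reach at most half of the unit's peak rate.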
"We are Microsoft, resistance is futile." - Bill Gates, 2015.