Spuds Intel Propaganda Campaign *updated*

September 9, 2003 11:14:23 AM

Reserved to when I get back into the loop.
September 9, 2003 12:59:08 PM

Wow... long post... :eek: 

I'll come back when I have more time and read it all...

:evil:  <font color=red><b>M</b></font color=red>ephistopheles
September 9, 2003 1:33:00 PM

just give a link to the site you cut and copied from already...

If he doesn't die, he'll get help!!!
September 9, 2003 1:37:10 PM

Sooooo long post.
I preferred diagrams.

<font color=green>If your nose <b>RUNS</b>, and feet <b>SMELLS</b>.
Then you must be born <b>UP-SIDE DOWN</b>.</font color=green>
September 9, 2003 3:54:17 PM

Long, but yeah... Intel has been doing some cool stuff with the P4. Imagine if you could take some engineers from both companies and make a completely new product.

<pre><A HREF="http://ars.userfriendly.org/cartoons/?id=20030905" target="_new"><font color=black>People don't understand how hard being a dark god can be. - Hastur</font color=black></A></pre>
September 9, 2003 5:05:27 PM

It's all from that boy's head!! Amazing, isn't it?

<font color=orange><b>sex is bad, bad is sin, sin is forgiven so sex is in</b></font color=orange>
September 9, 2003 5:25:58 PM

Great post spud, very informative.

_ _ _ _ _ _ _ _ _ _ _ _ _ _
P4 2.4C @ 3.0GHz 1.525V Stock HSF
Abit IS7 BIOS v1.3 GAT Auto
Corsair XMS 512MB TwinX3200C2 2-3-3-6
GeForce4 Ti4200 AGP8X 128MB
Seagate Barracuda 80GB SATA
September 9, 2003 7:55:26 PM

That would be nice. A new company building CPUs.
September 9, 2003 10:52:41 PM

A few corrections:

1. The L1 data cache is used only for integer/memory address data. All FP data is fetched from the L2 cache. That is one of the reasons the P4 relies so much on its L2 cache.

2. The L1 cache is also pipelined, meaning a new access can start every cycle, but it takes 2 cycles for a complete 32-bit integer to be loaded. This wasn't explicitly stated.

3. MMX had some instructions to operate on 64-bit quad operands. They were some logical and/or instructions, no arithmetic.
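
To make point 2 concrete, here is a minimal sketch of why a pipelined cache with a 2-cycle load latency can still deliver one result per cycle. It assumes back-to-back independent loads that all hit; the function names and the default numbers are illustrative, not taken from Intel documentation.

```python
def pipelined_cycles(num_loads: int, latency: int = 2, issue_interval: int = 1) -> int:
    """Cycles until the last of num_loads independent loads has completed.

    A new load starts every issue_interval cycles, and each one takes
    `latency` cycles from start to finish (assumed values, see lead-in).
    """
    if num_loads <= 0:
        return 0
    return latency + (num_loads - 1) * issue_interval


def unpipelined_cycles(num_loads: int, latency: int = 2) -> int:
    """Same loads if the cache could only handle one access at a time."""
    return max(num_loads, 0) * latency


if __name__ == "__main__":
    for n in (1, 4, 100):
        print(n, pipelined_cycles(n), unpipelined_cycles(n))
```

With many loads in a row the pipelined total approaches one cycle per load, while any single load still waits the full 2 cycles.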

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
September 10, 2003 2:29:01 AM

If anyone here needs to wake the hell up and read this, it's Kinney. Where's that boy now?

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
September 10, 2003 4:23:13 AM

Wow! This post is so long I don't think I am going to be able to absorb it all tonight. I'll definitely give this some thorough reading tomorrow, and perhaps bookmark this thread. BTW, nice job, Spud.

My OS features preemptive multitasking, a fully interactive command line, & support for 640K of RAM!
September 10, 2003 4:34:06 AM

Hehe, wait till you see my Itanium one. Man oh man, that's the real keeper. It'll probably take me a day or two to proof. I'll do a better job than on this one, though; I've noticed some more spelling, grammar, and sentence structure errors. Gotta get on those tomorrow, this is a propaganda war.

-Jeremy

:evil:  <A HREF="http://service.futuremark.com/compare?2k1=6940439" target="_new">Busting Sh@t Up!!!</A> :evil: 
:evil:  <A HREF="http://service.futuremark.com/compare?2k3=1228088" target="_new">Busting More Sh@t Up!!!</A> :evil: 
September 10, 2003 4:45:57 AM

BORING :D 

kidding spud, great job, much better than my crappy buying guide ;) 

Proud Owner the Block Heater
120% nVidia Fanboy
PROUD OWNER OF THE GEFORCE FX 5900ULTRA <-- I wish this was me
I'd get a nVidia GeForce FX 5900Ultra... if THEY WOULD CHANGE THAT #()#@ HSF
September 10, 2003 7:40:35 AM

Spud, do you know what changes are being made to improve Hyper Threading or are the improvements going to come solely from the changes you already mentioned?

_ _ _ _ _ _ _ _ _ _ _ _ _ _
P4 2.4C @ 3.0GHz 1.525V Stock HSF
Abit IS7 BIOS v1.3 GAT Auto
Corsair XMS 512MB TwinX3200C2 2-3-3-6
GeForce4 Ti4200 AGP8X 128MB
Seagate Barracuda 80GB SATA
September 10, 2003 8:02:08 AM

Yeah, call it a "Pentathlon". That way it would run good in 5 different benchmarks! LOL!!!!

:tongue:

<font color=blue> Ok, so you have to put your "2 cents" in, but its value is only "A penny's worth". Who gets that extra penny? </font color=blue>
September 10, 2003 9:30:35 AM

Although it's REALLY long and I haven't gotten all the way through, I can say I was relieved that the post wasn't "Prescott's going to kick some ass and be really cool because the Opteron sucks and is made by AMD".
Articles like this are why I read these forums: so I can actually learn things. If I wanted a pissing contest I'd walk down the street to a bar... speaking of which, I have to go now.
September 10, 2003 2:00:56 PM

It's a big thing, as far as I can tell from the available info from Intel and from Jack Lo, Henry Emer and Dean Tullsen; they are the foremost, IMO, in the theory of Hyper Threading technologies.

With the issue port increase and the completely separate L1 caches, as well as the two rapid execution units to feed those caches, there should be substantial improvements. According to the whitepapers, studies and thesis papers, substantial increases in Hyper Threading performance are expected from these improvements.

As far as I am concerned this is extremely exciting technology. The talk that Nocona will be able to do 4 threads instead of two makes it even more exciting. The Prescott will soar, that's for damned sure. By how much over the Northwood is yet to be seen, but I easily see 20%+ across the board. But that's only my opinion, based on my own observations of how the P7 reacts to cache size increases, issue ports, FSB speeds, and register adjustments.

-Jeremy

:evil:  <A HREF="http://service.futuremark.com/compare?2k1=6940439" target="_new">Busting Sh@t Up!!!</A> :evil: 
:evil:  <A HREF="http://service.futuremark.com/compare?2k3=1228088" target="_new">Busting More Sh@t Up!!!</A> :evil:  <P ID="edit"><FONT SIZE=-1><EM>Edited by spud on 09/10/03 11:29 AM.</EM></FONT></P>
September 10, 2003 3:33:28 PM

Imgod, I'm sorry I missed that. I don't program yet, so the Optimization Manuals are pretty useless to me, but I think I'll sit down sometime this week and read all that. I could do a code execution comparison of the P7 and K8 cores for everyone's reference. But the first thing is pulling my Itanium and Intel Compiler posts up.

-Jeremy

:evil:  <A HREF="http://service.futuremark.com/compare?2k1=6940439" target="_new">Busting Sh@t Up!!!</A> :evil: 
:evil:  <A HREF="http://service.futuremark.com/compare?2k3=1228088" target="_new">Busting More Sh@t Up!!!</A> :evil: 
September 12, 2003 4:43:44 AM

Dude, highlight your changes. I'm not rereading the whole thing to find one frilling paragraph's worth of information.

Dichromatic for your viewing pleasure...
September 12, 2003 6:47:39 AM

Thanks, lotta that was over my head, but that was a lot better than "Intel rocks, AMD sucks" or vice versa. As someone new here, stuff like this helps me understand what is really going on.

Thanks again!

<font color=blue><b>Purchase object A, install object A, curse object A, repeat...</b></font color=blue>
September 13, 2003 7:26:19 AM

bump good info..
September 13, 2003 9:11:24 PM

Great info! Thanks for the informative post. I look forward to your post on Itanium. It would be nice to see something this deep on the A64 core. I know it hasn't changed a LOT from previous AMD cores, but still, I haven't seen anything this good for any AMD core. Maybe when you're done with Itanium :) 
September 14, 2003 12:22:03 AM

<A HREF="http://arstechnica.com/cpu/3q99/k7_theory/k7-one-1.html" target="_new">http://arstechnica.com/cpu/3q99/k7_theory/k7-one-1.html...</A>
One of the best sites I could possibly recommend for learning in depth what the cores are made of. It's also the site that let me understand just how potentially awesome the P7 core can be, and that it IS superior to the K7 core; it simply hasn't gotten far yet.

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
September 14, 2003 5:53:24 AM

What's proofing?

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
September 14, 2003 6:06:38 AM

I finally read it, at like 1AM!

Great read, but you definitely need to read THG's or Ars' intro to the core before tackling this one, as it goes deep into the complexity with little to help a beginner know what he is reading.

A few things I noticed that didn't look right based on my previous readings:

Quote:
With that in mind the increase to 12k uops from 8k uops, we should see some very impressive increase to the processors execution ability especially with Hyper Threading.

I believe you mean from 12K uOps to 16K uOPS. I don't ever recall only 8000 uOPS being stored.

Quote:
3. Larger Trace Cache 16 128kbs kuops up from 12 80kbskuops on the Northwood and Willamette cores.

Interestingly, where did you get the info on actual BYTE size of the Trace Cache? 128Kbits?!
That's odd.

Also, what exactly do the buffers store in there? You say like 126 instructions and such. Now, wouldn't that mean you could technically access them like cache? Or do they hold trace data or something like this, and not actual executable data? It would be odd, considering the alternative Trace Cache deals with a maximum of 3 uOPS per clock. I had heard it would become double pumped or increased to a full 6 uOPS per clock cycle. (in an ideal scenario that means fully utilizing the IPC)
I don't know if 4K more uOPS to store in the TC will be of any use. I believe the issue rests in how many can be issued per clock. 3 is good, but you have to consider just how much does the x86 decoder do in conjunction. The P4's goal is to get to 6 as often as possible.

Also, increasing pipeline depth does not by itself translate into lost IPC. It makes IPC losses such as pipeline bubbles more frequent, but in an ideal scenario you do not actually have to give up any IPC. That's why I like the P4 design over the K8 right now: it can get its IPC up to K8 levels and still ramp as fast. But that'd require 200W to spare, heh.
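
The pipeline-depth argument above can be put into a rough formula: effective cycles per instruction is the ideal CPI plus the stall cycles caused by mispredicted branches, and only the stall term depends on how deep the pipeline is. The sketch below is a textbook-style approximation, not a model of any real core; every number in it (branch fraction, misprediction rate, flush penalties) is a made-up assumption for illustration.

```python
def effective_ipc(base_ipc: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty_cycles: float) -> float:
    """Approximate IPC once branch-misprediction bubbles are accounted for.

    base_ipc             : IPC with perfect prediction (assumed input)
    branch_fraction      : share of instructions that are branches
    mispredict_rate      : share of those branches that are mispredicted
    flush_penalty_cycles : cycles lost per misprediction, grows with depth
    """
    base_cpi = 1.0 / base_ipc
    stall_cpi = branch_fraction * mispredict_rate * flush_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


if __name__ == "__main__":
    # Hypothetical flush penalties standing in for shallower/deeper pipelines.
    for penalty in (10, 20, 30):
        ideal = effective_ipc(1.0, 0.2, 0.0, penalty)
        real = effective_ipc(1.0, 0.2, 0.05, penalty)
        print(f"penalty={penalty:2d}  ideal IPC={ideal:.3f}  with mispredicts={real:.3f}")
```

With a misprediction rate of zero the result equals the base IPC at any depth, which is the ideal scenario; as soon as some branches are mispredicted, the deeper pipeline loses more.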

I gotta admit, I am surprised at the amount of Prescott updates there is. I envisaged extra core enhancements, but never knew there were such minuscule ones. Nice to know.

Quote:
. It’s Tag RAM if anyone cares.

From what I learned about cache, any cache system has TAG RAM. It's what keeps track of each piece of data. I dunno what exactly is added in the cache, but it should help HT.

Overall, I've said it too many times, and I still believe it: the P7 core is not only an extremely interesting core, it's technically superior to the K7 in almost all aspects. I still get disagreements, but I believe the facts are here.
I just think it's too bad all of this functions at a sub-standard average IPC relative to the K7, when you'd drool and expect these awesome components to make the P7 a true per-clock performer with awesome scalability. Perhaps it's because it needs code written for it. In any case, Intel's compiler and architecture are much better to code and optimize for than whatever crap nVidia provides. At least the return result is SUBSTANTIAL and FAST TO CODE.
It's just sad that in the end the package is a "clock-speed required" one, as, again, you still don't see that superiority anywhere other than in multimedia.

Anyways, great read Spuddy boy, and yet I should say there is tons more to be talked about if we wanted to go in-depth. But anyways, I'll make sure to link this to AMD fools who continue to think the K7 is superior.

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
September 14, 2003 7:34:01 AM

Quote:
Also, what exactly do the buffers store in there? You say like 126 instructions and such. Now, wouldn't that mean you could technically access them like cache? Or do they hold trace data or something like this, and not actual executable data?


I believe he's talking about the re-order buffer which has to keep track of all the "instructions-in-flight" that are currently being executed in the MPU (since it's an OoOE processor).
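
As a rough illustration of what a re-order buffer does, the sketch below lets micro-ops finish executing in any order but only retires them in program order, a few per cycle. It is a toy model under loud assumptions: every micro-op is already in the buffer, the buffer's capacity is not modelled, and the retire width of 3 is simply the figure discussed in this thread.

```python
from collections import deque


def retire_schedule(completion_cycle, retire_width=3):
    """Return [(cycle, [uop indices retired that cycle]), ...].

    completion_cycle[i] is the cycle in which uop i (program order)
    finishes executing. A uop retires only when it is the oldest entry
    in the buffer and has finished; at most retire_width retire per cycle.
    """
    rob = deque(enumerate(completion_cycle))  # oldest uop at the left
    cycle = 0
    schedule = []
    while rob:
        cycle += 1
        retired = []
        while rob and len(retired) < retire_width and rob[0][1] <= cycle:
            retired.append(rob.popleft()[0])
        if retired:
            schedule.append((cycle, retired))
    return schedule


if __name__ == "__main__":
    # uop 0 is slow (finishes in cycle 3); younger uops finish earlier.
    print(retire_schedule([3, 1, 1, 2, 1]))
```

In this example uops 1, 2 and 4 finish in cycle 1 but nothing retires until the older uop 0 is done in cycle 3, so the buffer's contents are bookkeeping for in-flight work rather than something software could read like a cache.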

Quote:
It would be odd, considering the alternative Trace Cache deals with a maximum of 3 uOPS per clock. I had heard it would become double pumped or increased to a full 6 uOPS per clock cycle. (in an ideal scenario that means fully utilizing the IPC)


The trace cache never issues more than 3 micro-ops per clock. If there is a stall in the execution engine, there are several buffers and the main issue scheduler which can hold a micro-op that has already been issued so it may be executed in the next clock iteration. To be clear, the P4 tries to achieve a maximum of 3 micro-ops per clock sustained; its *peak* micro-op execution rate is 6.

Quote:
I don't know if 4K more uOPS to store in the TC will be of any use. I believe the issue rests in how many can be issued per clock.


Depends on the type of code. I've heard a lot of programmers complain that 12k micro-ops is too small to hold the critical loops in their programs.

Quote:
3 is good, but you have to consider just how much does the x86 decoder do in conjunction. The P4's goal is to get to 6 as often as possible.


Again, the maximum sustained rate is 3 micro-ops per clock. The x86 decoder does *not* function in parallel to the trace cache. Micro-ops are *always* issued from the trace cache or the micro-code ROM. If an instruction needs to be decoded by the x86 decoder it will decode first, be put into the trace cache or go through the micro-code ROM, and then be issued.
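
The flow described above (decode first, fill the trace cache, then issue from it) and the earlier complaint that 12k micro-ops can be too small for a critical loop can both be sketched with a toy model. Everything here is an assumption made for illustration: the class and function names are hypothetical, the least-recently-used replacement policy is a stand-in since the real policy is not given in this thread, and the decoder simply returns two fake micro-ops per address.

```python
from collections import OrderedDict


class TraceCacheModel:
    """Toy front end: micro-ops are only ever issued from the trace cache."""

    def __init__(self, capacity_uops, decode):
        self.capacity_uops = capacity_uops
        self.decode = decode          # hypothetical x86 -> micro-op decoder
        self.traces = OrderedDict()   # address -> tuple of uops, oldest first
        self.used_uops = 0
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        if address in self.traces:
            self.hits += 1
            self.traces.move_to_end(address)  # mark as most recently used
        else:
            # Miss: decode first, fill the trace cache, then issue from it.
            self.misses += 1
            self._fill(address, tuple(self.decode(address)))
        return self.traces[address]

    def _fill(self, address, uops):
        # Evict least recently used traces until the new one fits (assumed policy).
        while self.traces and self.used_uops + len(uops) > self.capacity_uops:
            _, evicted = self.traces.popitem(last=False)
            self.used_uops -= len(evicted)
        self.traces[address] = uops
        self.used_uops += len(uops)


def fake_decode(address):
    """Stand-in decoder: pretend every instruction becomes two micro-ops."""
    return (f"uop{address}a", f"uop{address}b")


def run_loop(loop_addresses, iterations, capacity_uops):
    """Run a loop body repeatedly and return (hits, misses)."""
    front_end = TraceCacheModel(capacity_uops, fake_decode)
    for _ in range(iterations):
        for address in loop_addresses:
            front_end.fetch(address)
    return front_end.hits, front_end.misses
```

With room for 6 micro-ops, `run_loop([0, 1, 2], 2, 6)` gives 3 hits and 3 misses because the loop fits after the first pass, while `run_loop([0, 1, 2, 3], 2, 6)` gives 0 hits and 8 misses because the loop is one trace too big and keeps evicting itself. That is the kind of behaviour a larger trace cache would help with, under this model's assumptions.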

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
September 14, 2003 1:43:11 PM

Quote:
In reply to:
--------------------------------------------------------------------------------

With that in mind the increase to 12k uops from 8k uops, we should see some very impressive increase to the processors execution ability especially with Hyper Threading.



--------------------------------------------------------------------------------

I believe you mean from 12K uOps to 16K uOPS. I don't ever recall only 8000 uOPS being stored.

Yuppers, you caught that one. As I said, I have lots of proofing (proofreading) to do. Thx for catching it, I'll correct it shortly.

Quote:
In reply to:
--------------------------------------------------------------------------------

3. Larger Trace Cache 16 128kbs kuops up from 12 80kbskuops on the Northwood and Willamette cores.



--------------------------------------------------------------------------------

Interestingly, where did you get the info on actual BYTE size of the Trace Cache? 128Kbits?!
That's odd.

Hehe, I can't tell you that, it's a trade secret, don't cha know.

Quote:
I gotta admit, I am surprised at the amount of Prescott updates there is. I envisaged extra core enhancements, but never knew there were such minuscule ones. Nice to know.

If having the entire core reworked to improve clock distribution, for frequency scaling up to 4x better than that of the Northwood, isn't what you call a significant improvement, I don't know what is! Also, L3 cache is going to make cache utilization go up, so can't say no to that.

Oh, btw, thanks imgod2u for clarifying before I could log on and do it myself. Glad someone other than me knows a great deal about the core. Oh, Eden, give it another read: the core can only retire up to 3 uops. So it will never be able to exceed that unless they do some serious reworking, but since x86 is so clunky we'll never have to worry about that. IA-64, on the other hand, is a much different story.

-Jeremy



:evil:  <A HREF="http://service.futuremark.com/compare?2k1=6940439" target="_new">Busting Sh@t Up!!!</A> :evil: 
:evil:  <A HREF="http://service.futuremark.com/compare?2k3=1228088" target="_new">Busting More Sh@t Up!!!</A> :evil: 
September 14, 2003 8:12:57 PM

Quote:
Oh Eden give it another read the core can only retire up to 3 uops. So it will never be able to exceed that unless they do some serious reworking

Ok I'm lost here.
Trace Cache works to speed up pipeline filling and get execution done as often as possible with less need to go into the cache and decoder stages, right?

Now, from what I read and you said, the WRITE phase has a maximum of 3 UOPS repacked and ready to go back to the memory, no?
So, what are we correcting me on? I know very well the Trace Cache issues a maximum of 3uOPS per clock to the execution units, yet I am corrected. Someone "deconfuse" me!

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
September 15, 2003 1:50:40 AM

It can only retire 3 uops at most per clock. That's the way they designed the machine. IPC can't exceed that, since it can't retire more than 3 of them.

Now, you're right, it will execute as quickly as possible, but it's x86: some of the more elaborate instructions take several clocks to decode and execute anyway. Now, from what I read, Eden, you said the chip will try for 6 uops to be executed per clock. Sure, in theory it'll execute them, IN THEORY, but it can't retire them all that clock.

-Jeremy

:evil:  <A HREF="http://service.futuremark.com/compare?2k1=6940439" target="_new">Busting Sh@t Up!!!</A> :evil: 
:evil:  <A HREF="http://service.futuremark.com/compare?2k3=1228088" target="_new">Busting More Sh@t Up!!!</A> :evil: 
September 15, 2003 2:41:25 AM

Yes ok, I see it can't, but better having more executed initially so the flow is constant, no?
Now I do wonder why it has to be this slow at retiring. It's a serious hamper. Will it increase later?

BTW, that does not mean it executes 3 uOPS max now does it? I fail to see the importance of a high IPC if so. Maybe I'm just getting the notion confused. All I want to know is if it can execute as much as 6 (the maximum amount of units) uOPS per clock, even if when it comes to retire afterwards, 3 can go at once.

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
September 15, 2003 3:30:05 AM

Quote:
Yes ok, I see it can't, but better having more executed initially so the flow is constant, no?
Now I do wonder why it has to be this slow at retiring. It's a serious hamper. Will it increase later?


Even the best super-scalar processors out there (like the Power4+ or an Alpha) are 5-way issue and retire. There's only so much you can do in terms of gaining superscalar instruction-level parallelism. 3 micro-ops per clock, for the most part, will not be reached given average instruction-level parallelism.

Quote:
BTW, that does not mean it executes 3 uOPS max now does it? I fail to see the importance of a high IPC if so. Maybe I'm just getting the notion confused. All I want to know is if it can execute as much as 6 (the maximum amount of units) uOPS per clock, even if when it comes to retire afterwards, 3 can go at once.


The reason it's 6-way issue to the execution engine is mainly to avoid bottlenecks at execution. The maximum attainable IPC is 3. Of course, code-level parallelism, memory latencies, data dependencies, instruction latencies, etc. etc. usually doesn't even allow 1 IPC.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
September 16, 2003 1:05:57 AM

Quote:
The maximum attainable IPC is 3.

Well there's a new bugger. So you're saying at any given clock, NEVER will you have 6 units working. There's a bummer. I thought combined decoding and Trace fetching would result in over 3 uOPS in the pipe, going down to the units at the same time. Of course ideally.

Quote:
Of course, code-level parallelism, memory latencies, data dependencies, instruction latencies, etc. etc. usually doesn't even allow 1 IPC.

Ok now you seem to just say this. I'd like proof. It's like you're trying to round down so badly like each time you give an estimate, only this time it scratches the border of 0!

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
September 16, 2003 1:20:55 AM

Quote:
Ok now you seem to just say this. I'd like proof. It's like you're trying to round down so badly like each time you give an estimate, only this time it scratches the border of 0!

<A HREF="http://tesla.hpl.hp.com/caecw03/chen.pdf" target="_new">Holliman, Li, and Chen</A> over at HP did an analysis of mpeg2 and mpeg4 decoding on the Pentium 4's. The results show that the IPC rarely went over 1. Mpeg2/mpeg4 decoding is one of the P4's strengths and one it performs great on.

Quote:
Well there's a new bugger. So you're saying at any given clock, NEVER will you have 6 units working. There's a bummer. I thought combined decoding and Trace fetching would result in over 3 uOPS in the pipe, going down to the units at the same time. Of course ideally.

As I've explained before, if there is a halt during one clock due to data dependencies or other such, then the next clock, a peak burst of 6 micro-ops executing on all of the available issue ports is possible, however, your average throughout several consecutive clocks will still be only 3 micro-ops per clock, that is the maximum IPC (a statistical average).
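
The burst-versus-sustained point above can be checked with a tiny simulation. It is a sketch under simplifying assumptions: the front end always delivers exactly 3 micro-ops per clock, the buffer between front end and execution is unbounded, and a stalled clock executes nothing. The widths 3 and 6 are the figures quoted in this thread; the function name is illustrative.

```python
def simulate(stall_pattern, front_end_width=3, exec_width=6):
    """Return the number of micro-ops executed in each clock.

    stall_pattern is an iterable of booleans, True meaning the execution
    engine is stalled in that clock (for example on a data dependency).
    """
    queued = 0
    executed_per_clock = []
    for stalled in stall_pattern:
        queued += front_end_width          # trace cache delivers 3 per clock
        if stalled:
            executed_per_clock.append(0)   # work piles up in the buffers
        else:
            n = min(queued, exec_width)    # drain up to 6 in one clock
            executed_per_clock.append(n)
            queued -= n
    return executed_per_clock


if __name__ == "__main__":
    per_clock = simulate([False, True, False, False])
    print(per_clock, sum(per_clock) / len(per_clock))
```

For the pattern shown, the per-clock counts are `[3, 0, 6, 3]`: one clock hits the peak of 6, yet the average over the four clocks is still 3.0, because execution can never get ahead of what the front end has supplied.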

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
September 16, 2003 1:24:59 AM

OK since you said a statistical average, it makes more sense. I was wondering for a sec what the goal was to design this many pipelines really.

I can understand if the average is 3. Then again your link says 1 or less. But, if so, then how does the P4 stand out so well if it barely executes enough per clock? Or is it simply because of SSE2's streamline nature? And if so, would that not mean SSE2 instructions have an address issued already in them rather than a separate uOP to go to the AGUs?

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
Anonymous
September 16, 2003 1:27:43 AM

Captain Obvious Loves This Thread!

<b><font color=red>Captain Obvious To The Rescue!!!</font color=red></b>
September 16, 2003 1:31:16 AM

Upon giving this link a more thorough read, I do have to admit this is the first analysis that ever calculated statistics on core microarchitectural level. At least I can see the real working of each unit.

Interesting read there, but alas I barely grasp most of it.

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
September 16, 2003 2:30:27 AM

Quote:
OK since you said a statistical average, it makes more sense. I was wondering for a sec what the goal was to design this many pipelines really.


Really to allow a smooth flow of instructions. If there's a block, the next cycle can make up for it. It's the same logic as having buffers on HD's.

Quote:
I can understand if the average is 3


Its theoretical maximum sustained throughput is 3. You can have bursts of 6, but that is only *one cycle*. At the end of the day, you will see 3 going in and 3 coming out if you look at the total work you did.

Quote:
Then again your link says 1 or less. But, if so, then how does the P4 stand out so well if it barely executes enough per clock?


Stand out so well compared to what? The K7? What makes you think the Athlon does more than 1 IPC on average in realistic situations? The K8 has the *exact same* theoretical throughput (3 IPC) as the K7, yet it's able to achieve a much higher realistic IPC. If you were under the impression that modern x86 MPU's were filled up to the brim, even at the decoder level, with instructions to execute, then you're sorely mistaken.

Quote:
Or is it simply because of SSE2's streamline nature? And if so, would that not mean SSE2 instructions have an address issued already in them rather than a separate uOP to go to the AGUs?


Complex x86 instructions that involve memory load/store are usually decoded into multiple internal micro-ops as I recall. Not all SSE/SSE2 instructions involve load/store.
Although you do raise a good point that SIMD instructions did account for a large portion of the code they were running and the maximum throughput is 1 FP vector instruction and 1 load/store instruction and 1 integer vector instruction per clock. Since mpeg2/mpeg4 decoding is mostly FP arithmetic, it'd make sense that the theoretical peak you could achieve is 2 IPC.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
September 16, 2003 7:13:34 AM

Quote:
Well there's a new bugger. So you're saying at any given clock, NEVER will you have 6 units working. There's a bummer. I thought combined decoding and Trace fetching would result in over 3 uOPS in the pipe, going down to the units at the same time. Of course ideally.

Eden, that's not exactly how it works. The trace cache is able to store 6 uops. There isn't some little unit in there with uops floating about. All the trace cache really is, is a midpoint storage system. It has ultra-low latency and stores decoded x86 instructions. But since most x86 instructions end up filling the trace cache with their uops, it's very important that Intel is increasing the size of the cache. Performance will be had, I assure you.

BTW, I do believe that in general operation of, say, the K7/K8 or the P7, average IPC is about .28 or so, somewhere in that region.

-Jeremy

:evil:  <A HREF="http://service.futuremark.com/compare?2k1=6940439" target="_new">Busting Sh@t Up!!!</A> :evil: 
:evil:  <A HREF="http://service.futuremark.com/compare?2k3=1228088" target="_new">Busting More Sh@t Up!!!</A> :evil: 
September 16, 2003 4:05:22 PM

One could then draw the conclusion that x86 is absolutely horrible at parallelism, and that to ever reach significantly higher levels of computing, we would need something like compiler-level decoding as the Itanium would do, hmm?

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
September 16, 2003 6:54:51 PM

Quote:
One could then draw the conclusion that x86 is absolutely horrible at parallelism, and that to ever reach significantly higher levels of computing, we would need something like compiler-level decoding as the Itanium would do, hmm?


It's not really just x86, almost any scalar ISA is running into problems like these. Although yes, x86 seems to be especially bad at it.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
September 17, 2003 1:01:21 AM

Here.
I read part of it and will finish it later.
Nice reading material when I'm on the pot, that's for sure.


Though what I've read so far has not changed my mind on 64bit computing.
My next CPU will be based on the best 64bit performance, 32bit performance and price... no one buys off specs.
Notice 64bit performance, I won't be considering one w/o it.
My athlon xp is doing good enough for me in 32bit, though I would like to order a 2500+ for HL2 and to tide me until the A64 goes into the newer socket form in about a year.

And when I say no one buys off specs... in that case I'd have bought myself a GF-FX instead of these 9800s.
The majority was right about that, but I think the jury's still out on A64 vs. Prescott.
Though w/o 64bit I won't be tossing my money Intel's way this round.
I might keep my A64 around as a secondary CPU, and when Intel deems it 'time' for 64bit mainstream computing I'll probably have a better machine than if I go with a P4/P5.

Just my opinion of course at this time.
Its just logical IMO..

Athlon 1700+, Epox 8RDA (NForce2), Maxtor Diamondmax Plus 9 80GB 8MB cache, 2x256mb Crucial PC2100 in Dual DDR, Radeon 9800NP, Audigy, Z560s, MX500
September 17, 2003 1:47:02 AM

Quote:
Its just logical IMO..

No it's not..................IMO.

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
September 18, 2003 1:06:44 AM

No one else here is producing such retarded replies like that..
Thanks for elaborating.

/rolls eyes

Athlon 1700+, Epox 8RDA (NForce2), Maxtor Diamondmax Plus 9 80GB 8MB cache, 2x256mb Crucial PC2100 in Dual DDR, Radeon 9800NP, Audigy, Z560s, MX500