Am I getting confused or does the A64 beat the P4C

With only half the memory bandwidth available?

Edited by RAIN_KING_UK on 09/25/03 09:57 PM.
  1. AMD's got HyperTransport~~~~
    and just because the Intel FSB is quad-pumped doesn't mean performance increases fourfold.

    Simply said

    the A64 wastes the P4C and beats P4EEs


    RIP Block Heater....HELLO P4~~~~~
    120% nVidia Fanboy
    PROUD OWNER OF THE GEFORCE FX 5950ULTRA <-- I wish this was me
    waiting for aBox~~~~~~~~~~~~~~~~
  2. But I'm thinking the A64 non-FX platform only has single channel hypertransport which only provides 3.2GB/s vs the 6.4GB/s available to the quad-pumped P4C. Am I confused?

    Edited by RAIN_KING_UK on 09/25/03 09:59 PM.
  3. An A64 HyperTransport link is 16 bits wide in each direction, double-pumped (DDR), and runs at 800MHz, which works out to 3.2GB/s each way or 6.4GB/s for both directions combined. This is all separate from the 3.2GB/s single-channel DDR400 memory interface. The XBar memory controller seems to alleviate some of the bottlenecks between AGP and main memory.

    Dichromatic for your viewing pleasure...
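
A rough sketch of the peak-bandwidth arithmetic behind figures like these. The helper name `peak_gb_per_s` is made up for illustration, and the inputs (a 16-bit link per direction at 800MHz DDR, a 64-bit DDR400 channel at 200MHz) are assumptions taken from the numbers discussed in this thread, not vendor specifications:

```python
def peak_gb_per_s(width_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumed: one HyperTransport direction, 16 bits wide, 800 MHz, 2 transfers/clock.
ht_one_way = peak_gb_per_s(16, 800, 2)
# Both directions can run at once, so the aggregate is double.
ht_both_ways = 2 * ht_one_way
# Assumed: single-channel DDR400, 64 bits wide, 200 MHz, 2 transfers/clock.
ddr400_single = peak_gb_per_s(64, 200, 2)

print(f"HT per direction: {ht_one_way:.1f} GB/s")
print(f"HT aggregate:     {ht_both_ways:.1f} GB/s")
print(f"DDR400 x1:        {ddr400_single:.1f} GB/s")
```

These are theoretical ceilings only; sustained throughput in practice is lower.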
  4. ya.... whatever he said ;)

    RIP Block Heater....HELLO P4~~~~~
    120% nVidia Fanboy
    PROUD OWNER OF THE GEFORCE FX 5950ULTRA <-- I wish this was me
    waiting for aBox~~~~~~~~~~~~~~~~
  5. I remember a few months ago people were saying if the A64 is single channel it would never compete with the P4C's - that isn't the case it seems. I am just trying to work out why, because it seemed to make sense at the time.
  6. Why would you attribute memory bandwidth on the K8 directly to the P7 core's method of functioning?

    The P4 uses different and longer fetch lines, which require more bandwidth on each fetch. The K8 doesn't; however, its on-die memory controller (ODMC, so no FSB) makes it more sensitive to memory bandwidth, so more bandwidth actually helps.

    Just like IPC, you should not compare bandwidth figures on their own to predict performance. Remember, the Itanium runs off PC1600 and yet it crushes the competition, no? (Or has it upped the bandwidth?)

    --
    This just in, over 56 no-lifers have their pics up on THGC's Photo Album! (http://www.lochel.com/THGC/album.html) :lol:
  7. I thought the Itanium systems have 8.6 GB/s of memory bandwidth - not sure though.

    This whole thing is kind of confusing me, in case you hadn't noticed. ;)
  8. What is the exact function of the FSB? Yes, I am a newbie, thanks for pointing that out, Grub.
    Also, with the P4 being quad-pumped, how does it work? I.e., in terms of how software could take advantage of it?
    So far, with all these numbers, I'm not sure exactly what they really mean anymore. E.g., the XP3200+ has FSB400; does that really mean that the speed from CPU to memory is 400MHz?

    System Integration...yeah right, thanks to marketing, more confusion
  9. Re: the XP3200+ has FSB400; does that really mean that the speed from CPU to memory is 400MHz?
    Yup, I believe the 3200+ is the only AMD CPU to support DDR at 200MHz in sync. But the motherboard and RAM must allow this as well. I might be wrong, but as I recall only the 3200+ supports a 200MHz FSB.
    And about Grub: it's not personal, he says that to all newbies he doesn't know.
  10. let's all get itaniums :D

    RIP Block Heater....HELLO P4~~~~~
    120% nVidia Fanboy
    PROUD OWNER OF THE GEFORCE FX 5950ULTRA <-- I wish this was me
    waiting for aBox~~~~~~~~~~~~~~~~
  11. Yaya I can get a ULV I2 for 876CAN. Special order board for 573CAN from Dell and 4gig of PC2100 for 1677CAN. Then some Windows XP 64 for another 278CAN.

    -Jeremy

    :evil: Busting Sh@t Up!!! (http://service.futuremark.com/compare?2k1=7013108) :evil:
    :evil: Busting More Sh@t Up!!! (http://service.futuremark.com/compare?2k3=1311896) :evil:
  12. What do you want to do with one of those? lol - just kidding - I'm most certainly waiting till Jan/Feb to see exactly what happens. It's way too early to come up with any definite answer, as there are not even decent, fair tests out yet.
  13. Grub's a funny bud, he's just grouchy, lol, and no, he doesn't mean to be mean, hehe.

    Well, I don't have the full picture on buses, but here is what I can tell you, as simply as I could find and understand it myself:
    Quad-pumped means the bus makes four data transfers per clock cycle. So each clock cycle carries four transfers instead of one. DDR transfers data on the rising and falling edges of the clock, while the P4 goes even further with something twisted that I never really grasped.

    Now, to what a bus is. It's as simple as a real bus: it takes people somewhere. The bus is the path from the CPU to the other components. It takes data to and fro.
    The bus has a certain bit width. Think of how many people can ride on one bus. The bit width is how much data can travel on it at once. P4s have a 64-bit bus, and with quad-pumping you can call it 256 bits effective per clock. Thus, in bytes: 256 bits / 8 = 32 bytes. 32 bytes per clock times 200 million clocks per second equals 6,400,000,000 bytes per second, or 6.4GB/sec.

    Now, many have the notion that more bandwidth = superior performance. And some would then accuse the P4 of having needed twice the bandwidth in the past just to compete with an Athlon, let alone beat it. That's false. The P4 has a deep pipeline, and whenever it fetches data, it fetches it in big increments. It also risks guessing wrong and having to flush the pipeline, which means fetching the data again.
    Simply put: more bandwidth acts as a backup that keeps the processor working even after a wrong guess, helps it stream more data, and helps keep it fed.
    The Athlon didn't need that and still isn't as hungry as the P4, because it has a shorter pipeline and doesn't flush as much. Plus it does not fetch big lines of data. I'd like it if imgod2u could come and explain this further, as I have lost the notion of it and how to explain it concretely.

    The XP3200+ does have an EFFECTIVE 400MHz bus, but the clock actually runs at 200MHz, and since it transfers data twice per clock, you can almost say it runs at twice the speed. Don't be fooled though!
    P4s DON'T HAVE AN 800MHz FSB. They have a 200MHz QDR FSB. Better yet, say 800MT/s rather than 800MHz. MT means megatransfers, or millions of transfers.

    Hope this helps.
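
To make the MHz versus MT/s distinction concrete, here is a small sketch. The `Bus` class and its field names are hypothetical, and the two entries simply restate the bus parameters quoted in this thread (64-bit, 200MHz, quad- or double-pumped):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bus:
    name: str
    width_bits: int
    clock_mhz: int            # the real clock frequency
    transfers_per_clock: int  # 1 = SDR, 2 = DDR, 4 = QDR

    @property
    def mega_transfers(self) -> int:
        """Millions of transfers per second (MT/s): the 'effective' figure."""
        return self.clock_mhz * self.transfers_per_clock

    @property
    def gb_per_s(self) -> float:
        """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
        return self.width_bits / 8 * self.mega_transfers / 1000


buses = [
    Bus("P4C FSB (QDR)", 64, 200, 4),
    Bus("Athlon XP 3200+ FSB (DDR)", 64, 200, 2),
]

for bus in buses:
    print(f"{bus.name}: {bus.clock_mhz} MHz clock, "
          f"{bus.mega_transfers} MT/s, {bus.gb_per_s:.1f} GB/s peak")
```

Both buses tick at the same 200MHz; only the number of transfers per tick differs, which is why MT/s is the less misleading unit.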

  14. Honestly, Itanium is the way to go. If Intel can pull off an almost perfect emulation of x86-32, or IA32, they can take the Itanium mainstream. Yes, clock speed would be devastatingly low. But in time they will develop Itaniums with long pipelines. The difference is that it uses IA64, an architecture far better than x86. At least when Itanium works on FPU code, all of it is used practically all the time. Ace's Hardware shows the Itanium getting 5 times the FPU performance of the Opteron, at only 1GHz!

    If we can get Itanium mainstream, and it supports x86 almost perfectly, and establishes IA64 as the base future, we could seriously have a new world of PERFORMANCE computing.

  15. Quote:
    I thought the Itanium systems have 8.6 GB/s of memory bandwidth - not sure though.

    Itaniums use a 200MHz DDR FSB, i.e. a 400MT/s FSB. However, the Itanium's FSB is 128 bits wide, giving it a total transfer rate of 6.4GB/s. As for memory, they typically use a four-channel DDR system, giving them a total of... 6.4GB/s (4xPC1600, I think...), so the two run synchronously... Very interesting architecture...

    :evil: Mephistopheles
  16. Quote:
    If we can get Itanium mainstream, and it supports x86 almost perfectly, and establishes IA64 as the base future, we could seriously have a new world of PERFORMANCE computing.

    Actually, the Itaniums can emulate x86 perfectly. However, the only performance gains to be had, in the desktop market, are for number crunchers.
    Quote:
    At least when Itanium works on FPU, all of it is used practically all the time.

    That's right, the Itanium's core design is geared heavily towards floating point calculations. That's why it is widely considered the sovereign of floating point.

    However, relatively few other components exist. The lack of multimedia extensions (SSE, SSE2, 3DNow) makes this processor nearly useless for gaming. Your typical office applications would run at average speed (maybe). Server apps might reap the benefits, but only if you are running the right software (IA64-based code). While x86-based programs can be emulated perfectly on Itaniums, the emulation is extremely slow. Sorry Eden. It was a good thought.

    Quote:
    Yaya I can get a ULV I2 for 876CAN. Special order board for 573CAN from Dell and 4gig of PC2100 for 1677CAN. Then some Windows XP 64 for another 278CAN.

    Also a great idea, spud, but I believe WinXP-64 is made for the x86-64 architecture. While Itaniums can emulate x86, I don't think they are capable of emulating x86-64 yet. You'll just have to wait a while, assuming Intel decides it is worth emulating.

    Pain is the realization of your own weakness.
  17. Quote:
    However, relatively few other components exist. Lack of multimedia extensions (SSE, SSE2, 3DNow) make this processor nearly useless for gaming.


    First and foremost, not many games make extensive use of SIMD extensions right now anyway. Secondly, SSE/SSE2/3DNow! are x86 extensions. They're there to augment the ISA because of how poorly it handles instruction-level parallelism. IA-64 has no need for such extensions, as its base ISA takes care of all of these functions. VLIW could be seen as MIMD: multiple instructions, multiple data.

    The current Itanium implementation has plenty of power aside from FP. Its branch units are among the best in MPU designs today, and they use an interesting method of handling branches: executing both paths and discarding the result that was incorrect.
    Its ALU is also comparable to most MPUs out there. The difference is that the processor runs at a relatively low clock speed. Integer code typically offers much more ILP on superscalar designs than FP code does (i.e. x86 MPUs are able to get more IPC when running integer code) and hence the Itanium doesn't shine in this area due to its lower clock speed. It's not so much that it doesn't do well in any area except FP; it's just that FP happens to be a weak point of a lot of MPUs out there, and Itanium looks godly compared to them.

    Quote:
    Your typical office applications would run at average speed (maybe). Server apps might reap the benifits, but only if you are running the right software (IA64 based code). While x86 based programs can be emulated perfectly on Itaniums, the emulation is extremely slow. Sorry Eden. It was a good thought.


    It depends. Emulation depends on the emulator. Currently, Intel has the development team they acquired from Alpha, the same guys who made FX!32 (which achieved around 80-90% of native runtime speed running x86 code in emulation on Alpha), working on it.
    The initial generation of emulators supposedly matches a 1.5 GHz Itanium 2 to a 1.5 GHz Xeon. It isn't spectacular, but it's a start.

    "We are Microsoft, resistance is futile." - Bill Gates, 2015.
  18. Just so you know, I was very well aware of Itanium's not-so-perfect emulation. No need to be sorry!

  19. Yeah, an Itanium tuned for roughly equal 64- and 32-bit performance would be nice. It's tough, though, given the hideous power requirements and heat output of the beast, not to mention the low scalability and currently weak emulation of 32-bit software. Still, there's always a chance with new revisions of the Itanium 2, and if they ever decide to push harder on software development, maybe the Itanium 3?

    :cool: I run my AthlonXfx at 7.65 Exahertz :cool:
  20. Quote:
    However, relatively few other components exist. Lack of multimedia extensions (SSE, SSE2, 3DNow) make this processor nearly useless for gaming.

    Dude, the Itanium already has SSE built in, as does the I2. 128-bit registers are still perfectly fine for a 64-bit processor.

    Quote:
    Also a great idea, spud, but I beleive WinXP-64 is made for the x86-64 architecture.

    Actually, the beta for WinXP64 was initially IA64. Windows 2003 supports IA64 as well.

    -Jeremy

  21. Windows XP 64-bit for Itanium was out about 1 week after the release of Windows XP Pro and Home as I recall....

    "We are Microsoft, resistance is futile." - Bill Gates, 2015.
  22. cool man, that was the best explanation of fsb i ever heard, now i understand it better.
  23. Yah, I liked that explanation also. I think it was one of the best i've heard too.
  24. Why doesn't AMD have quad-pumped buses? Are they hard to make? I really don't understand what is so special about them. Does AMD not know how? Or are they patented?
  25. Probably because they don't have the resources.

    It doesn't matter anymore. AMD's FSB is now synchronous with the CPU's clock speed, and that yields more bandwidth even at SDR than Intel's QDR bus.


    If it isn't a P6 then it isn't a procesor
    110% BX fanboy
  26. Hey guys, thanks for the comments!

    Still, I wish imgod2u would write back and explain the fetch thing, i.e. why a P4 needs the extra bandwidth even at the same clock speed as an Athlon.

    As for the QDR thing, I dunno personally. However, AMD now manages high memory bandwidth because, well, they have no more FSB, so the memory interface runs pretty much at what the memory wants. Intel's FSB is still the traditional kind. It is 64 bits wide, and a true 800MHz bus would cause MAJOR electrical problems (hey, the P3 had big trouble going to 166MHz SDR, something about the GTL bus being weak compared to EV6, which can reach 200MHz as we've seen with Barton), so on its own it's not enough to reach 6.4GB/sec. 64 bits -> 8 bytes. 8 bytes times 200MHz equals only 1.6GB/sec. In comes QDR: times 4 = 6.4GB/sec.

    So really, neither AMD nor Intel needs any better bus technology right now. I think they should start addressing latency and efficiency, as each CPU is FAR from utilizing the full theoretical 6.4GB/s!

  27. Well, the integrated memory controller is an effective way of reducing latency.

    But what about the RAM, what can be done with that? Do you have any insight on this? I'm not sure what QDR is, or what benefits DDR2 brings.
  28. RAM at the moment, IMO needs no improvement. DDR400 is supplying both the P4 and K8s very well. DDR2 has become something I could not care less about. It's been delayed for god knows how long, and all for what, extra clock speeds and less heat?

    QDR is Quad Data Rate. I don't think QDR will be DDR2 anymore. It was supposed to be an individual technology. I have little info on it, but JEDEC, the consortium reigning over RAM standards might know more. In general though, the future should lie with DDR2 or, if the costs allow it, quad-channel motherboards, which is what SiS is doing now with the QC RDRAM SiS 659. Lord knows if Rambus still has a place in the enthusiast market! :eek:

  29. Again, the problem is latency. While pre-fetching helps a lot, it cannot always work. Magnetic RAM, I think, will be the next big thing. It'll effectively be able to cut latency down by leaps and bounds and allow the use of a low-latency, high-frequency bus like the ElasticIO bus used on the PPC 970.

    "We are Microsoft, resistance is futile." - Bill Gates, 2015.
  30. Could you explain what the deal is with the P4 needing more bandwidth per clock than another CPU?

  31. Branches.

    -Jeremy

    :evil: Busting Sh@t Up In My Buddies Face!!! (http://forumz.tomshardware.com/modules.php?name=Forums&file=faq&notfound=1&code=1) :evil:
  32. Not really, it has more to do with the caching method. The P4 uses 128-byte cachelines. These are split into 64-byte sectors (blocks of memory), but on each fetch/prefetch, memory will transfer 128 bytes of information into cache *whether all 128 bytes are needed or not*.
    If the application you're working with has high instruction/data locality (e.g. you needed 2 bytes of that 128-byte block, but the other data you will need in the future is also part of that same 128-byte block), then you've effectively transferred a large data set before it is needed, avoiding the latency of having to go to memory.

    On the other hand, in applications with low data locality, you would've transferred 128 bytes of data but only needed, say, 2 bytes of it. You've just wasted 126 bytes' worth of memory bandwidth fetching useless data.

    Compare this with the Athlon's 64-byte cacheline size or the P3's 32-byte cacheline size and you'll see that both waste much less memory bandwidth when dealing with applications with low data locality. However, neither can benefit as effectively from higher memory bandwidth in applications with high data locality (making them much more latency-dependent, as they'll have lower cache hit rates).

    It's a trade-off. It sacrifices efficiency in some cases while benefitting performance in others. So while the P4 (on average, including both high-locality and low-locality data sets) may waste more memory bandwidth than the Athlon, in cases of high data locality it's able to trade memory bandwidth to mask latency with better results (more cache hits) than the Athlon or P3 can. It requires a higher-bandwidth memory subsystem, but it can also take advantage of that subsystem to boost performance.
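
The trade-off can be illustrated with a toy model. This is only a sketch under loud assumptions: the function `memory_traffic` is hypothetical, the cache is idealized (every distinct line touched is fetched exactly once, nothing is ever evicted), and the two access patterns are made-up stand-ins for "high locality" and "low locality":

```python
def memory_traffic(addresses, line_size):
    """Return (line_fetches, bytes_transferred) for a list of byte addresses.

    Idealized model: each distinct cacheline touched is fetched once, in full.
    """
    lines_touched = {addr // line_size for addr in addresses}
    return len(lines_touched), len(lines_touched) * line_size


# High locality: 4096 consecutive bytes.
sequential = list(range(4096))
# Low locality: 1024 single bytes, each 1024 bytes apart.
scattered = list(range(0, 1024 * 1024, 1024))

for line_size in (32, 64, 128):  # P3-, Athlon- and P4-style line sizes from the post
    seq_fetches, seq_bytes = memory_traffic(sequential, line_size)
    sc_fetches, sc_bytes = memory_traffic(scattered, line_size)
    print(f"{line_size:>3}-byte lines | sequential: {seq_fetches:>4} fetches, "
          f"{seq_bytes} bytes | scattered: {sc_fetches} fetches, {sc_bytes} bytes "
          f"for {len(scattered)} useful bytes")
```

In this model, larger lines cut the number of trips to memory for the sequential pattern while moving the same number of bytes, but for the scattered pattern they multiply the bytes moved without saving a single trip.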

    "We are Microsoft, resistance is futile." - Bill Gates, 2015.
  33. True dat.

    -Jeremy

  34. >If we can get Itanium mainstream, and it supports x86
    >almost perfectly, and establishes IA64 as the base future,
    >we could seriously have a new world of PERFORMANCE
    >computing.

    Certainly not the current iterations. It gets beaten badly by any x86 core (P4 or Hammer) on INT (check spec.org), in spite of the Itanium being 2-3x as large and having HUGE 6 MB caches, a 128-bit memory bus and >100W TDP. IA64 is great for FP and for scaling beyond 8-16 way, but not much else. Definitely not useful on the desktop for now.

    = The views stated herein are my personal views, and not necessarily the views of my wife. =
  35. Quote:
    Certainly not the current iterations. It gets beaten badly by any x86 core (P4 or Hammer) on INT (check spec.org), in spite of the Itanium being 2-3x as large and having HUGE 6 MB caches, a 128-bit memory bus and >100W TDP. IA64 is great for FP and for scaling beyond 8-16 way, but not much else. Definitely not useful on the desktop for now.


    I thought we went over this already. There are more versions of Itanium than the 1.5 GHz 6 MB L3 cache Madison core. The 1.4 GHz 1.5MB L3 cache Deerfield core, for instance, performs very well, is *smaller* than the AthlonFX and costs around $1100 and that's *without* mass marketing to drive down prices. Not to mention a TDP of 91W compared to 89W of the 2.0 GHz AthlonFX.

    Your argument is a strawman. One may as well say that because the Xeon MPs are huge and cost tons of money but get beaten in SpecInt (as I recall, MPs only go up to 2.8 GHz or so) by the PPC 970, all PPC is superior to all x86.

    "We are Microsoft, resistance is futile." - Bill Gates, 2015.
  36. I should make it very clear to everyone that I have been calling Itanium's IA64 superior too loosely. I will rephrase: EPIC is far superior to anything out there. IA64 is basically open source at this point and isn't the strong point of the architecture.

    So again, to be clear, EPIC is superior to anything x86 processors are doing. Next week I'll have my Itanium/K8 technologies comparison up, which will put the nail in this discussion.

    -Jeremy

  37. I could've sworn in Integer the Itanium 1.5GHZ blew the competition away as well?!

  38. It does, it just needs the code to be run through the compiler for best performance.

    -Jeremy

  39. Quote:
    I could've sworn in Integer the Itanium 1.5GHZ blew the competition away as well?!


    No, while the 1.5 GHz Itanium 2 does hold the top score at SpecInt, it's only barely winning. I would expect the 1.4 GHz Deerfield core to fall a bit behind the 3.2 P4.

    "We are Microsoft, resistance is futile." - Bill Gates, 2015.
  40. That is still stellar for its clock speed, but from a final performance POV, it doesn't hold a candle.

    They CAN refine it anytime, and as Spud told me and from what I read in an Itanium PDF, 128 registers can go a long way if you consider that 8 are far from enough on x86, and 16 have proven to add quite some performance.

    Itanium needs optimizing to run even faster. I hope that happens. I'm all for its architecture being implemented alongside IA32 for a transition.

  41. They are working on it. EPIC compilers are very, very hard to make, according to Intel. The compiler guys have been working with x86 CISC machines for so long that it's like learning how to read all over again.

    -Jeremy
    Unofficial Intel PR Spokesman.

  42. VLIW has only existed in concept for a decade or so, and compilers that generate desktop-level applications for it have an even shorter history. Superscalar compilers, on the other hand, have existed for 30+ years. Given time, and more hyperpipelining, I'd expect IA-64 based products to be much better than they are today.

    "We are Microsoft, resistance is futile." - Bill Gates, 2015.