AMD quad-core Barcelona laid bare

Quote:
Other than four cores, the most obvious difference is the new widened SSE instructions. On the pre-Barcelona parts, SSE was done in 64 bit chunks, so if you wanted to do a 128b operation, you needed two passes, possibly more. With the widening of SSE, it should immediately double throughput on SSE instructions. Obviously media operations will benefit, but HPC and FP heavy ops will get a solid kick in the pants too.


http://www.theinquirer.net/default.aspx?article=35011

Nice, this is a pretty good article... now if they could just hurry their asses up and let us see some f@#$'in benchmarks!!
  1. Can you explain why you've linked to an article from Wednesday 11 October 2006??? A bit dated, wouldn't you say? :lol:
  2. WOW! You must feel like Amerigo Vespucci.... :roll:
  3. ^^ HAHAHA that was funny. :D
  4. This one is more recent, with some performance numbers.

    AMD Barcelona SSE128 details unfurled
  5. News from the Inquirer = grain of salt
  6. Actually, there is a link in plain sight to a 37-page AMD developer PDF, which he got his info from.
  7. Touché... :x

    Well, I guess they can be right every once in a while.
  8. Quote:
    Actually, there is a link in plain sight to a 37-page AMD developer PDF, which he got his info from.

    Thanks for the info :) It was informative. Now we know that the K10 FP pipeline is one stage longer than the FP pipeline of K8 (see page 24).
  9. Quote:
    Actually, there is a link in plain sight to a 37-page AMD developer PDF, which he got his info from.

    Thanks for the info :) It was informative. Now we know that the K10 FP pipeline is one stage longer than the FP pipeline of K8 (see page 24).
    Its effect should be negligible. But the 64-bit/cycle store bandwidth (or 128 bits per 2 cycles) might prove more influential, as discussed in the original aceshardware thread. It would affect even simple SSE copy loops (think of Sandra's cache bandwidth measurements).
  10. Which means AMD keeps the FP crown. Nothing much that can be considered new to anyone then.
  11. Quote:
    Which means AMD keeps the FP crown. Nothing much that can be considered new to anyone then.
    It's a limitation, not an advantage. SSE-based copy loops will run twice as fast on Core 2 Duo (C2D).
  12. English please! Does this mean the Barcelona will be a self-learning cybernetic organism I could use to whack Conroe?
  13. No, but it'll whack your budget
  14. Quote:
    No, but it'll whack your budget
    :lol:
  15. Quote:
    Actually, there is a link in plain sight to a 37-page AMD developer PDF, which he got his info from.

    Thanks for the info :) It was informative. Now we know that the K10 FP pipeline is one stage longer than the FP pipeline of K8 (see page 24).
    Its effect should be negligible. But the 64-bit/cycle store bandwidth (or 128 bits per 2 cycles) might prove more influential, as discussed in the original aceshardware thread. It would affect even simple SSE copy loops (think of Sandra's cache bandwidth measurements).

    I think you read that wrong. It says the data is transferred in 128-bit blocks and decoded into two 64-bit chunks, which can both be written at the same time because K10 has two store ports (page 25).

    The same page says it will be better to use SSE copy loops because of the additional bandwidth.
  16. PAGE 24:

    Quote:
    SSE128 adds an additional register read pipe stage
    - Only impacts floating point pipeline
    - Adds a cycle to FP load latency


    The FP pipeline is one stage longer: 18 stages compared to the K8's 17. Which part don't you understand?
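
For anyone trying to follow the store-bandwidth argument in replies 9, 11 and 15, the "twice as fast" claim is just arithmetic, and a rough sketch may help. It assumes a copy loop that is limited only by stores and that uses 128-bit SSE stores; loads, caches, and everything else are ignored. The function name and the 32 KiB buffer size are made up for illustration, and the 64-bit/cycle figure is taken as an input from reply 9, not confirmed here (reply 15 disputes it).

```python
def store_bound_cycles(num_bytes: int,
                       store_bits_per_cycle: int,
                       store_width_bits: int = 128) -> int:
    """Cycles needed to issue every store of a copy, assuming stores
    are the only bottleneck (a deliberate simplification)."""
    bytes_per_store = store_width_bits // 8
    if num_bytes % bytes_per_store != 0:
        raise ValueError("buffer must be a whole number of stores")
    num_stores = num_bytes // bytes_per_store
    # Ceiling division: a 128-bit store over a 64-bit/cycle path
    # occupies the store path for 2 cycles.
    cycles_per_store = -(-store_width_bits // store_bits_per_cycle)
    return num_stores * cycles_per_store


if __name__ == "__main__":
    buf = 32 * 1024  # hypothetical buffer size, chosen for illustration
    narrow = store_bound_cycles(buf, 64)   # 64 bits stored per cycle
    wide = store_bound_cycles(buf, 128)    # 128 bits stored per cycle
    print(narrow, wide, narrow / wide)
```

If reply 15 is right that both 64-bit halves can be written in the same cycle, the effective rate is 128 bits/cycle and the gap in this model disappears, which is why the reading of page 25 matters.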