AMD quad-core Barcelona laid bare

Last response: in CPUs
March 20, 2007 6:32:03 PM

Quote:
Other than four cores, the most obvious difference is the new widened SSE instructions. On the pre-Barcelona parts, SSE was done in 64 bit chunks, so if you wanted to do a 128b operation, you needed two passes, possibly more. With the widening of SSE, it should immediately double throughput on SSE instructions. Obviously media operations will benefit, but HPC and FP heavy ops will get a solid kick in the pants too.


http://www.theinquirer.net/default.aspx?article=35011

Nice, this is a pretty good article... now if they could just hurry their asses up and let us see some f@#$'in benchmarks!!
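
To put a number on the "two passes" point in the quote: a rough sketch, assuming an SSE op simply takes as many passes as it takes datapath-wide chunks to cover it. The function name `passes_needed` is made up for illustration, and real throughput depends on a lot more than datapath width.

```python
import math

def passes_needed(op_bits, datapath_bits):
    """Passes needed to push one operation through a datapath of the given width.

    Simplified model (an assumption): one pass per datapath-wide chunk.
    """
    return math.ceil(op_bits / datapath_bits)

# 128-bit SSE op on the older 64-bit SSE datapath vs. the widened 128-bit one
pre_barcelona = passes_needed(128, 64)
barcelona = passes_needed(128, 128)

print(pre_barcelona, barcelona, pre_barcelona / barcelona)
```

Under that simplification the widened unit needs half the passes per 128-bit op, which is where the article's "double throughput" figure comes from.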
March 20, 2007 6:45:20 PM

Can you explain why you've linked to an article from Wednesday 11 October 2006??? A bit dated, wouldn't you say? :lol: 
March 20, 2007 6:53:49 PM

WOW! You must feel like Amerigo Vespucci.... :roll:
March 20, 2007 6:58:54 PM

^^ HAHAHA that was funny. :D 
March 20, 2007 8:43:02 PM

news from the Inquirer = .



^^^^grain of salt
March 20, 2007 8:56:47 PM

touche... :x


well I guess they can be right every once in a while
March 20, 2007 9:28:28 PM

Quote:
Actually, there is a link in plain sight to a 37 page AMD developer pdf, which he got his info from.

10x for the info :)  It was informative. Now we know that K10 FP pipeline is 1 stage longer than the FP pipeline of K8. (see page 24)
March 21, 2007 10:23:37 AM

Quote:
Actually, there is a link in plain sight to a 37 page AMD developer pdf, which he got his info from.

10x for the info :)  It was informative. Now we know that K10 FP pipeline is 1 stage longer than the FP pipeline of K8. (see page 24)
Its effect should be negligible. But the 64bit/cycle store bandwidth (or 128bit/2 cycles) might prove more influential as discussed in the original aceshardware thread. It would affect even simple SSE copy loops (think of Sandra's cache bandwidth measurements).
March 21, 2007 10:35:07 AM

Which means AMD keeps the FP crown. Nothing much that can be considered new to anyone then.
March 21, 2007 11:46:29 AM

Quote:
Which means AMD keeps the FP crown. Nothing much that can be considered new to anyone then.
It's a limitation, not an advantage. SSE based copy loops will run twice as fast on C2D.
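
A back-of-the-envelope version of that claim, assuming the copy loop is limited purely by store bandwidth (loads, caches and everything else ignored). The 64 bits/cycle figure is the one quoted above for K10; the 128 bits/cycle figure for C2D is only what "twice as fast" implies, not a number taken from the pdf. `store_bound_cycles` is a made-up helper name.

```python
def store_bound_cycles(bytes_to_copy, store_bits_per_cycle):
    """Cycles to write out a buffer if stores are the only bottleneck.

    Simplified model (an assumption): every cycle retires exactly
    store_bits_per_cycle bits of stores, rounded up to whole cycles.
    """
    bits = bytes_to_copy * 8
    return -(-bits // store_bits_per_cycle)  # ceiling division

buffer_bytes = 4096  # arbitrary example size

k10_cycles = store_bound_cycles(buffer_bytes, 64)    # 64 bit/cycle, as claimed for K10
c2d_cycles = store_bound_cycles(buffer_bytes, 128)   # 128 bit/cycle, assumed for C2D

print(k10_cycles, c2d_cycles, k10_cycles / c2d_cycles)
```

If stores really are the bottleneck, the ratio between the two comes out at 2x no matter what buffer size you pick. Whether K10 is actually limited to 64 bits/cycle is exactly what gets argued about further down the thread.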
March 21, 2007 12:20:33 PM

English please! Does this mean the Barcelona will be a self-learning cybernetic organism I could use to whack Conroe?
March 21, 2007 1:52:01 PM

no, but it'll whack your budget
March 21, 2007 2:16:37 PM

Quote:
no, but it'll whack your budget
:lol: 
March 21, 2007 4:02:34 PM

Quote:
Actually, there is a link in plain sight to a 37 page AMD developer pdf, which he got his info from.

10x for the info :)  It was informative. Now we know that K10 FP pipeline is 1 stage longer than the FP pipeline of K8. (see page 24)
Its effect should be negligible. But the 64bit/cycle store bandwidth (or 128bit/2 cycles) might prove more influential as discussed in the original aceshardware thread. It would affect even simple SSE copy loops (think of Sandra's cache bandwidth measurements).


I think you read that wrong. It says the data is transferred in 128-bit blocks and decoded into two 64-bit chunks, which can both be written at the same time as K10 has two store ports (page 25).

The same page says it will be better to use SSE copy loops because of the additional bandwidth.



March 21, 2007 7:16:32 PM

PAGE 24:

Quote:
SSE128 adds an additional register read pipe stage
- Only impacts floating point pipeline
- Adds a cycle to FP load latency


The FP pipeline is one stage longer: 18 stages compared to 17 on K8. Which part don't you understand?