The "detail" of Barcelona

abinstein

Distinguished
Jan 15, 2007
48
0
18,530
In this thread, some (gOJDO) claimed that "We have not seen any details about the microarchitecture yet," and questioned my comment that Barcelona's cache design is "brilliant"; some even wished that people with a different understanding or opinion would just "go away."

Unfortunately, we do know some details of Barcelona; on paper, they amount to no less than any Core 2 "detail" you were fed. Furthermore, coincidentally, I am not the only one who calls Barcelona "brilliant." Some people on this forum have got to learn that the world is much bigger than their forum buddies. And frankly, people may decide to go away (Baron?), but the truth doesn't.

As I've stated earlier, we haven't seen Barcelona fully benchmarked, so AMD's 40% over Clovertown (at the same price and power, I presume) is just a bold claim. Yet given what has been revealed about Barcelona's microarchitecture, I certainly look forward to its release more than I did to Core 2's (I got one working last fall and wasn't that impressed).
 

shinigamiX

Distinguished
Jan 8, 2006
1,107
0
19,280
We've all seen that article before, and frankly, while I don't doubt that Barcelona will be a work of art, I think that guy needs to tone down the hyperbole. "Light the afterburners"? Give me a break.
 

BaronMatrix

Splendid
Dec 14, 2005
6,655
0
25,790
In this thread, some (gOJDO) claimed that "We have not seen any details about the microarchitecture yet," and questioned my comment that Barcelona's cache design is "brilliant"; some even wished that people with a different understanding or opinion would just "go away."

Unfortunately, we do know some details of Barcelona; on paper, they amount to no less than any Core 2 "detail" you were fed. Furthermore, coincidentally, I am not the only one who calls Barcelona "brilliant." Some people on this forum have got to learn that the world is much bigger than their forum buddies. And frankly, people may decide to go away (Baron?), but the truth doesn't.

As I've stated earlier, we haven't seen Barcelona fully benchmarked, so AMD's 40% over Clovertown (at the same price and power, I presume) is just a bold claim. Yet given what has been revealed about Barcelona's microarchitecture, I certainly look forward to its release more than I did to Core 2's (I got one working last fall and wasn't that impressed).


I doubt it. Someone has to keep things honest around here. gOJDO is FORTUNATELY not indicative of Forumz members.

As for Barcelona, I posted a link to XBit, where they have one of the best analyses of the Barcelona architecture.

Larger TLBs for data and instructions
Better prefetch and Branch predictor
256bit L1-L2
2x128 loads per cycle
2x128 SSE4A
2x128 DP FP
Sideband stack optimizer
OoO (stores or loads)
new instructions


That's just the tip of the iceberg. I keep saying this thing is a beast.
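
For anyone who wants to put a number on the "2x128" lines in the list, here is a back-of-the-envelope sketch. It assumes a double is 64 bits and that both 128-bit pipes can issue every cycle; the 64-bit comparison case is an assumption about the previous design, and the function name is made up for illustration. It counts issue slots only and says nothing about real benchmark results.

```python
def dp_ops_per_cycle(datapath_bits: int, pipes: int, dp_bits: int = 64) -> int:
    """Peak double-precision operations per cycle for `pipes` SIMD pipes.

    Each pipe handles datapath_bits // dp_bits doubles per operation.
    """
    lanes = datapath_bits // dp_bits  # doubles packed into one SIMD op
    return lanes * pipes


# "2x128 DP FP" from the list: two pipes, 128 bits each.
wide = dp_ops_per_cycle(128, 2)

# Assumption for comparison: the same two pipes at 64 bits wide.
narrow = dp_ops_per_cycle(64, 2)

print(wide, narrow, wide / narrow)
```

Under those assumptions the wider datapath doubles the per-cycle peak, which is a ceiling, not a prediction of delivered performance.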
 

Wombat2

Distinguished
Jul 17, 2006
518
0
18,980
That's just the tip of the iceberg. I keep saying 4x4 is a beast.
^^^ You two months ago ... and yet you still haven't bought one :lol:

Why is AMD too scared to release a real Barcelona benchmark 3 months before launch?

Intel knew they had a killer arch in Conroe and hence they were happy for benchmarks to "leak" well in advance of its launch. They had nothing to hide ... quite the contrary.
 

r0ck

Distinguished
Oct 8, 2006
469
0
18,780
2eq51k5.jpg

4d6wjmo.jpg
 

abinstein

Distinguished
Jan 15, 2007
48
0
18,530
We've all seen that article before, and frankly, while I don't doubt that Barcelona will be a work of art, I think that guy needs to tone down the hyperbole. "Light the afterburners"? Give me a break.

Yeah, he could have used more neutral wording; besides, I'm not sure the nested page table is new in Barcelona. I think it's already specified in AMD VT and available in Windsor. Maybe Barcelona has better hardware support for it, I don't know.

But please note that his praise of Barcelona, aside from the nested page table, focuses on 1) power efficiency and 2) the shared L3 cache. He is quite correct to point out the similarity of Barcelona's L3 to that in the IBM Power5 (except that the latter resides on a separate die and is much larger). I linked the article only to show what people in the know would acknowledge as "brilliant."
 

r0ck

Distinguished
Oct 8, 2006
469
0
18,780
http://www.infoworld.com/article/06/12/06/50OPcurve_1.html

Uh huh, this is coming from an AMD shill that says..

In high-demand scenarios, Core 2 Duo will go round robin, or sequential, with cores’ access to memory, peripherals and each other, a trait that suits Core 2 Duo more for multitasking than megatasking. I’ll fetch up some numbers to support that take, but I feel it just driving Quad FX around with Vista, and you will, too, if you’re “megatasking .”

Quad FX points the way to the next horizon of commercial client systems. What AMD is showing now as a rip-Intel-a-new-one mean, but not particularly green desktop platform will emerge with balance when it’s tuned for widespread commercial use. Wait if you want to, but I’ll tell you this: If you take your first drive of 64-bit Vista on Quad FX, you may move up to Opteron, but you will never go back to Intel.


His latest article "AMD reinvents the x86" seems to be a spoon fed press kit being repeated across several media outlets.
 

boduke

Distinguished
Oct 25, 2006
410
0
18,780
http://www.infoworld.com/article/06/12/06/50OPcurve_1.html

Uh huh, this is coming from an AMD shill that says..

In high-demand scenarios, Core 2 Duo will go round robin, or sequential, with cores’ access to memory, peripherals and each other, a trait that suits Core 2 Duo more for multitasking than megatasking. I’ll fetch up some numbers to support that take, but I feel it just driving Quad FX around with Vista, and you will, too, if you’re “megatasking .”

Quad FX points the way to the next horizon of commercial client systems. What AMD is showing now as a rip-Intel-a-new-one mean, but not particularly green desktop platform will emerge with balance when it’s tuned for widespread commercial use. Wait if you want to, but I’ll tell you this: If you take your first drive of 64-bit Vista on Quad FX, you may move up to Opteron, but you will never go back to Intel.


His latest article "AMD reinvents the x86" seems to be a spoon fed press kit being repeated across several media outlets.

Ok - I'll retract my earlier question... ;)
 

CaptRobertApril

Distinguished
Dec 5, 2006
2,205
0
19,780
His latest article "AMD reinvents the x86" seems to be a spoon fed press kit being repeated across several media outlets.

Now we know why AMD is so short of cash. They're spending millions on buying journalists. Yeah, Tom Yager is a HO! If Infoworld's Editors had a shred of integrity they would have fired his a$$ months ago.
 

CaptRobertApril

Distinguished
Dec 5, 2006
2,205
0
19,780
AMD is only short of cash because they just purchased a 5 billion dollar company. They are making big gains in markets other than cpu's. They are not just focused on cpu's.

If they cut their journalistic pimp budget they'd be a 6 billion dollar company. 8)
 

abinstein

Distinguished
Jan 15, 2007
48
0
18,530
In this thread, some (gOJDO) claimed that "We have not seen any details about the microarchitecture yet," and questioned my comment that Barcelona's cache design is "brilliant"; some even wished that people with a different understanding or opinion would just "go away."

Unfortunately, we do know some details of Barcelona; on paper, they amount to no less than any Core 2 "detail" you were fed. Furthermore, coincidentally, I am not the only one who calls Barcelona "brilliant." Some people on this forum have got to learn that the world is much bigger than their forum buddies. And frankly, people may decide to go away (Baron?), but the truth doesn't.

I doubt it. Someone has to keep things honest around here. gOJDO is FORTUNATELY not indicative of Forumz members.


Baron, please tell me, how many members in this thread are posting actual material? Very few, sadly, with most of the others just making pointless chat. A closer comparison of Barcelona's and Clovertown/Penryn's microarchitectures, caches, or TLB designs would have been nice. But they don't do that. They mock, they make personal attacks.

I have to pay my respects to shinigamiX and zarooch for their good behavior! Other than that, I've totally given up. To the others I can only say: sayonara, it's been a waste of my time.
 

CaptRobertApril

Distinguished
Dec 5, 2006
2,205
0
19,780
is that your picture from the kiddie pron sting?

Yeah, but Mr. Chris Hansen, I SWEAR the chick said she was 18. And I was just coming over to hang out. And nothing was gonna happen. We were just gonna eat Skittles and watch the Disney Channel! 8)
 

gOJDO

Distinguished
Mar 16, 2006
2,309
1
19,780
I am not sure if the purpose of this thread is to challenge me or to provide some useful information about Barcelona...
If the second is the case, then:
* Comprehensive Upgrades for SSE
- Dual 128-bit SSE dataflow
- Up to 4 double-precision FP ops/cycle
- Dual 128-bit loads per cycle
- Can perform SSE MOVs in the FP “store” pipe
- Execute two generic SSE ops + SSE MOV each cycle (+ two 128-bit SSE loads)
- FP Scheduler can hold 36 Dedicated x 128-bit ops
- SSE Unaligned Load-Execute mode
  - Remove alignment requirements for SSE ld-op instructions
  - Eliminate awkward pairs of separate load and compute instructions
  - To improve instruction packing and decoding efficiency
* Advanced branch prediction
- Dedicated 512-entry Indirect Predictor
- Double return stack size
- More branch history bits and improved branch hashing
* 32B instruction fetch
- Benefits integer code too
- Reduced split-fetch instruction cases
* Sideband Stack Optimizer
- Perform stack adjustments for PUSH/POP operations “on the side”
- Stack adjustments don’t occupy functional unit bandwidth
- Breaks serial dependence chains for consecutive PUSH/POPs
* Out-of-order load execution
- New technology allows load instructions to bypass:
  - Other loads
  - Other stores which are known not to alias with the load
- Significantly mitigates L2 cache latency
* TLB Optimisations
- Support for 1G pages
- 48-bit physical address
- Larger TLBs key for:
  - Virtualized workloads
  - Large-footprint databases and transaction processing
- DTLB:
  - Fully-associative 48-way TLB (4K, 2M, 1G)
  - Backed by L2 TLBs: 512 x 4K, 128 x 2M
- ITLB:
  - 16 x 2M entries
* Data-dependent divide latency
* More Fastpath instructions
- CALL and RET-Imm instructions
- Data movement between FP & INT
* Bit Manipulation extensions
- LZCNT/POPCNT
* SSE extensions
- EXTRQ/INSERTQ
- MOVNTSD/MOVNTSS
* Independent DRAM controllers
- Concurrency
- More DRAM banks reduces page conflicts
- Longer burst length improves command efficiency
* Optimized DRAM paging
- Increase page hits
- Decrease page conflicts
* History-based pattern predictor
* Re-architect NB for higher BW
- Increase buffer sizes
- Optimize schedulers
- Ready to support future DRAM technologies
* Write bursting
- Minimize Rd/Wr Turnaround
* DRAM prefetcher
- Track positive and negative, unit and non-unit strides
- Dedicated buffer for prefetched data
- Aggressively fill idle DRAM cycles
* Core prefetchers
- DC Prefetcher fills directly to L1 Cache
- IC Prefetcher more flexible
  - 2 outstanding requests to any address
* Shared L3
- Victim-cache architecture maximizes efficiency of cache hierarchy
- Fills from L3 leave likely shared lines in the L3
- Sharing-aware replacement policy
This part from this thread is worthwhile and informative (good find, BTW). The other link is pure BS.
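
To make the TLB entry counts in the list a bit more tangible, here is a small sketch that converts them into "reach", meaning the amount of memory a TLB level can map without a miss. It assumes the best case where every entry holds a page of the stated size; the helper names are made up for illustration, and the entry counts are taken straight from the list, not from measurements.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes covered if all `entries` map pages of `page_size` bytes."""
    return entries * page_size


def human(n: int) -> str:
    """Format a byte count using the largest binary unit that fits."""
    for unit, size in (("GiB", GIB), ("MiB", MIB), ("KiB", KIB)):
        if n >= size:
            return f"{n // size} {unit}"
    return f"{n} B"


# Entry counts as quoted in the list (best case: one page size per level).
reach = {
    "L1 DTLB, 48 x 4K": tlb_reach(48, 4 * KIB),
    "L1 DTLB, 48 x 2M": tlb_reach(48, 2 * MIB),
    "L1 DTLB, 48 x 1G": tlb_reach(48, 1 * GIB),
    "L2 TLB, 512 x 4K": tlb_reach(512, 4 * KIB),
    "L2 TLB, 128 x 2M": tlb_reach(128, 2 * MIB),
    "ITLB, 16 x 2M": tlb_reach(16, 2 * MIB),
}

for name, size in reach.items():
    print(f"{name}: {human(size)}")
```

The gap between the 4K and the 2M/1G rows is the reason large pages matter for the virtualized and database workloads the list mentions.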
Let me add something to the list of details:
* SSE4A instructions:
- MWAIT & MONITOR instructions
* Additional Features:
- Power management state invariant time stamp counter (TSC)
* ODMC enhancements:
- Dual Channel unbuffered 1066 support (applies to socket AM2+ and s1207+ QFX only)
- Write Burst & DRAM prefetching performance improvements
- DRAM writes can be buffered in the memory controller before being opportunistically burst into the DRAM controller to improve DRAM interface efficiency
- Read prefetcher detects stride patterns and issues prefetch requests based on confidence level
- Channel Interleaving
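
As a rough illustration of what "detects stride patterns and issues prefetch requests based on confidence level" can mean, here is a toy model. It is a generic textbook-style stride detector, not AMD's actual logic; the class name, the threshold, and the saturation limit are all assumptions picked for the example.

```python
class StridePrefetcher:
    """Toy stride detector with a saturating confidence counter."""

    def __init__(self, threshold: int = 2, max_conf: int = 3):
        self.last_addr = None       # previous address seen
        self.stride = 0             # last observed address delta
        self.confidence = 0         # how often that delta has repeated
        self.threshold = threshold  # repeats needed before prefetching
        self.max_conf = max_conf    # saturation limit for the counter

    def access(self, addr: int):
        """Observe one read; return an address to prefetch, or None."""
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.stride:
                # Same delta again: grow confidence, saturating at max_conf.
                self.confidence = min(self.confidence + 1, self.max_conf)
            else:
                # Pattern broken: remember the new delta and start over.
                self.stride = stride
                self.confidence = 0
        self.last_addr = addr
        if self.stride != 0 and self.confidence >= self.threshold:
            return addr + self.stride
        return None


p = StridePrefetcher()
for a in (0, 64, 128, 192, 256):
    print(a, "->", p.access(a))
```

Because the stride is a signed difference, the same model follows descending and non-unit strides too, which lines up with the "positive and negative, unit and non-unit strides" item in the list.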
 

BaronMatrix

Splendid
Dec 14, 2005
6,655
0
25,790
That's just the tip of the iceberg. I keep saying 4x4 is a beast.
^^^ You two months ago ... and yet you still haven't bought one :lol:

Why is AMD too scared to release a real Barcelona benchmark 3 months before launch?

Intel knew they had a killer arch in Conroe and hence they were happy for benchmarks to "leak" well in advance of its launch. They had nothing to hide ... quite the contrary.


OK, folks, we have another individual who will be searching for non-existent quotes from me.

I said QFX will close the gap at the high end and it did in price/perf. I don't think fear has to do with CPU releases. AMD has JUST finished the ATi acquisition while finalizing Barcelona. The Core 2 benches were 3 months removed from release.

I expect the first ones by the end of the month. What you have to remember is that AMD will be pushing the SERVER aspects first, like SPEC and TPC. Opteron shows that it will trickle down nicely to the desktop and mobile.

After all, as one overzealous journalist stated, Barcelona is the first Torrenza part.
 

Slobogob

Distinguished
Aug 10, 2006
1,431
0
19,280
I said QFX will close the gap at the high end and it did in price/perf.
I am not certain that my understanding of this statement is correct. Do you mean "closing the gap" as in becoming competitive or as in having a product in the same market?

After all, as one overzealous journalist stated, Barcelona is the first Torrenza part.

I'm eager to see what AMD can do with that! :D
 

Sirfiroth

Distinguished
Dec 31, 2006
136
0
18,680
There are bits and pieces of information on the architecture of Barcelona all over the internet.

I don't remember Intel leaking any architecture details before Conroe's release.

This was from an article posted on Tom Chris’s AMD Zone.

Each of Barcelona’s four cores incorporates a new vector math unit referred to as SSE128 (128-bit streaming single-instruction-multiple-data extensions). I am aware that you only do quantum physics on weekends, but the potential for hardcore IT tasks such as encryption, compression, real-time analysis of high volumes of streaming business transactions, and wire-speed packet analysis is also the stuff of science fiction. Barcelona gives floating point operations their own schedulers (checkout lanes) and runs them twice as fast as 64-bit SSE did. AMD claims that Barcelona’s per-core floating point performance is more than 80 percent faster than the present Opteron. Benchmark that. And separating integer and floating-point schedulers also accelerates this thing called virtualization, which you may notice is a recurring theme for Barcelona.

What does this mean? Is this just more propaganda from AMD, or might there really be something to this? Benchies should start showing up in about 3 more weeks. Until then I will reserve judgement.

Any comments?

"Jumping Jack", your expertise is required here. From an engineer's viewpoint, what are the possibilities?
 

tamalero

Distinguished
Oct 25, 2006
1,134
140
19,470
That's just the tip of the iceberg. I keep saying 4x4 is a beast.
^^^ You two months ago ... and yet you still haven't bought one :lol:

Why is AMD too scared to release a real Barcelona benchmark 3 months before launch?

Intel knew they had a killer arch in Conroe and hence they were happy for benchmarks to "leak" well in advance of its launch. They had nothing to hide ... quite the contrary.

So you want someone to scream loudly, the Intel way? Last time I checked, AMD was not Intel, and it's the products that speak for AMD, not a propaganda or marketing team like Intel's.