
Barton with 640k cache?

February 22, 2003 11:36:04 PM

Hey hey..

I found a few websites on pricewatch.com that are selling some 2800+ and 3000+ Bartons with 640k cache. I thought Barton had 512k?

Are there two different versions of the Barton core now?

~m0j0


February 22, 2003 11:39:47 PM

No, you see, they do have 640KB of cache: 512KB of L2 cache plus 128KB of L1 cache.

Instead of RDRAM, why not just merge 4 SDRAM channels...
February 22, 2003 11:41:43 PM

Ah okay, but the Barton 2500+ only has 512k, right?

~m0j0
February 22, 2003 11:44:11 PM

Err, no, you're missing the point. ALL the Athlons have 128KB of L1 cache; the Palomino and the Thoroughbred A and B have 256KB of L2 cache. So all the "old" Athlons have 384KB of cache total, while the Barton has 640KB.
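
To put that arithmetic in one place, a throwaway C sketch (the sizes are just the figures quoted in this thread):

    /* Total on-die cache per core, in KB. Figures as quoted above. */
    #include <stdio.h>

    int main(void) {
        int l1 = 128;        /* every Athlon: 64KB instruction + 64KB data */
        int l2_old = 256;    /* Palomino, Thoroughbred A and B */
        int l2_barton = 512; /* Barton */

        printf("Palomino/T-bred total: %d KB\n", l1 + l2_old);    /* 384 KB */
        printf("Barton total:          %d KB\n", l1 + l2_barton); /* 640 KB */
        return 0;
    }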

Instead of RDRAM, why not just merge 4 SDRAM channels...
February 22, 2003 11:47:22 PM

Thunderbird also has 256KB, tho mine's a tad weird and has 384KB of L2.

Hilbert space is a big place.
February 22, 2003 11:50:14 PM

AHHHH yes... It all makes sense now

Think skull, ya know. ;) 

Thanks for clearing that up fer me.

~m0j0
February 23, 2003 1:21:31 AM

That's probably just a read error. Of course, if you're really so crazy as to believe it, you can always dismantle the system and measure the die size.

--
This post is brought to you by Eden, on a Via Eden, in the garden of Eden. :smile:
February 23, 2003 5:33:33 AM

He doesn't.

as you get older, your hard drive becomes floppy, but don't fear: viagra is here. viagra puts the hard back in your drive!!!
February 23, 2003 4:31:45 PM

me sar gonna go check later. (Dismantalage will come after I get back from NYC.)

Hilbert space is a big place.
February 23, 2003 5:31:18 PM

I believe AMD uses what is called an exclusive cache architecture, while Intel uses what is called an inclusive cache architecture. In an exclusive cache architecture, the data in the L1 is not duplicated in the L2, so together the two levels cover more mappings; it is approximately equivalent to a single L2 of size L1+L2. In an inclusive cache architecture, data in the L1 must also exist within the L2, so the duplicated data means fewer possible mappings.
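
A minimal C sketch of the capacity difference that implies, using the Athlon sizes from above (a simplified model; real caches also differ in associativity and replacement policy):

    #include <stdio.h>

    int main(void) {
        int l1 = 128, l2 = 512; /* KB, the Athlon figures from above */

        /* Exclusive: L1 and L2 hold disjoint lines, so unique data ~ L1 + L2. */
        printf("exclusive effective: %d KB\n", l1 + l2); /* 640 KB */

        /* Inclusive: every L1 line also lives in the L2, so unique data ~ L2. */
        printf("inclusive effective: %d KB\n", l2);      /* 512 KB */
        return 0;
    }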

Dichromatic for your viewing pleasure...
February 23, 2003 6:35:50 PM

But why should data be duplicated at all if the copy in the L1 can be fetched much faster, so there's no need to look for it in the L2?

Heck, even after reading Ars' long cache article, I still don't get why the L2 isn't just merged with the L1, at the L1's latency.


--
This post is brought to you by Eden, on a Via Eden, in the garden of Eden. :smile:
February 23, 2003 6:52:07 PM

For one, the P4 has that Franken-microcode cache (nothing against it, and it performs fine; just a description). The larger the chunk of memory, the more address lines you have to weave out to address the data, which in turn makes it harder to maintain latency. It generally comes down to what algorithms you use to maintain the coherency of the cache.
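
To make the address-line point concrete, a hypothetical C sketch of how many index bits a cache needs as it grows (it assumes a simple direct-mapped layout with 64-byte lines, not any particular CPU's actual design):

    #include <stdio.h>

    /* Integer log2, valid for powers of two. */
    static int log2i(unsigned x) {
        int b = 0;
        while (x >>= 1) b++;
        return b;
    }

    int main(void) {
        const unsigned line = 64;                  /* bytes per cacheline */
        unsigned sizes_kb[] = { 8, 64, 256, 512 }; /* hypothetical sizes  */
        for (int i = 0; i < 4; i++) {
            unsigned sets = sizes_kb[i] * 1024 / line;
            printf("%3u KB cache: %4u sets, %2d index bits\n",
                   sizes_kb[i], sets, log2i(sets));
        }
        return 0;
    }

The bigger the cache, the more index bits have to be decoded before a lookup can even start, which is one reason a large cache can't simply run at a small cache's latency.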

I have to go play hockey and then go see the Wiggles, so further discussion will have to wait until later. At least from me.

Dichromatic for your viewing pleasure...
February 23, 2003 8:59:30 PM

Inclusive caching architectures have certain advantages. For one, in an exclusive caching architecture you have two pockets of cache to deal with. When you update information (that was in memory and is now in cache), you first have to look in the L2 cache to see if there's information to update, then go into the L1 cache and see if the information is there to be updated. It's a lot of management that makes searching and updating very difficult (with latency being the cost). In an inclusive caching system, you just have one big pool of cache to deal with. If the information is in that small part of cache that counts as L1, you just happen to transfer it faster than if it were in the other parts of cache.

I read somewhere else (forget exactly where) that an inclusive caching architecture works best if the L2 cache is 7x or more the size of the L1 cache (data and instruction combined). At 7x or greater, the caching space you lose from the L1 being imprinted onto the L2 isn't as dramatic compared to the latency and management speedups you get from dealing with one pool of cache. The Athlon has 128KB total of L1 cache; its L2 cache, at 512KB, is only 4x that size, so it makes sense for it to be exclusive. The P4's L1 data cache, on the other hand, is 8KB, while its L2 cache is 512KB (it used to be 256KB). That's 64x the size of the L1 (it used to be 32x), so it makes sense for it to be inclusive. Even with the P3, with 32KB total of L1 cache, it made sense to be inclusive, since the on-die L2 cache of 256KB was still 8x bigger (the original P3/P2 L2 cache was 512KB before they put it on die).

What makes me wonder is Hammer: when they'd have 1MB of L2 cache on-die, if the L1 cache stays at 128KB total, would they keep the exclusive architecture? The L2 cache would then be 8x the L1. Of course, since there will be variants of Hammer with 256KB and 512KB of L2 cache, I guess they will keep the exclusive architecture.
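
For what it's worth, the ratios that rule of thumb turns on, as a quick C sketch (sizes as quoted above; the 7x threshold is just the quoted rule of thumb, not a hard spec):

    #include <stdio.h>

    struct cpu { const char *name; int l1_kb, l2_kb; };

    int main(void) {
        struct cpu cpus[] = {
            { "Athlon (Barton)",  128,  512 }, /*  4x -> exclusive fits */
            { "P4 (8KB data L1)",   8,  512 }, /* 64x -> inclusive fits */
            { "P3 (on-die L2)",    32,  256 }, /*  8x -> inclusive fits */
            { "Hammer (1MB L2)",  128, 1024 }, /*  8x -> just past 7x   */
        };
        for (int i = 0; i < 4; i++)
            printf("%-18s L2/L1 = %2dx\n",
                   cpus[i].name, cpus[i].l2_kb / cpus[i].l1_kb);
        return 0;
    }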

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
February 24, 2003 3:58:11 AM

Quote:
For one, in an exclusive caching architecture you have two pockets of cache to deal with. When you update information (that was in memory and is now in cache), you first have to look in the L2 cache to see if there's information to update, then go into the L1 cache and see if the information is there to be updated. It's a lot of management that makes searching and updating very difficult (with latency being the cost).

This does not seem right to me. If you have a request for a chunk of memory in an exclusive cache system, the first place the processor looks is the L1 cache. If it is there, no further searches are done. If the current associations (on an Athlon's L1 there are 2 ways) do not match, the associations of the L2 (16 ways on an Athlon) are checked; if the line is there, the eldest line of the L1 set is exchanged with the matching L2 line. Otherwise the cache line is loaded from main memory and the eldest line from the L1 is moved to the L2, taking the place of the eldest line there and thus retiring it.

The inclusive cache search works similarly, except that rather than exchanging lines between the L1 and L2, the L1 cache lines are discarded after synchronization between their two states.
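
A toy C sketch of the exclusive exchange just described (one line per level, no tags or associativity, purely illustrative):

    #include <stdio.h>

    /* Exclusive model: one "line" per level, identified by address alone. */
    static int l1_line = 1, l2_line = 2;

    static void cache_access(int addr) {
        if (addr == l1_line) {            /* L1 hit: nothing else to search */
            printf("addr %d: L1 hit\n", addr);
        } else if (addr == l2_line) {     /* L2 hit: exchange the two lines */
            l2_line = l1_line;
            l1_line = addr;
            printf("addr %d: L2 hit, exchanged with L1\n", addr);
        } else {                          /* full miss: the L1 victim is   */
            l2_line = l1_line;            /* demoted into the L2, retiring */
            l1_line = addr;               /* the old L2 line; the new line */
            printf("addr %d: miss, loaded from memory\n", addr);
        }                                 /* goes from memory into the L1  */
    }

    int main(void) {
        cache_access(1); /* L1 hit */
        cache_access(2); /* L2 hit */
        cache_access(3); /* miss   */
        return 0;
    }

An inclusive version would copy the missed line into both levels instead of exchanging, which is where the duplication comes from.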

I'm not an authority on the nuances of cache architecture, but it seems to me that never having duplicate data in the caches has some advantages for maintaining coherency. Synchronization between the L1 and L2 is not necessary, as there are no conflicting states.

Dichromatic for your viewing pleasure...
February 24, 2003 10:49:22 AM

You just answered your own question. Two separate levels of cache have to be checked for a state update. If the data is not in the L1 cache, it then checks the L2 cache. If the data is in the L2 cache, it has to transfer data between the L2 and L1 caches, writing the data in the L1 cache out to the L2. If information in memory is updated, it checks whether the data is in the L2 cache or not; if it's not in the L2 cache, the processor then has to check the L1 cache to see if the data is in there to be updated. It's two separate levels of cache to check for synchronization whenever you have a cache miss, which can be pretty dramatic.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
February 24, 2003 4:49:10 PM

The order of search for inclusive and exclusive caches is identical; I think you're missing that. The retirement strategy for both is almost identical. In general you don't pay a penalty for transferring data between the L1 and L2. The only real difference is that you have relatively more memory to search and flush when main memory must be synchronized. Most of the time, memory is only flushed to main memory as lines retire.

Dichromatic for your viewing pleasure...
February 24, 2003 6:51:46 PM

It takes the P4's L2 cache 7 cycles to transfer a 64-byte cacheline to the L1 data cache. How is that "not a penalty"? The prefetch algorithm will often do this. However, when the cache is updated for the memory state, the L1 and L2 updates don't have to be done separately: all information can be copied to the L2 cache, and the prefetch algorithm can handle what is loaded into the L1-mapped section.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
February 24, 2003 8:34:04 PM

You can't measure the tick count from L1 to L2 like that, as it depends on the processor's multiplier. The P4's cache architecture is unique, as it is designed to often write directly into the L2 (SSE execution). A P3 would be a better comparison, and as inclusive caches go, the P3 had an excellent 288-bit (256 effective) wide internal cache transfer, with a 3-cycle (~3ns at 1GHz) turnaround. An Athlon has a 4-cycle L1 turnaround. The P4 has a 2-cycle L1 turnaround and a much faster L2 latency due to its 6ns-4ns 400-533MHz FSB.

What do you mean by an update for the memory state? The processor either retires a line or flushes the cache. An inclusive cache has to synchronize state between its L1 and L2, as altered lines in the L1 have to update their equivalent L2 counterparts.
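
The cycle-versus-nanosecond point is just a clock conversion; a quick C sketch using the figures above (the 2GHz P4 clock is assumed purely for illustration):

    #include <stdio.h>

    int main(void) {
        /* latency in ns = cycles / clock in GHz */
        struct { const char *what; double cycles, ghz; } t[] = {
            { "P3 L1<->L2 turnaround", 3.0, 1.0 }, /* ~3 ns at 1 GHz   */
            { "Athlon L1 turnaround",  4.0, 1.0 }, /* ~4 ns at 1 GHz   */
            { "P4 L2->L1, 64B line",   7.0, 2.0 }, /* assumed 2 GHz P4 */
        };
        for (int i = 0; i < 3; i++)
            printf("%-24s %.0f cycles = %.1f ns\n",
                   t[i].what, t[i].cycles, t[i].cycles / t[i].ghz);
        return 0;
    }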

Dichromatic for your viewing pleasure...
February 25, 2003 3:57:54 AM

The L1 and L2 caches on modern processors both run at the core clock speed, so what multiplier would you be talking about? Transfers between the L2 and L1 data cache are measured in clock cycles, and a 64-byte cacheline is transferred (half a cacheline between the L2 and main memory).

An update to memory state would be from the perspective of the prefetch algorithms. As memory writes are done through write-through retirement, the prefetch algorithm can quickly replace certain cachelines with others without breaking the associativity. There is also no correlation between the blocks held in the L1 and L2; that is, information dumped from the L2 in favor of prefetched information could still be in the L1, or vice versa. That is extra management and work for the prefetch algorithm.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
February 25, 2003 4:53:56 AM

You're definitely right about the 7-tick L1/L2 latency. Sorry about that.

Dichromatic for your viewing pleasure...