Athlon II Or Phenom II: Does Your CPU Need L3 Cache?

It makes sense to equip multi-core processors with a dedicated memory shared by all available cores. In this role, a fast third-level cache (L3) can accelerate access to frequently needed data, so that cores fall back to the slower main memory (RAM) as rarely as possible.

That’s the theory, at least. AMD’s recent launch of the Athlon II X4, which is fundamentally a Phenom II X4 without the L3, implies that the tertiary cache may not always be necessary. We decided to do an apples-to-apples comparison of both options to find out.

How Cache Works

Before diving deeper into our tests, it’s important to understand some basics. The principle of caches is rather simple. They buffer data as close as possible to the processing core(s) so that the CPU does not have to fetch it from more distant, slower memory. Today’s desktop platform cache hierarchies consist of three cache levels in front of system memory. The second and especially the third levels aren’t just for data buffering. Their purpose is also to keep unnecessary data exchange traffic between cores from choking the CPU bus.

Cache Hit/Miss

The effectiveness of a cache architecture is measured by its hit rate. Data requests that can be answered within a given cache are referred to as hits. If that cache doesn’t contain the sought data and must pass the request on to subsequent memory structures, this is a miss. Obviously, misses are slow. They lead to stalls in the execution pipeline and introduce wait periods. Hits, on the other hand, help sustain maximum performance.
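
To make the hit rate concrete, here is a minimal sketch in Python. It assumes a toy cache that holds a fixed number of blocks and evicts the least recently used one; the function name `hit_rate`, the LRU policy, and the access trace are illustrative assumptions, not a description of any real CPU.

```python
from collections import OrderedDict

def hit_rate(addresses, capacity):
    """Replay block addresses through a toy LRU cache; return the fraction of hits."""
    cache = OrderedDict()  # keys are block addresses, ordered from least to most recently used
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[addr] = True
    return hits / len(addresses)

# Hypothetical trace: a loop touching the same 4 blocks, 10 times over.
trace = [0, 1, 2, 3] * 10

print(f"{hit_rate(trace, capacity=4):.2f}")  # working set fits: 0.90
print(f"{hit_rate(trace, capacity=3):.2f}")  # one block too small: 0.00
```

With room for all four blocks, only the first pass misses. With room for three, each block is evicted just before it is needed again, so every access misses.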

Cache Writes, Exclusivity, Coherency

Replacement policies dictate how room is made in a full cache for new entries. Since data written into a cache eventually has to reach main memory as well, systems can either write to both at the same time (write-through) or mark modified locations as “dirty” (write-back) and perform the memory write only once the data is evicted from the cache.
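
The difference between the two write policies can be sketched by counting how many writes actually reach main memory. This is a simplified model under stated assumptions: a toy LRU cache, every store allocating a line in the cache, and a final flush of dirty lines. The name `count_memory_writes` and the store pattern are hypothetical.

```python
from collections import OrderedDict

def count_memory_writes(stores, capacity, write_back):
    """Count writes reaching main memory for a sequence of stores to block addresses."""
    cache = OrderedDict()  # block address -> dirty flag, ordered LRU to MRU
    mem_writes = 0
    for addr in stores:
        if addr in cache:
            cache.move_to_end(addr)
        elif len(cache) >= capacity:
            _, dirty = cache.popitem(last=False)  # evict the least recently used line
            if dirty:
                mem_writes += 1                   # write-back: memory is updated on eviction
        if write_back:
            cache[addr] = True                    # only mark the line dirty
        else:
            cache[addr] = False
            mem_writes += 1                       # write-through: memory is updated immediately
    # Assumption: dirty lines still in the cache are flushed at the end.
    mem_writes += sum(1 for dirty in cache.values() if dirty)
    return mem_writes

# Hypothetical pattern: 100 stores to block 0, then 100 stores to block 1.
stores = [0] * 100 + [1] * 100

print(count_memory_writes(stores, capacity=1, write_back=False))  # 200
print(count_memory_writes(stores, capacity=1, write_back=True))   # 2
```

Repeated stores to the same location are where write-back saves memory traffic; the price is the extra bookkeeping for dirty lines.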

Data on several levels of cache can be stored exclusively, meaning that no redundancy exists: you won’t find the same piece of data in two different cache structures. Alternatively, caches can operate in an inclusive manner, with the lower levels guaranteed to hold a copy of the data found in the higher levels (those closer to the processor). AMD’s Phenom works with an exclusive L3 cache, while Intel follows the inclusive cache strategy. Coherency protocols take care of keeping data consistent across multiple levels, cores, and even processors.

Cache Capacity

Larger caches can buffer more data, but they also tend to introduce higher latency. Since cache also consumes a large share of a processor’s transistors, it is important to find a viable balance among transistor cost and die size, power consumption, and performance/latency.
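
The capacity-versus-latency trade-off can be illustrated with the standard average access time formula (hit time plus miss rate times miss penalty). The cycle counts and miss rates below are made-up round numbers chosen for the arithmetic, not measurements of any Athlon II or Phenom II.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average cycles per access: every access pays hit_time, misses also pay miss_penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical small cache: fast, but misses more often.
small = average_access_time(hit_time=3, miss_rate=0.10, miss_penalty=100)

# Hypothetical larger cache: slightly fewer misses, but slower on every access.
large = average_access_time(hit_time=5, miss_rate=0.09, miss_penalty=100)

print(f"small cache: {small:.1f} cycles")  # small cache: 13.0 cycles
print(f"large cache: {large:.1f} cycles")  # large cache: 14.0 cycles
```

In this invented case the larger cache loses, because its small gain in hit rate does not pay for the extra latency on every access. With a bigger drop in miss rate, the result flips.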

Associativity

A cache can be direct-mapped, meaning that each main memory location can be copied to only one position in the cache, or n-way set-associative, meaning that there are n possible positions in the cache for it. Higher associativity (up to fully associative caches) provides the most caching flexibility because existing cache data is less likely to be overwritten. In other words, higher associativity generally yields higher hit rates, but it introduces more latency, since it takes more time to check all of those positions for a hit. Ultimately, it makes sense to implement many-way associativity for the last cache level because it has the most capacity available, and a miss there sends the processor out to slower system memory.
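
The effect of associativity on conflicts can be sketched with a toy set-associative cache. The model assumes that a block's set is chosen by `address % num_sets` and that each set uses LRU replacement; the function name `count_hits` and the access pattern are illustrative only.

```python
def count_hits(addresses, num_sets, ways):
    """Count hits in a toy set-associative cache with LRU replacement inside each set."""
    sets = [[] for _ in range(num_sets)]  # each list is ordered LRU to MRU
    hits = 0
    for addr in addresses:
        lines = sets[addr % num_sets]     # the only set this block may live in
        if addr in lines:
            hits += 1
            lines.remove(addr)            # re-appended below as most recently used
        elif len(lines) >= ways:
            lines.pop(0)                  # set is full: evict its least recently used block
        lines.append(addr)
    return hits

# Hypothetical pattern: two blocks that collide in the same set, accessed alternately.
trace = [0, 8] * 10

# Both caches hold 8 blocks in total; only the organization differs.
print(count_hits(trace, num_sets=8, ways=1))  # direct-mapped: 0
print(count_hits(trace, num_sets=4, ways=2))  # 2-way set-associative: 18
```

In the direct-mapped case the two blocks keep evicting each other although the rest of the cache is empty. With two ways per set, both fit, and only the first two accesses miss.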

Here are some examples: The Core i5 and i7 work with 32KB of 8-way associative L1 data cache and 32KB of 4-way associative L1 instruction cache. Clearly, Intel wants instructions to be available more quickly while also maximizing hits on the L1 data cache. Its L2 cache is also 8-way set-associative, while Intel’s L3 cache is even smarter, implementing 16-way associativity to maximize cache hits.

However, AMD follows another strategy on the Phenom II X4 with a 2-way set-associative L1 cache, which offers lower latencies. To compensate for possible misses, it features twice the memory capacity: 64KB data and 64KB instruction cache. The L2 cache is 8-way set-associative, like Intel's design, but AMD’s L3 cache works at 48-way set associativity. None of this can be judged without looking at the entire CPU architecture. Naturally, only the benchmark results really count, but the whole purpose of this technical excursion is to provide a look into the complexity behind multi-level caching.
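
To see what these L1 figures mean in terms of organization, the number of sets follows from capacity, associativity, and cache line size. The sketch below assumes 64-byte cache lines, a value the article does not state; the helper name `num_sets` is hypothetical.

```python
def num_sets(size_kb, ways, line_bytes=64):
    """Number of sets = capacity / (ways * line size). line_bytes=64 is an assumption."""
    return size_kb * 1024 // (ways * line_bytes)

# L1 configurations quoted in the text above.
print(num_sets(32, 8))  # Core i5/i7 L1 data, 8-way:        64 sets
print(num_sets(32, 4))  # Core i5/i7 L1 instruction, 4-way: 128 sets
print(num_sets(64, 2))  # Phenom II L1, 2-way:              512 sets
```

Under that assumption, AMD's L1 spreads data over many sets with only two candidate positions each, while Intel's L1 data cache uses far fewer sets with eight candidate positions each. This is the latency-versus-flexibility trade-off described above.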

Top Comments
  • 26 Hide
    timetravelingtrevor , October 6, 2009 6:39 AM
    Its like you guys can read my mind or something. I was just debating this very topic last night when browsing newegg O_o
  • 20 Hide
    Anonymous , October 6, 2009 6:47 AM
    Since latency for accessing memory from RAM is independent of processor speed, the difference L3 cache makes should be much more significant at higher clock speeds. It would have been nice to see graphs also for an overclocked Athlon and matching Phenom clocks.
  • 17 Hide
    burnley14 , October 6, 2009 6:30 AM
    Very, very interesting. Some of the stuff at the beginning of the article was a little over my head, but it was still very informative. Good work!
Other Comments
  • 5 Hide
    johnbilicki , October 6, 2009 6:43 AM
    My old socket 754 3200 with 1MB cache (remember 754 did not have dual-channel memory support) slaughtered my socket 939 3500 with 512KB cache processing a 200 megabyte Apache access logs in 15/60 seconds respectively. The extra cache is so worth the extra money; it's not like AMD charges a grand for their CPU's. :D 
  • 2 Hide
    eddieroolz , October 6, 2009 6:48 AM
    A very interesting read. Last term I studied basic architecture design and we just brushed on the idea of caches. Good stuff to keep me occupied as I go to sleep.
  • 3 Hide
    superhighperf , October 6, 2009 7:09 AM
    it would be interesting to see a dollar to dollar comparison of the new line up. like a
    AMD Phenom II X3 710 2.6GHz vs AMD Athlon II X4 620 Propus 2.6GHz
    or
    AMD Phenom II X3 720 2.8GHz vs AMD Athlon II X4 630 Propus 2.8GHz

    there are only a few dollar difference between them and from the looks of this article the athalon should dominate a phenom in the same price class.
  • 2 Hide
    porksmuggler , October 6, 2009 7:18 AM
    Excellent article, really in the spirit of Tom's. Having equal respect for AMD and Intel, it pained me to think AMD had screwed up their margins with the inclusion of such a largely unnecessary L3. Seems they're back on track with Propus.
  • 1 Hide
    falchard , October 6, 2009 7:29 AM
    I think the debate on an l3 cache is over. It just makes sense to use it when you start increasing cores. Having 12 independant l2 caches alone is very ineffecient.
  • 17 Hide
    one-shot , October 6, 2009 7:30 AM
    Quote from article:

    "Finally, it remains to be said that L3 cache memory is imperative if you want to reach the highest performance levels. At the 2.6 GHz clock speed that we benchmarked, it may not be that obvious, but at 3 GHz and up we see the Phenom II scaling much better than the Athlon II X4."

    Will those results be posted to see the actual numbers? I would like to see how absence of L3 cache affects CPU performance at higher frequencies.
  • 6 Hide
    drealar , October 6, 2009 7:53 AM
    Everyone will have their own opinion after reading this article. But I have something to say for sure....

    I'll buy Athlon II X4!! Woohooo!!

    That $50 difference is what I pay for my fuel every month. So that $50 difference mean A LOT to me :D 
  • 1 Hide
    porksmuggler , October 6, 2009 8:02 AM
    oh to only pay $50 a month for fuel, more like well over $100, and that's at 32mpg :( 

    but yeah, the Athlon II X4 is going to be a winner at it's price point.
  • 6 Hide
    Anonymous , October 6, 2009 8:04 AM
    i'm with one-shot, OC the Athlon II X4 to 3.2ghz (if its possible) and compare it with the phenom II at 3.2ghz. Maybe lower the multi on the phenom and OC the FSB of the phenom like you do with the athlon II.
  • 1 Hide
    liemfukliang , October 6, 2009 11:07 AM
    I hate Intel Core. The full fledge intruction is only for ear $300. The newest case is I5 vs I7. I7 has VT-D or something like that, but this instruction is missing in I5. Why? In AMD case low end Athlon to Phenom II has the same intruction.
  • 5 Hide
    anamaniac , October 6, 2009 11:22 AM
    Thank you for a excellent article.

    I've been pondering buying my brother a Athlon X4 system and giving him my 4870. I already decided 4 cores was the target, but this helped me determine which quad core.

    Go AMD!
    I understand it's crippling a CPU, but who doesn't like cheap?

    Also... with the reduced transistor count, does this mean they can add another core or two to the CPU and still fall within the transistor budget?
    6 cores with no L3 or 4 cores with 6MB L3? I'd take the sexa-core.
  • 3 Hide
    Anonymous , October 6, 2009 12:04 PM
    I never thought I would see it but I think Toms may have made a history error. The first use of a L3 cache on an x86 platform that I can remember was the last of the Super Socket 7 processors. The AMD K6-3 came with L1 and L2 on the die and the board had up to 2MB of fast L3. This was the L2 if you installed a K6-2.
  • -1 Hide
    FoShizzleDizzle , October 6, 2009 12:15 PM
    I always had a hunch on the lack of importance of L3 cache, as AMD never put much focus into the 800 series. With 2 megs less at the same clockspeed, the X4 810 doesn't run much if any slower than it's cousin in the 900 series.
  • 2 Hide
    mindless728 , October 6, 2009 12:19 PM
    the phenom II X4's don't start at 3GHz, what about the 810 (4MB L3) and 910 (6MB L3), they are both 2.6GHz stock
  • -1 Hide
    Anonymous , October 6, 2009 12:43 PM
    "The Athlon II X4’s cores, including their L1 and L2 caches, are identical to the Phenom’s. (Ed. Note that this is only the case for early Athlon II X4s. Moving forward, more and more of them will center on a completely different, more economical processor die)"

    Does this imply that future Athlon II X4s may not perform the same as the current batch of Athlon II X4s?
  • 4 Hide
    kerdika , October 6, 2009 12:51 PM
    Im a little concerned that some "not so smart" people may read this article and think that an Athlon II 620 is on par with a Phenom II 965 or even a 940. I would just like to point out that this is L3 vs no L3 NOT A2 vs P2. for a hundred bucks you get 2.6ghz cpu with no l3 and probably a max over clock of 3.0ghz for 70 bucks more you can get a 3.0 ghz cpu with l3 and almost 3.8-4ghz oc. in the future as multi-core computing matures, l3 will become more important. so if you left with out any then you may be buying another cpu down the road,, so much for saving 70 bucks...