
Does Cache Size Really Boost Performance?

October 24, 2007 1:33:11 PM

Intel's Core 2 processors run the gamut with 1 MB, 2 MB and 4 MB of second-level cache. After comparing all three options at a 2.4-GHz clock speed, we learned that the importance of L2 cache must not be underestimated.

http://www.tomshardware.com/2007/10/24/does_cache_size_matter/index.html
October 24, 2007 2:20:44 PM

What about the energy consumption and temperature ?

SurJector
October 24, 2007 2:27:20 PM

I liked the article, well thought out and short. It draws clear conclusions from a series of tests. I'm a little disappointed you did not include AMD processors in the article, though, since they are the other half of the processor market (let's be real, VIA is a joke in the grand scheme, except in specialized applications). It would be interesting to see how 2x512 KB (Windsor & Brisbane) compares to 2x1024 KB (Windsor) and even 2x256 KB (Windsor EE - the 3600+ Windsor). AMD makes processors with those specs that use multipliers of 10 (10.5 in the Brisbane case, which could be lowered, I believe), so it wouldn't be difficult or time consuming to perform these tests, too. Before the flames start, I know Brisbane and Windsor are made using a different manufacturing tech (90nm vs. 65nm), I'm just curious to see how it impacts the AMD procs.

That would have also been helpful, SurJector.
October 24, 2007 2:48:51 PM

Nice read.

More cache = better :D 

Although the gap between 2 MB and 4 MB was smaller than the gap between 1 MB and 2 MB, both were a doubling.

4 MB to 6 MB isn't a doubling, so the performance improvement from it will probably be very small compared to going from 1 MB to 2 MB.
October 24, 2007 2:52:21 PM

Gaming performance will improve somewhat with more L2 cache.
October 24, 2007 2:58:24 PM

It looks like the article has a notable flaw...

* briefly puts on nerd cap *

The hypothesis being tested relates to cache ram size. The author did a good job making sure all 3 CPUs were clocked at the same speed. But it looks like the cache ram for each CPU was operating at 3 different speeds.

If I recall correctly, proper scientific method to test this requires that everything else be constant, and that the cache ram size be the only variable. While I don't exactly know how that could be achieved in this test, it would provide truly accurate results.

I have no doubt the differences in cache ram speed skewed the test results a bit. So the results may not be entirely accurate.

* removes nerd cap, stomps it flat, AND sets it on fire, AND extinguishes flames via 'natural' means *
October 24, 2007 3:17:48 PM

Things would be a bit different for AMD

AMD has always had smaller caches due to manufacturing capacity. AMD's processors use different designs to mitigate the impact of smaller caches. Much of this has been documented very well in Anandtech's in-depth processor architecture reviews.
October 24, 2007 3:21:41 PM

Wonder how well the Intel and AMD cpu's compare when the L2 cache is disabled ... anyone ever looked at that?

I recall disabling the L2 cache on my old Cyrix box a few years ago and it ran faster ... lol.

I had to flash the mobo from PCCHIPS with a hand torch ... manually ... after peeling off the EE sticker on the window.

Then wind up the spring and reboot (after gently starting the pendulum).

Sorry ... I'm being too technical ... heh heh.

The nurse is coming with my medication now ... gotta go :) 

October 24, 2007 3:44:36 PM

I would like to see the results for the AMD CPUs. I would also like to know why the authors decided to forego the AMD CPUs for this test.

The results are surprising and I will keep this in mind when I purchase a new processor in the coming months.
October 24, 2007 3:58:29 PM

The article is flawed and biased. I find it strange that the benchmarks that the author posted directly contradict his conclusion. One would think that if the real world numbers were truly that much different, he would've posted those results, and not ones where it showed, in many cases, less than a 10% difference from top to bottom. Instead he just says, look at the CPU charts... I'm right. OK... but the numbers you just posted say different... huh? It just seems this article is a bit biased, as the author obviously was trying to prove a point up front, then posted data that directly contradicts his hypothesis, then backtracks in his conclusion by using a nebulous reason as to why he is right. Also, the author manages to skew his findings by only using Intel chips that still rely on the FSB. If he had used any AMD chips, one would see that an Integrated Memory Controller puts far less importance on L2 cache than the use of a FSB. I would've liked to have seen an AM2 6000+ Windsor put against an AM2 5000+ Black Edition Brisbane @ 3.0ghz to see the difference between 2x512kb and 2x1MB. But doing so would only further debunk the author's hypothesis.
That being the case... why did he even bother writing the first 7 pages? He could've just written page 8 and the article holds the same weight.
October 24, 2007 4:11:46 PM

AMD has too many variables, I think, to test it out. There are a lot of differences between them.

Still, it would be interesting to see how AMD copes with its lower cache size and, in a few weeks/months, how the L3 cache works for the Phenom line and the tricores too (if they keep the same L3 cache even though they are losing a core).

The cache size and speed is an interesting subject, I hope we will see more of these.
October 24, 2007 4:21:16 PM

That is true. The Brisbane core has an added latency that the Windsors do not, which would skew the test... even if ever-so-slightly... it would still not be completely accurate.
October 24, 2007 4:24:08 PM

“However, the most important benefit is due to how Intel can offer more processor variants with 6 MB, 4 MB, 2 MB or even 1MB L2 cache. In doing so, Intel utilizes an even higher percentage of the dies on a wafer despite some scattered defects that might have forced Intel to throw dies away in the past.”

Funny that’s exactly the same advantage that AMD native quad core has, you can make triple, dual and single core CPUs out of the quad core.
October 24, 2007 4:30:35 PM

@ragemonkey: The author said that there is a performance difference in the three cache sizes. This is a fact. He did not go into the debate of what percentage increase that translated into and whether or not it made financial sense, simply that more cache increased the performance of the system. Stop flaming him. It is up to the reader to decide whether an ~10% increase in performance is worth the money. The general investigation was whether the amount of cache changed the system's performance, or whether clock speed alone determines the level of performance; and conclusions were made that fit the data that was gathered.
October 24, 2007 4:32:30 PM

Am I the only one thinking that the only benchmark result that really mattered was WinRAR, and that the difference in games was at such high frame rates that smooth game play was achieved anyway???
I'd challenge the author to try to show that different cache sizes could mean the difference between smooth game play and choppy game play. Improving your frame rates from 167 to 175 only awards you bragging rights, which by my account is pretty infantile.
October 24, 2007 4:37:55 PM

@eltoro: You forget that little bit of performance you pay for now could pay dividends when more resource intensive games come out (like 35fps vs 40fps). Buying the largest cache is more for future-proofing than anything else.
October 24, 2007 4:56:51 PM

muk said:
Intel's Core 2 processors run the gamut with 1 MB, 2 MB and 4 MB of second-level cache. After comparing all three options at a 2.4-GHz clock speed, we learned that the importance of L2 cache must not be underestimated.

http://www.tomshardware.com/2007/10/24/does_cache_size_matter/index.html

True, but cache must not be overestimated at the cost of higher clock rates. The more cache you have, the more heat, and the more heat you have, the lower clock rates will be. The more cache you have, the less room there is for processing structures, and the less cache you have, the more room there is for them.
October 24, 2007 5:00:07 PM

ragemonkey said:
The article is flawed and biased. I find it strange that the benchmarks that the author posted directly contradict his conclusion. One would think that if the real world numbers were truly that much different, he would've posted those results, and not ones where it showed, in many cases, less than a 10% difference from top to bottom. Instead he just says, look at the CPU charts... I'm right. OK... but the numbers you just posted say different... huh? It just seems this article is a bit biased, as the author obviously was trying to prove a point up front, then posted data that directly contradicts his hypothesis, then backtracks in his conclusion by using a nebulous reason as to why he is right. Also, the author manages to skew his findings by only using Intel chips that still rely on the FSB. If he had used any AMD chips, one would see that an Integrated Memory Controller puts far less importance on L2 cache than the use of a FSB. I would've liked to have seen an AM2 6000+ Windsor put against an AM2 5000+ Black Edition Brisbane @ 3.0ghz to see the difference between 2x512kb and 2x1MB. But doing so would only further debunk the author's hypothesis.
That being the case... why did he even bother writing the first 7 pages? He could've just written page 8 and the article holds the same weight.

Almost 20 lines without any information or factual points regarding your own argument. A few more quotations, better formatting and punctuation would make your argument easier to understand.

You begin with the conclusion instead of outlining a proper argument. Instead of quoting the author, you go on and spin your own twisted review, trying to impersonate the original author coming to the very conclusion you drew in the first sentence. Basically, you do yourself what you are trying to complain about.

Your finding "that an Integrated Memory Controller puts far less importance on L2 cache than the use of a FSB" is interesting yet you fail to provide factual information. Claiming something based on personal feelings or subjective observation does not equal facts. Stating information as a fact does not inherently make it a fact.

While you state that you would have liked to see how other processors would have behaved with different sizes of L2 cache, especially processors with a built-in memory controller, you only provide speculative arguments as to why the author didn't provide that information in his review. Someone interested in a serious argument would have at least tried to see reasons why the author did or didn't include it. You only provide speculative reasons that benefit your own conclusion, which is based on your subjective perception of the article.

Since you are unwilling to even see arguments leading to a conclusion different from your own, I conclude that you made up your mind before you read the article.


If I'm wrong and simply failed to comprehend your argument, I sincerely apologize and hope you are willing to point out my misunderstanding in a comprehensive way.
On the other hand, if you are a "fan" or emotionally attached to a certain brand and simply try to propagate your own personal views as facts or to provoke others, please refrain from posting.
October 24, 2007 5:21:56 PM

yes(period)
October 24, 2007 5:29:50 PM

wow ... I put down my lunch to read this one ...
October 24, 2007 5:30:23 PM

I want... a 1 or 2 gb l2 cache. :cry: 
October 24, 2007 5:38:44 PM

@KyleSTL: let's try a little math...
For example, Quake IV:
174.6 / 167.7 = 1.0411, i.e. a 4.11% performance improvement between 2 MB and 4 MB

Let's apply this to a game running at 35fps with a 2 MB cache CPU, and we'll get 35 * 1.0411 = 36.44fps

Now, that's hardly an improvement that even deserves the time and attention I'm taking to write this reply. This won't realistically and noticeably improve choppiness.
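
As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes, as the post does, that the relative gain measured in Quake IV carries over unchanged to a game running at 35 fps; the function names are made up for illustration.

```python
def relative_gain(fps_large_cache: float, fps_small_cache: float) -> float:
    """Return the fractional speedup of the larger cache over the smaller one."""
    return fps_large_cache / fps_small_cache - 1.0


def projected_fps(base_fps: float, gain: float) -> float:
    """Scale a baseline frame rate by a fractional gain (assumes the gain transfers)."""
    return base_fps * (1.0 + gain)


# Quake IV figures quoted in the post: 4 MB vs 2 MB of L2 cache.
gain = relative_gain(174.6, 167.7)
print(f"gain: {gain:.2%}")
print(f"35 fps becomes {projected_fps(35.0, gain):.2f} fps")
```

Whether the same percentage really applies at lower frame rates or in other games is exactly the assumption being made here, so the projected number is only as good as that assumption.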
October 24, 2007 5:41:13 PM

Good "old style" THG article and benchmarks .
Well done :D 
October 24, 2007 5:43:54 PM

elbert said:
True, but cache must not be overestimated at the cost of higher clock rates. The more cache you have, the more heat, and the more heat you have, the lower clock rates will be. The more cache you have, the less room there is for processing structures, and the less cache you have, the more room there is for them.


But much of that is not totally relevant to the article or the conclusions made.

1) In regards to processing structures - This would require Intel to maintain totally different CPUs. In this case we are dealing with identical CPUs, but with optionally different amounts of cache. The E2160 has no more processing structures than the E6850.

2) More cache does not always mean worse OCing. The E6750 will OC much higher than any of the E2xxxx chips due to the newer stepping. Yes, more cache may mean more heat with everything held constant. But again, the point of the article was not which CPU could OC the highest or the cause of that.


October 24, 2007 5:47:59 PM

maddogfargo said:
It looks like the article has a notable flaw...

* briefly puts on nerd cap *

The hypothesis being tested relates to cache ram size. The author did a good job making sure all 3 CPUs were clocked at the same speed. But it looks like the cache ram for each CPU was operating at 3 different speeds.

If I recall correctly, proper scientific method to test this requires that everything else be constant, and that the cache ram size be the only variable. While I don't exactly know how that could be achieved in this test, it would provide truly accurate results.

I have no doubt the differences in cache ram speed skewed the test results a bit. So the results may not be entirely accurate.

* removes nerd cap, stomps it flat, AND sets it on fire, AND extinguishes flames via 'natural' means *


What the crazy dawg in the funny geek hat said. Without accounting for the difference in L2 cache speed between the E2160, the E4400 and the X6800, any claim that cache size boosts performance is highly dubious.
October 24, 2007 6:06:05 PM

eltoro said:
@KyleSTL: let's try a little math...
For example, Quake IV:
174.6 / 167.7 = 1.0411, i.e. a 4.11% performance improvement between 2 MB and 4 MB

Let's apply this to a game running at 35fps with a 2 MB cache CPU, and we'll get 35 * 1.0411 = 36.44fps

Now, that's hardly an improvement that even deserves the time and attention I'm taking to write this reply. This won't realistically and noticeably improve choppiness.

In general I share your view, but I have to point out that you are making an assumption. Applying your math implies that the effect of L2 cache scales linearly. To make such an assumption you would need more data.
In addition, you can't directly compare the performance of Quake 4 with any other game.

That said, I still think you're right.

Still, our point of view is based on subjective perception and speculation. What if the performance impact of L2 grows or declines with clock speed?

Another aspect, and the only point I'd like to criticize about the article, is the relation between L2 and system memory. Since the L2 is a cache meant to save access time and reduce the number of memory accesses, it would be very interesting to see the effects of that.

For example, I could compare a 4MB L2 processor and a 1MB L2 processor, both using 800MHz DDR2, and set that data against the same two processors using 533MHz DDR2. The same set of tests could be done with DDR3 and DDR1 to see how the L2 affects performance with faster or slower memory.
I would speculate that a larger L2 cache can somewhat make up for slower memory - at least to a small degree. It is only speculation though.
In my eyes that is the only shortcoming of this otherwise pretty nice article.
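
The comparison proposed above amounts to a small test matrix of cache size against memory speed. A minimal Python sketch of how such a plan could be laid out follows; the cache sizes and memory speeds come from the post, while `cache_gain_by_memory` and the shape of its `scores` argument are hypothetical names chosen for illustration. No benchmark results are included, because none were measured.

```python
from itertools import product

# Values taken from the post; a real test plan would need matching hardware.
l2_cache_mb = [1, 4]
memory_mhz = [533, 800]

# Every combination of cache size and memory speed is one benchmark run.
test_matrix = list(product(l2_cache_mb, memory_mhz))

for cache, mem in test_matrix:
    print(f"{cache} MB L2 with DDR2-{mem}")


def cache_gain_by_memory(scores):
    """Map each memory speed to the fractional gain of the largest over the smallest cache.

    `scores` maps (cache_mb, memory_mhz) to a benchmark score where higher is better.
    """
    gains = {}
    for mem in memory_mhz:
        small = scores[(min(l2_cache_mb), mem)]
        large = scores[(max(l2_cache_mb), mem)]
        gains[mem] = large / small - 1.0
    return gains
```

If the gain at 533 MHz turned out larger than the gain at 800 MHz, that would support the speculation that a bigger L2 partly compensates for slower memory.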
October 24, 2007 6:08:41 PM

@eltoro: I see your point, but the difference between 1MB and 4MB is much greater (9.9% for Quake IV, 8.9% for Prey, and a 13.8% reduction in time for processing the WinRar bench). I realize the difference is not that great, but there is a difference, that's all I'm saying. We both have valid arguments. Let's leave it at that.

Also note that COD2 has no preference for a small or large cache. Future games may have similar performance, or maybe the games will have vast improvements due to the cache size, we don't know.
October 24, 2007 6:10:38 PM

ragemonkey said:
The article is flawed and biased. I find it strange that the benchmarks that the author posted directly contradict his conclusion. One would think that if the real world numbers were truly that much different, he would've posted those results, and not ones where it showed, in many cases, less than a 10% difference from top to bottom.



The benchmarks don't directly contradict the conclusion. The 1MB would have to outscore the 2 and 4 for that to be a direct contradiction. They just aren't as supportive as the article makes them out to be, and so the biased conclusion comes in. When I saw the charts I was thinking... that's not a lot. 15 fps difference from 1MB to 4MB in Quake? Yes, there was a boost, but come on, that is splitting hairs. Keep in mind there is an error margin too because the tests are only so accurate. No one is going to go upgrade their CPU for 15 fps. Interesting study but the conclusion is unrealistic.

Oh yeah, quick calculation that they should have included / discussed -> Between the three games, the average FPS boost from 1MB to 4MB was only 9 FPS. That is, on average based on their study, 3 FPS / MB of cache increase. Sorry, I am only looking at FPS, I don't care about Divx that much :) 
October 24, 2007 6:10:42 PM

KyleSTL said:
@eltoro: I see your point, but the difference between 1MB and 4MB is much greater (9.9% for Quake IV, 8.9% for Prey, and a 13.8% reduction in time for processing the WinRar bench). I realize the difference is not that great, but there is a difference, that's all I'm saying. We both have valid arguments. Let's leave it at that.

Also note that COD2 has no preference for a small or large cache. Future games may have similar performance, or maybe the games will have vast improvements due to the cache size, we don't know.


What about UT3 and COD4?

Word, Playa.
October 24, 2007 6:13:53 PM

Doesn't L2 run at core speed or FSB speed? :-/
October 24, 2007 6:21:23 PM

leo2kp said:
Doesn't L2 run at core speed or FSB speed? :-/


Core speed in this particular case; in earlier generations of CPUs it was 1/2 speed.

Word, Playa.
October 24, 2007 6:23:44 PM

@leo2kp & madfrog: I was pretty sure the L2 ran synchronously with the FSB or bus speed or something, so if that's the case, then running all three processors at 1066FSB would make all cache speeds equal. Someone correct me if I am wrong.

Edit, already answered, thanks, Playa.
October 24, 2007 7:11:47 PM

It's well known that extra L2 equals better performance. Still, it's also known that higher clock speeds equal better performance. You can have one or the other, but you cannot adjust L2; you can adjust clock speeds.

I'd rather get an E4400 and overclock it to 3.33 gigahertz than have an E6750 with 2 more megs of L2.
October 24, 2007 7:13:55 PM

all in all, I don't think anyone would argue that the relative effectiveness of cache is dependent on
1) architecture, e.g. effectiveness of branch predictions
2) its relative speed with respect to general purpose memory access speed.

The article did not address either of these, which I think prompted the bulk of the criticism here. Only a single architecture was examined, but even there the FSB speed was not fixed, as noted, which might influence the results.

Also, the code determines a lot of the effectiveness, too, which was not discussed in the article. In fact, that would be the explanation for why some applications show more improvement than others.

By the way, to me, the largest area of concern is not that the authors did not test for all possible scenarios, but that they apparently have no understanding of the underlying concepts (maybe I'm wrong and they do, but they certainly don't explain it in the article)
October 24, 2007 7:30:33 PM

Slobogob wrote "Since the L2 is cache to save access time and reduce the amount of accesses it would be very interesting to see the effects of that."

Yes, that is what cache is all about. A larger cache means fewer cache misses, until the memory working set fits in the cache. After that, an even larger cache doesn't help.
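
The working-set effect described above can be illustrated with a toy simulation. This is only a sketch under strong assumptions: a fully associative cache with LRU replacement and uniformly random accesses over a fixed working set, neither of which matches a real Core 2 L2 cache or a real workload. The function name and parameters are invented for the example.

```python
import random
from collections import OrderedDict


def lru_hit_rate(capacity: int, working_set: int, accesses: int = 20000, seed: int = 0) -> float:
    """Simulate an LRU cache under uniformly random accesses and return the hit rate."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(accesses):
        block = rng.randrange(working_set)
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / accesses


# Capacities are in blocks; the working set is 1024 blocks.
for capacity in (256, 512, 1024, 2048):
    print(capacity, f"{lru_hit_rate(capacity, working_set=1024):.3f}")
```

In this model the hit rate climbs as capacity approaches the working set and stops improving once the whole working set fits, since the only remaining misses are the first touch of each block.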

As the working set of Windows and the 50-100 ancillary processes running on your machine grows alongside your foreground application, you will feel the value of cache over raw processor speed.

According to my calculations:

For example, if a cache runs at processor speed, and a cache miss costs 50 processor waits, then going from a 99% hit rate to 100% will give a 49% gain in productive instruction cycles. A hit rate of 92% gives a 10% gain in productive cycles over a 91% hit rate.

If a cache miss costs only 25 processor waits, then a 100% hit rate gives 24% more productive cycles over a 99% hit rate. A hit rate of 92% gives an 8% gain in productive cycles over a 91% hit rate.
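
The figures in the two paragraphs above can be reproduced with a simple average-cost model. The sketch below assumes that a cache hit costs one cycle and that a miss costs 50 (or 25) cycles in total; that reading is an assumption, chosen because it matches the quoted percentages. The function names are illustrative only.

```python
def cycles_per_access(hit_rate: float, miss_cost: float, hit_cost: float = 1.0) -> float:
    """Average cycles per memory access for a given hit rate."""
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


def gain(old_hit_rate: float, new_hit_rate: float, miss_cost: float) -> float:
    """Fractional gain in productive cycles when the hit rate improves."""
    old = cycles_per_access(old_hit_rate, miss_cost)
    new = cycles_per_access(new_hit_rate, miss_cost)
    return old / new - 1.0


for miss_cost in (50, 25):
    print(
        f"miss cost {miss_cost}:",
        f"99% -> 100% gives {gain(0.99, 1.00, miss_cost):.0%},",
        f"91% -> 92% gives {gain(0.91, 0.92, miss_cost):.0%}",
    )
```

The model ignores everything except memory stalls, so it overstates the effect on real programs, but it shows why a one-point change in hit rate matters most when the hit rate is already high.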

The benefits of large caches have been clearly observed when running J2EE web application server benchmarks.

Similarly, try your game benchmarks in the foreground while you are switched away from a large, complex web page that constantly updates itself, runs Java and JavaScript, and reloads dozens of elements from the web (which it will find in the browser cache on disk).
October 24, 2007 7:41:24 PM

I agree with you on that one Russki. Their conclusion was just a...Yep, it does...with a serious lack of relevant discussion. The magnitude of the performance boost wasn't even discussed...that is a huge no no. I envision interns writing that article for some reason...
October 24, 2007 7:58:03 PM

Once again, no AMD....
October 24, 2007 8:05:11 PM

Sheesh ...
Top three reasons you get what you get.
1. Money ( ie Money)
2. Money (ie Time)
3. Money (ie Material)
October 24, 2007 8:09:03 PM

AMD has one foot in the grave...let em die in peace.
October 24, 2007 8:41:32 PM

I don't know if this is valid or relevant, but I noticed when doing some video encoding on my rig and a friend's laptop C2D (1.8GHz/800MHz/2MB) that although his video encoding was a tad slower than mine, his system felt far more responsive. At the time I noticed it I proposed it was the extra cache. I'd be curious to see multitasking benchmarks.

-mcg
October 24, 2007 8:47:01 PM

Quote:
the day hell freezes over will be the day mr schmid writes an article that people don't flame to hell.
...
seriously, though, in all my time visiting toms i have never seen one of his reviews get good support.


Stranger, you don't mean to imply there is some sort of a conspiracy at play here, do you? His articles are borderline passable, for the most part, at the top of the crop of the recent content here at Tom's. Cream of the crap, if you will.

You know, it used to be that Tom's was just as thorough if not more than Anand. Nowadays, this has been reversed in a big way.

I wasn't flaming his article, really, but there is some content that is missing, you'd have to agree.
October 24, 2007 9:55:10 PM

sirrobin4ever said:
Once again, no AMD....

I second that.
October 24, 2007 9:59:42 PM

nice article THG!
interesting the synthetic bench marks - show little difference
is this because synthetic tests are tuned to show results and are unreliable?
i often wonder about the synthetic tests - if they are really bench marks or just graphic shows?

Oh, why no AMD? AMD is hurting badly enough at this point; no reason to make them look even worse? Is that a flame?
October 24, 2007 10:09:32 PM

dragonsprayer said:
nice article THG!
interesting the synthetic bench marks - show little difference
is this because synthetic tests are tuned to show results and are unreliable?
i often wonder about the synthetic tests - if they are really bench marks or just graphic shows?

Oh, why no AMD? AMD is hurting badly enough at this point; no reason to make them look even worse? Is that a flame?

Nah. It's just that you can't compare AMD's cache with Intel's, which you probably already know and he doesn't :D 
Besides, AMD has fewer performance increases with larger L2 cache I believe.
October 24, 2007 11:23:43 PM

Does anyone else here realize that comparing a Core 2 to an AMD K8 series would be just stupid? The K8 series is a roughly 5 year old architecture and is not comparable to the Conroe, which is only a year old now. The Phenom would be comparable to Penryn/Conroe.

Besides, Tom's has the CPU charts where you can see the benchmarks and compare similarly clocked CPUs with different cache sizes and so on.

I understand why you want to see AMD's results, but this article was about seeing if cache makes a performance difference, not whose CPU is better. I think it was interesting to see how the cache did make a difference in real-world tests.

I can tell you cache size does make a difference. My old build had a P4 3.2GHz with 512k L2 cache (Northwood), and I upgraded to a (Northwood again) P4EE 3.4GHz with 2MB L2 cache, and my dual layer DVD ripping time went from 30mins to 7mins. I haven't even tried my new system for DVD ripping yet, although I am sure it will be uber fast compared to the old one.

Also if I remember correctly didn't Intel release a P4EE with a L3 cache as well? Was like 2-3 years ago but still.
October 24, 2007 11:39:45 PM

AMD was not included because that would have been a different article.
This focused on the effect of L2 cache on Intel Processors.

It seems people flame just to flame or complain to just complain.
I have not seen any well thought out objection.

Note: The Cache Speeds are the same. (The closest thing to a valid argument, except it's just plain wrong.)

The following is from Intel's website, from the description of one of the C2D chips.


"L2 Cache Speed: The speed of the 2nd level cache. Since all current Intel® processors have internal L2 cache, the cache speed will be expressed in MHz/GHz or the speed in relationship to the processor core speed. For example, the Pentium® II processor and some early Pentium III processors had the L2 cache run at half the processor speed, while newer Pentium III processors and Pentium 4 processors have their cache run at the full speed of the processor. "