
Phenom II and i7 <=> cache design for gaming

August 20, 2009 12:27:57 PM

Phenom II's L3 cache: 48-way set-associative
i7's L3 cache: 16-way set-associative
C2Q: no L3 cache

http://www.agner.org/optimize/optimizing_cpp.pdf
http://www.agner.org/optimize/#manuals

Quote:
8.2 Cache organization
It is useful to know how a cache is organized if you are making programs that have big data
structures with non-sequential access and you want to prevent cache contention. You may
skip this section if you are satisfied with more heuristic guidelines.

Most caches are organized into lines and sets. Let me explain this with an example. My
example is a cache of 8 KB size with a line size of 64 bytes. Each line covers 64 consecutive
bytes of memory. One kilobyte is 1024 bytes, so we can calculate that the number of lines is
8*1024/64 = 128. These lines are organized as 32 sets × 4 ways. This means that a
particular memory address cannot be loaded into an arbitrary cache line. Only one of the 32
sets can be used, but any of the 4 lines in the set can be used. We can calculate which set
of cache lines to use for a particular memory address by the formula: (set) = (memory
address) / (line size) % (number of sets). Here, / means integer division with truncation, and %
means modulo. For example, if we want to read from memory address a = 10000, then we
have (set) = (10000 / 64) % 32 = 28. This means that a must be read into one of the four
cache lines in set number 28. The calculation becomes easier if we use hexadecimal
numbers because all the numbers are powers of 2. Using hexadecimal numbers, we have a
= 0x2710 and (set) = (0x2710 / 0x40) % 0x20 = 0x1C. Reading or writing a variable from
address 0x2710 will cause the cache to load the entire 64 or 0x40 bytes from address
0x2700 to 0x273F into one of the four cache lines from set 0x1C. If the program afterwards
reads or writes to any other address in this range then the value is already in the cache so
we don't have to wait for another memory access.
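The set-index arithmetic in the quoted example can be checked with a few lines of Python. This is a minimal sketch of the 8 KB, 32-set × 4-way example only; `cache_set` and `line_range` are hypothetical helper names, not anything from the manual.

```python
LINE_SIZE = 64   # bytes per cache line
NUM_SETS = 32    # 8 KB / 64 B = 128 lines, organized as 32 sets x 4 ways

def cache_set(address: int) -> int:
    """(set) = (memory address) / (line size) % (number of sets)."""
    return (address // LINE_SIZE) % NUM_SETS

def line_range(address: int) -> tuple[int, int]:
    """First and last byte of the 64-byte line that contains address."""
    start = address - address % LINE_SIZE
    return start, start + LINE_SIZE - 1

print(cache_set(10000))                        # 28
print(hex(cache_set(0x2710)))                  # 0x1c
print([hex(a) for a in line_range(0x2710)])    # ['0x2700', '0x273f']
```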

Assume that a program reads from address 0x2710 and later reads from addresses
0x2F00, 0x3700, 0x3F00 and 0x4700. These addresses all belong to set number 0x1C.
There are only four cache lines in each set. If the cache always chooses the least recently
used cache line then the line that covered the address range from 0x2700 to 0x273F will be
evicted when we read from 0x4700. Reading again from address 0x2710 will cause a cache
miss. But if the program had read from different addresses with different set values then the
line containing the address range from 0x2700 to 0x273F would still be in the cache. The
problem only occurs because the addresses are spaced a multiple of 0x800 apart. I will call
this distance the critical stride. Variables whose distance in memory is a multiple of the
critical stride will contend for the same cache lines. The critical stride can be calculated as
(critical stride) = (number of sets) × (line size) = (total cache size) / (number of ways).
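The eviction scenario above can be reproduced with a toy cache. This sketch assumes the same 64-byte lines, 32 sets, 4 ways and strict least-recently-used replacement (real CPUs often use pseudo-LRU or other policies); `LRUCache` is a hypothetical name for illustration.

```python
from collections import OrderedDict

LINE_SIZE, NUM_SETS, NUM_WAYS = 64, 32, 4

class LRUCache:
    """Toy set-associative cache with least-recently-used replacement."""

    def __init__(self, line_size=LINE_SIZE, num_sets=NUM_SETS, num_ways=NUM_WAYS):
        self.line_size, self.num_sets, self.num_ways = line_size, num_sets, num_ways
        # One ordered dict per set; keys are line numbers, oldest first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address: int) -> bool:
        """Return True on a hit, False on a miss (the line is then loaded)."""
        line = address // self.line_size
        ways = self.sets[line % self.num_sets]
        if line in ways:
            ways.move_to_end(line)       # mark as most recently used
            return True
        if len(ways) == self.num_ways:
            ways.popitem(last=False)     # evict the least recently used line
        ways[line] = None
        return False

cache = LRUCache()
for addr in (0x2710, 0x2F00, 0x3700, 0x3F00, 0x4700):
    cache.access(addr)                   # all five map to set 0x1C
print(cache.access(0x2710))              # False: 0x2700-0x273F was evicted

other = LRUCache()
for addr in (0x2710, 0x2740, 0x2780, 0x27C0, 0x2800):
    other.access(addr)                   # five different sets
print(other.access(0x2710))              # True: the line is still cached

critical_stride = NUM_SETS * LINE_SIZE   # = 8 KB / 4 ways
print(hex(critical_stride))              # 0x800
```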

If a program contains many variables and objects that are scattered around in memory then
there is a risk that several variables happen to be spaced by a multiple of the critical stride
and cause contentions in the data cache. The same can happen in the code cache if there
are many functions scattered around in program memory. If several functions that are used
in the same part of the program happen to be spaced by a multiple of the critical stride then
this can cause contentions in the code cache. The subsequent sections describe various
ways to avoid these problems.
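One common remedy for the contention described above is padding, so that items accessed together are not spaced by a multiple of the critical stride. The sketch below is illustrative only (not necessarily the method the manual's later sections use): it counts how many sets a column walk through a hypothetical 16-row array touches, in the same 8 KB, 4-way example cache. `sets_touched` is a hypothetical helper.

```python
LINE_SIZE, NUM_SETS = 64, 32
CRITICAL_STRIDE = LINE_SIZE * NUM_SETS   # 0x800 bytes for the 8 KB, 4-way example

def sets_touched(row_bytes: int, rows: int, base: int = 0) -> set[int]:
    """Cache sets used when reading one element from each row of a 2-D array
    (a column walk), where consecutive rows start row_bytes apart."""
    return {((base + r * row_bytes) // LINE_SIZE) % NUM_SETS for r in range(rows)}

# Rows exactly one critical stride long: every row lands in the same set.
print(len(sets_touched(CRITICAL_STRIDE, 16)))               # 1 -> 16 lines fight for 4 ways
# The same array with each row padded by one cache line.
print(len(sets_touched(CRITICAL_STRIDE + LINE_SIZE, 16)))   # 16 -> no contention
```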


When you play a game and there is a lot of action (physics, AI, a lot of vertices drawing the picture, and more), a lot of data and many functions are used to calculate the picture.

Data and functions are scattered in this scenario.

You need a cache that doesn't evict data or code and force the CPU to go to memory.
August 20, 2009 12:29:53 PM

Can't edit messages...
August 20, 2009 1:02:54 PM

Oh, it's Assler. What new trolling do you bring to THG this fine day?

Word, Playa.
August 20, 2009 1:21:27 PM

I think some here want to know why Phenom is sometimes better at high resolutions.

Intel is fast when data and code aren't that complicated (the application doesn't use a lot of functions and data). At low resolutions Intel gains a lot of fps because it is fast when games are performing simpler actions.
If you increase the resolution, the graphics card will hold back the CPU, and Intel can't perform as well in the simpler areas because it has to wait on the GPU.

When the game is performing more complicated actions, the GPU may need to wait on the CPU, and this is where Phenom is good, because it doesn't evict data from cache as soon as the i7 does. The i7 needs to go to memory sooner than the Phenom.
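To put the associativity figures posted at the top of the thread into the critical-stride formula from the quoted manual, here is a rough Python sketch. The L3 sizes (6 MB for a Phenom II X4, 8 MB for a Core i7-9xx), the 64-byte line size, and the plain modulo set mapping are assumptions not stated in this thread; `geometry` is a hypothetical helper.

```python
LINE_SIZE = 64  # bytes per line, assumed for both CPUs

def geometry(size_bytes: int, ways: int, line_size: int = LINE_SIZE) -> tuple[int, int]:
    """Return (number of sets, critical stride in bytes) for a cache that maps
    addresses with the simple set formula from the quoted manual."""
    critical_stride = size_bytes // ways   # (total cache size) / (number of ways)
    return critical_stride // line_size, critical_stride

# Assumed L3 sizes (not given in this thread); associativities as posted above.
L3 = {"Phenom II": (6 * 1024 * 1024, 48), "Core i7": (8 * 1024 * 1024, 16)}

for name, (size, ways) in L3.items():
    sets, stride = geometry(size, ways)
    print(f"{name}: {sets} sets x {ways} ways, critical stride {stride // 1024} KB")
```

In this simplified model up to 48 lines sharing a set index can coexist on the Phenom II versus 16 on the i7, but the i7 spreads addresses over four times as many sets. Which one evicts a game's working set first depends on its access pattern, which this calculation cannot tell; real L3 caches may also hash addresses into sets and differ in inclusion policy.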
August 20, 2009 5:08:59 PM

You think the Phenom is better on high res?

Let's test that theory:

Crysis:

Looks pretty much like a tie to me...

Far Cry 2:

Well, it's not a tie. Of course, the win isn't going the way you predicted either...

Left 4 Dead:

Dead heat at 2560x1600, the i7 has a slight lead at 1920x1200 (classic sign of GPU bottleneck, especially since the i7 has a huge lead at both resolutions when AA is off)

Call of Duty World at War:

Once again, the i7 has the lead at high resolutions



Hmm...
Seems you're wrong. The i7 leads at all times except when the game is well and truly GPU bottlenecked, in which case it's pretty much a tie (as expected). Add more GPU power and the i7 should pull out ahead again (which is shown in many reviews of high end CF/SLI setups).
August 20, 2009 5:28:02 PM

cjl said:
Hmm...
Seems you're wrong. The i7 leads at all times except when the game is well and truly GPU bottlenecked, in which case it's pretty much a tie (as expected). Add more GPU power and the i7 should pull out ahead again (which is shown in many reviews of high end CF/SLI setups).


Did you understand the text about cache organization?
August 20, 2009 8:26:45 PM

cjl said:
You think the Phenom is better on high res?

Let's test that theory:


Thanks for your post showing that the i7 needs to run at 3.8 GHz against a 3.64 GHz Phenom II to match the speeds.

This contradicts what is commonly believed.
August 20, 2009 8:43:15 PM

Yawn... another troll who posts something that he doesn't fully understand.
August 20, 2009 9:17:17 PM

Crysis: a complicated game that doesn't need much CPU power but probably accesses a lot of memory because the graphics are so detailed. In this game you could probably start to see Phenom perform better in the heaviest sections. The scenario would be: at very low resolutions, the i7 is faster; increasing the resolution narrows the gap, and when you get close to a 100% GPU bottleneck, Phenom passes the i7; increasing the resolution even more evens out the scores because the game is 100% bottlenecked by the GPU.

Far Cry 2: If I remember right, this game's code and data don't seem to be that scattered. Both i7 and C2Q seem to run it well. I don't know how much CPU power the game needs at high resolutions, how much memory it uses in complicated situations, or whether it can handle a lot of enemies with advanced AI and physics.

Left 4 Dead: I don't know this game.

Call of Duty: Not that heavy a game for the computer to run. If I am right, it is almost single-threaded?

Today you can run almost all games on a dual-core processor, and I think they fit within 2 GB of RAM. They all have one main render thread because DX9 and DX10 don't work well otherwise.

This will change with DX11: games will probably add much more physics and AI, and DX11 is also able to take advantage of more cores for rendering.
Maybe you will see quad-core CPUs running at 80-90% at high resolutions, calculating huge amounts of data to render the picture. Finding data in the cache will become more and more important for speed. This is why Phenom II has a better cache design for advanced games.

For applications that render or do some other long task, the situation is different. Even if the task is rather complicated, there is probably not that much code involved. Also, the data can be laid out for maximum performance (often data should be processed in long sequential runs to get maximum speed). The cache doesn't need to be that advanced for this type of scenario.
August 20, 2009 11:53:05 PM

kassler said:
Crysis: a complicated game that doesn't need much CPU power but probably accesses a lot of memory because the graphics are so detailed. In this game you could probably start to see Phenom perform better in the heaviest sections. The scenario would be: at very low resolutions, the i7 is faster; increasing the resolution narrows the gap, and when you get close to a 100% GPU bottleneck, Phenom passes the i7; increasing the resolution even more evens out the scores because the game is 100% bottlenecked by the GPU.

Oh Assler, you are so full of life and...

August 21, 2009 12:23:07 AM

TROLL!!!!!!!!!!

Just another Phenom guy trying to convince himself he made the right choice, when in fact he didn't.
August 21, 2009 12:35:42 AM

daship said:
TROLL!!!!!!!!!!

Just another Phenom guy trying to convince himself he made the right choice, when in fact he didn't.

Could it be that you are disappointed when someone explains how the CPU works and you realize that Phenom is a good gaming CPU, if you have spent a lot of money on Intel?
August 21, 2009 1:26:20 AM

The i7 is the better choice for gamers. What upsets most of us AMD guys is how much that is blown out of proportion: an average 5-10% increase in fps is negligible at best. I also hate it when AMD guys go to any means necessary to support their team, including making up benchmarks favoring our team. AMD is a very good company, and, well, I can't say Intel is a good company, but their products are good and the i7 is better. Live with it. And I'm not talking about clock for clock, but a 920 vs. a 965 at stock.
August 21, 2009 6:38:13 AM

xaira said:
The i7 is the better choice for gamers


When you are gaming, where do you need performance?
August 21, 2009 7:06:06 AM

kassler said:
Crysis: a complicated game that doesn't need much CPU power but probably accesses a lot of memory because the graphics are so detailed. In this game you could probably start to see Phenom perform better in the heaviest sections. The scenario would be: at very low resolutions, the i7 is faster; increasing the resolution narrows the gap, and when you get close to a 100% GPU bottleneck, Phenom passes the i7; increasing the resolution even more evens out the scores because the game is 100% bottlenecked by the GPU.


Left 4 Dead is a heavily multithreaded game. It can and will use 80% of all 4 cores on a quad-core machine. Source (the engine it is based on) is very CPU-dependent.

Crysis sucks either way. CryEngine is just horribly optimized.

FC2 is a decent game (better than Crysis but still meh) and does benefit from more cores.

CoD 4+ does benefit from more cores as well, but not as much as, say, FC2 or L4D.

As for physics, remember that most games use Havok (which I prefer), not PhysX. Intel owns Havok, which means they can easily optimize it for Intel CPUs.

Just something to think about.

kassler said:
When you are gaming, where do you need performance?


Graphics, that's a no-brainer. But the problem is that if you buy a crappy CPU (Pentium DC/Athlon X2 3800+) and pair it with a high-end GPU, the CPU will severely limit the GPU's performance, since at the low end of the FPS spectrum the game relies on the CPU.

What you need is balance. If you run a high-end CF/SLI setup, a Core i7 will benefit you. If you run a single GPU, say an HD 4800 or G200+, even a C2Q Q6600 or a decent Phenom II (I don't include Phenom I due to its horrible clocks and performance) will be fine. But more cores and a faster CPU that can push more raw data will still be beneficial.

As for having to access memory, it won't really matter, since both Core i7 and Phenom II have super-fast IMCs (integrated memory controllers) that can access memory almost as fast as cache, so it won't kill performance the way an older FSB setup would.
August 21, 2009 7:30:57 AM

jimmysmitty said:
Graphics, thats a no brainer.

is handled by the GPU...
August 21, 2009 7:39:26 AM

keithlm said:
Thanks for your post showing that the i7 needs to run at 3.8 GHz against a 3.64 GHz Phenom II to match the speeds.

This contradicts what is commonly believed.


You really don't get that the benchmarks that are identical are GPU bottlenecked, do you?

They perform equally because they are at a point where CPU performance is completely irrelevant to system performance (as long as it is fast enough to saturate the GPU(s)). Note that in no case does the Phenom noticeably outperform the i7, which is what would be expected if the system were still CPU bottlenecked but the Phenom were somehow better. In all cases where the CPU has any significant impact on performance, the i7 doesn't just win, it absolutely flattens the Phenom in every way.

Now, in most cases, there isn't enough GPU power for the i7 to truly distance itself, and for most GPU setups, the Phenom performs just fine. It's a great choice for the price, and is certainly not a bad CPU. However, if absolute maximum performance is your goal, and you can afford the extra cash, there is no situation in which a Phenom II will overtake an equivalently clocked i7. The i7 has a faster memory controller, faster single threaded computation, faster multithreaded computation, larger cache, and is superior in every way.

Don't take this as an unconditional ad for an i7 - I have recommended Phenom IIs to friends before, and I will continue to praise their value and their ability to bring AMD back into at least some level of competition with Intel. They really are excellent CPUs. However, they cannot claim the absolute performance crown, no matter what the AMD fanboys would like to think.
August 21, 2009 7:43:05 AM

kassler said:
Can't edit messages...

Bugs are great aren't they? Fixed it for you.
August 21, 2009 7:46:10 AM

cjl said:
You really don't get that the benchmarks that are identical are GPU bottlenecked, do you?

Do you understand that it varies during gameplay?
There are a lot of factors that decide if the burden is mostly on the CPU or the GPU for each frame.
August 21, 2009 7:46:47 AM

randomizer said:
Bugs are great aren't they? Fixed it for you.

Thanks! :) 
August 21, 2009 7:48:30 AM

kassler said:
Do you understand that it varies during gameplay?
There are a lot of factors that decide if the burden is mostly on the CPU or the GPU for each frame.

Agreed. And all of the benchmarks show that when the burden is on the CPU, the i7 pulls ahead of the Phenom II, without fail.
August 21, 2009 7:49:10 AM

Try not to turn this into a flame war please. Post data and let the data speak for itself.
August 21, 2009 7:58:16 AM

cjl said:
Agreed. And all of the benchmarks show that when the burden is on the CPU, the i7 pulls ahead of the Phenom II, without fail.

Can you explain why Phenom wins in some games just before the game is 100% bottlenecked by the GPU?
Did you understand the text about cache design? If you did understand that text, can't you draw any conclusions from it?
August 21, 2009 8:07:52 AM

Conclusion for this thread:

OP posted an article that he didn't fully understand, and simply waves around the "article" as if it supports his biased opinion.
August 21, 2009 8:21:10 AM

kassler said:
Can you explain why Phenom wins in some games just before the game is 100% bottlenecked by the GPU?
Did you understand the text about cache design? If you did understand that text, can't you draw any conclusions from it?

Which benchmarks are you talking about? All the ones I see either show a total GPU bottleneck (a dead heat), or the i7 pulling ahead. None seem to show the Phenom II pulling ahead (and don't point to 1 or 2 percent, since that's well within normal variation).
August 21, 2009 8:21:28 AM

kassler said:
is handled by the GPU...


Unless you have a crappy CPU behind it; then the GPU's power is wasted, since it cannot go faster than the CPU can pass data to it.

Hence why I said there needs to be a balance. A pre-Core i7 quad core will not bottleneck most current single GPUs, but it will bottleneck multiple ones.

kassler said:
Can you explain why Phenom wins in some games just before the game is 100% bottlenecked by the GPU?
Did you understand the text about cache design? If you did understand that text, can't you draw any conclusions from it?


Cache design is great, but you also forget that Intel's cache design and prefetch are superior. Intel's L3 acts as a buffer that keeps any data that has been in the L1/L2 caches so it can be accessed and reused faster than from memory.

But because there are such super-fast interconnects (QPI/HTT), having to access memory is not a burden. Most of the interconnects run at about 10 GB/s (17 GB/s with tri-channel DDR3) and will just get even faster. Being able to push that much data around makes it easy to utilize the memory.

*Edit*

You are just looking at L3. If L3 were the main deciding factor, C2Q would not be able to keep up with Core i7 or Phenom II in high-res gaming, but it does with one or two GPUs.

The most important cache to worry about is the L2, since games will use it more often than the L3. As I said before, Intel's L3 is a sort of buffer, so if the game needs data it already had, it won't have to look in memory or access the HDDs but can grab it from the on-die L3 cache. I am not too sure how Phenom's L3 works in comparison.
August 21, 2009 8:21:34 AM

Two things:
Why does X3 run SO well on P2?
And we will soon know whether the i7 is truly better, with the new cards coming out: we'll see whether the i7 pulls away once the "bottlenecks" (which do exist sometimes, though I've seen others claim far too many) are reduced. Not saying it won't, just saying we'll at least have a better understanding after those cards come out.
August 21, 2009 8:22:00 AM

yomamafor1 said:
Conclusion for this thread:

OP posted an article that he didn't fully understand, and simply waves around the "article" as if it supports his biased opinion.

Why don't you explain, then? ;) 

I mean the information about how caches are designed.
August 21, 2009 8:22:51 AM

And 3, why does X3 play so well on P2?
August 21, 2009 8:26:54 AM

jimmysmitty said:
Cache design is great, but you also forget that Intel's cache design and prefetch are superior. Intel's L3 acts as a buffer that keeps any data that has been in the L1/L2 caches so it can be accessed and reused faster than from memory.


You didn't explain why Phenom pulls ahead in some games just before the game is 100% bottlenecked by the GPU.
August 21, 2009 8:27:31 AM

JAYDEEJOHN said:
And 3, why does X3 play so well on P2?


You mean X3 Reunion? Who knows. It's an old game. I get 5000 FPS on Duke Nukem 3D on my Q6600...
August 21, 2009 8:29:33 AM

kassler said:
You didn't explain why Phenom pulls ahead in some games just before the game is 100% bottlenecked by the GPU.

Before you can keep claiming this, you really should show some proof...

August 21, 2009 8:30:14 AM

cjl said:
Which benchmarks are you talking about?


You did post one graph where Phenom performed better just before the game was 100% bottlenecked by the GPU.
August 21, 2009 8:32:32 AM

kassler said:
You did post one graph where Phenom performed better just before the game was 100% bottlenecked by the GPU.


I'm not seeing it. They are all within a fraction of a percent, and in some, both below and beyond 1920x1200, the Core i7 performs better.

But in the only one where the Phenom II performs better, it is by such a small percentage that it's negligible.
August 21, 2009 8:39:23 AM

jimmysmitty said:
I'm not seeing it. They are all within a fraction of a percent, and in some, both below and beyond 1920x1200, the Core i7 performs better.

But in the only one where the Phenom II performs better, it is by such a small percentage that it's negligible.


How is your math?

Take 100 frames: 95 of them are bottlenecked by the GPU and 5 by the CPU. Do you think you will get big differences?
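The 100-frame argument can be put into numbers. This is a simplified model under an assumption not stated in the thread (CPU and GPU work overlap, so each frame costs whichever of the two is slower), with entirely hypothetical frame times; `average_fps` and `min_fps` are illustrative names. It shows how a CPU advantage confined to a few frames barely moves the average but is clearly visible in minimum fps.

```python
def average_fps(cpu_ms: list[float], gpu_ms: list[float]) -> float:
    """Average fps over a run, assuming CPU and GPU work overlap so that each
    frame costs whichever of the two is slower (a simplified pipeline model)."""
    total_ms = sum(max(c, g) for c, g in zip(cpu_ms, gpu_ms))
    return 1000.0 * len(cpu_ms) / total_ms

def min_fps(cpu_ms: list[float], gpu_ms: list[float]) -> float:
    """Frame rate of the single slowest frame."""
    return 1000.0 / max(max(c, g) for c, g in zip(cpu_ms, gpu_ms))

# Hypothetical numbers: the GPU needs 33 ms for every frame; the CPU needs
# 20 ms normally, but 5 of 100 frames are CPU-heavy.
gpu = [33.0] * 100
cpu_a = [20.0] * 95 + [40.0] * 5   # CPU A: 40 ms on the heavy frames
cpu_b = [20.0] * 95 + [36.0] * 5   # CPU B: 10% less CPU time on those frames

avg_gain = average_fps(cpu_b, gpu) / average_fps(cpu_a, gpu) - 1
min_gain = min_fps(cpu_b, gpu) / min_fps(cpu_a, gpu) - 1
print(f"average fps: +{avg_gain:.1%}, minimum fps: +{min_gain:.1%}")
# average fps: +0.6%, minimum fps: +11.1%
```

In this toy model the average hides almost all of the difference, which is why a per-frame log or minimum-fps figures would say more than averages here.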
August 21, 2009 8:42:48 AM

No one's explained why P2 trashes anything Intel, period. This may answer whether, in one game, this theory actually works... or not, heheh.
August 21, 2009 8:45:39 AM

And I mean X3. Every time I've seen it benched, the AMD chips stomp the i7 or whatever else. There's never been an answer, and benchers usually stay away from it.
Does anybody here remember any of those benches, and if so, know why?
It's an odd man out, I realise, but it could give us all a better understanding of the differences between the two architectures.
August 21, 2009 8:50:05 AM

I've noticed this, and kassler, if you're such a huge AMD supporter, you should really look into this, as it would support your argument.
August 21, 2009 8:59:46 AM

JAYDEEJOHN said:
I've noticed this, and kassler, if you're such a huge AMD supporter, you should really look into this, as it would support your argument.

My English isn't that good; I am trying to understand what you are saying but don't exactly get it.

I am not a supporter of any CPU. What I like is advancements in hardware (I am a programmer). It is important that good solutions get credit; software is almost always behind the hardware, so it isn't always simple to understand what is best in the long run.

If I needed a lot of database performance and didn't care about power use, I would buy an i7. The i7 is a very good server processor, and I think that is the main target for that CPU. There, the i7 has two problems: power use and price. There are a lot of servers that don't need that much performance.
August 21, 2009 9:07:50 AM

yomamafor1 said:
Yawn... another troll who posts something that he doesn't fully understand.



What's the matter? Are you incapable of exceeding a second-grade education?

You are the one who does not understand.

BTW... the shark is a symbol commonly used to keep people in a constant state of fear for the purpose of control. Only Tyranny uses it.

August 21, 2009 9:12:01 AM

I agree, SW lags behind, and it's a crapshoot regarding trends, but the i7 shows its power/perf is fine for the most part. Usually it's the SW lag you mentioned where the i7 is hurting in perf, though then it uses much less power as well.
August 21, 2009 9:13:25 AM

To me, it's a chicken-or-egg argument; one has to come first. Let the SW catch up, then we will know.
August 21, 2009 9:19:05 AM

If you're talking about Crysis, 2560x1600, that's a difference of 0.83%. If you're talking about Left 4 Dead 2560x1600, that's a difference of 0.288%. In both cases, the difference is well within normal variation. Basically, the scores are the same to within the resolution of the test.
August 21, 2009 9:20:15 AM

kassler said:
How is your math?

Take 100 frames: 95 of them are bottlenecked by the GPU and 5 by the CPU. Do you think you will get big differences?


Now you're speculating - the way to prove this is to show some sort of framelog or similar - demonstrate a clear difference in min. fps, or fps in certain scenarios, and you might have a point. Currently, however, you are just trying to find some possible way to make a PhII sound good, which is getting rather old, honestly.
August 21, 2009 9:22:45 AM

kassler said:
My English isn't that good; I am trying to understand what you are saying but don't exactly get it.

I am not a supporter of any CPU. What I like is advancements in hardware (I am a programmer). It is important that good solutions get credit; software is almost always behind the hardware, so it isn't always simple to understand what is best in the long run.

If I needed a lot of database performance and didn't care about power use, I would buy an i7. The i7 is a very good server processor, and I think that is the main target for that CPU. There, the i7 has two problems: power use and price. There are a lot of servers that don't need that much performance.

Say what you will, but in all of your posts here, you certainly come across as someone with a very strong anti-Intel bias. I'm not saying that you are necessarily biased, but your posts have certainly not done anything to try to dispel that idea.
August 21, 2009 9:27:27 AM

cjl said:
Now you're speculating - the way to prove this is to show some sort of framelog or similar - demonstrate a clear difference in min. fps, or fps in certain scenarios, and you might have a point. Currently, however, you are just trying to find some possible way to make a PhII sound good, which is getting rather old, honestly.

The problem is that you don't want to understand.
August 21, 2009 9:28:06 AM

To me, P2 is like a gfx card with 1 GB of memory: it currently just hasn't the power to completely use it in most cases.
If they change a couple of things, then yes, all that potential will be available.
August 21, 2009 9:34:38 AM

How much latency is there before L3? Or, how much on each chip to get to L3?
Isn't that important? Pertinent?
August 21, 2009 9:37:17 AM

cjl said:
If you're talking about Crysis, 2560x1600, that's a difference of 0.83%.


I am talking about Crysis at 1920x1200:
i7 @ 3.8 vs Phenom II @ 3.64 = 0.044 or 4.4%
29.42 fps vs 30.41 = 0.034 or 3.4%

4.4% + 3.4% = 7.8%

And remember that the i7's L3 cache runs faster than the Phenom II's.

This is NOT within normal variation. If you set up ten other computers and run the same test, you will notice the same behaviour.
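The per-clock comparison above can be written out explicitly. This sketch assumes, from the order in the post, that 29.42 fps belongs to the i7 @ 3.8 GHz and 30.41 fps to the Phenom II @ 3.64 GHz, and it naively treats fps as proportional to clock speed, which stops holding once the GPU is the bottleneck. `per_clock_advantage` is a hypothetical helper. Strictly, the two ratios compound multiplicatively, giving about 7.9% rather than the additive 7.8%; the calculation does not by itself show whether the gap exceeds run-to-run variation, which is the point in dispute.

```python
def per_clock_advantage(fps_a: float, ghz_a: float, fps_b: float, ghz_b: float) -> float:
    """Relative fps-per-GHz advantage of system B over system A (0.0 = equal).
    Naively assumes fps scales linearly with CPU clock."""
    return (fps_b / ghz_b) / (fps_a / ghz_a) - 1

# Crysis 1920x1200 figures as posted (assignment of fps to CPUs is assumed).
clock_ratio = 3.8 / 3.64       # about 1.044
fps_ratio = 30.41 / 29.42      # about 1.034
advantage = per_clock_advantage(29.42, 3.8, 30.41, 3.64)
print(f"{advantage:.1%}")      # 7.9%: equals clock_ratio * fps_ratio - 1
```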
August 21, 2009 9:47:04 AM

At least now you're specific.
Then, to really get down to it, we'd need as many examples as we can find to make this a reasonable assumption, or fact, right? Do you have more benchmarks that show this disparity, even at other frequencies?