What are Prefetching and Branch Prediction?

Guest
I saw that these are going to be the new features on the Palomino, along with a chip that isn't going to torch your fingers if you bump it.

What exactly are they and are they going to help the chips performance at all?

If I can get an AXIA running at 1466 now, should I go that route or wait 3 months for a Palomino 1533?

Excellence isn't achieved overnight!
 

ledzepp98

Distinguished
Dec 31, 2007
branch prediction, and i think prefetching, has to do with cache hits. basically, the cache is a small amount of very fast memory located close to the cpu. since it is fast (much faster than looking in system memory), the cpu first looks in the cache for the next instruction it must execute. the branch prediction unit is what tells the cpu where it should look to find the information (rather, it "predicts" it). a good branch prediction unit will result in a high cache hit percentage, which will help performance.

disclaimer:
this is a bit beyond my technical knowledge so if i am wrong, please jump in and correct me...
 

kurokaze

Distinguished
Mar 12, 2001
you're very close.

Branch prediction deals with how accurately the CPU can "predict" which branch to take in a program. For example, if you have an "if-else" statement, the branch prediction unit guesses which path will be taken, the "if" path or the "else" path, BEFORE the program counter (EIP on x86, not EAX) moves past that line of code.
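A quick sketch of why guessing is even feasible: most branches are heavily biased one way. In this Python toy (the branch histories here are invented for illustration), even a naive "always predict taken" rule scores well on a loop's back-branch, while an alternating branch defeats it:

```python
# Score a static "always predict taken" guess against a branch history.
# True = branch taken, False = not taken.
def accuracy(history, guess=True):
    return sum(outcome == guess for outcome in history) / len(history)

loop_branch   = [True] * 99 + [False]  # loop back-edge: taken 99x, exits once
random_branch = [True, False] * 50     # alternating: worst case for any bias

print(accuracy(loop_branch))    # 0.99
print(accuracy(random_branch))  # 0.5
```

Real predictors are much smarter than a fixed guess, but the same principle applies: they win because real branch behavior is far from random.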

Pre-fetch means pulling additional information from memory into the cache before it is requested. Usually this means the next couple of adjacent pages (or cache lines) in memory. This is based on the assumption that this data will be needed soon, so it is placed in the cache for faster access.
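Here's a toy Python model of that idea (a made-up "next-line" prefetch policy, not any real chip's behavior): on every access, the cache also grabs the adjacent line, which turns a sequential walk through memory into almost all hits.

```python
# Toy cache: optionally prefetch the next-adjacent line on every access.
def run(addresses, prefetch):
    cache, hits = set(), 0
    for a in addresses:
        hits += a in cache
        cache.add(a)
        if prefetch:
            cache.add(a + 1)  # pull in the adjacent line ahead of time
    return hits / len(addresses)

seq = list(range(100))            # walking straight through memory
print(run(seq, prefetch=False))   # 0.0  -- cold cache, every access misses
print(run(seq, prefetch=True))    # 0.99 -- only the very first access misses
```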

Hope that helps!
 

Kelledin

Distinguished
Mar 1, 2001
Brew yourself a big pot of coffee...

"Prefetching" means getting data from main memory before it's actually needed and storing it in cache memory. Every x86 CPU since the 386 (and maybe before that too) has a bit of circuitry called a "prefetch unit" that runs in the background, scanning both the CPU's internal registers and any cached instructions to determine what the execution unit (the part that actually runs instructions) is likely to need from memory next. It examines the CS and IP registers (Code Segment/Instruction Pointer) to figure out what snippets of code the exec. unit might want to execute next and attempts to get those snippets into fast cache memory ahead of time. It also looks at prefetched snippets of code to see which are going to require data from memory and tries to get that data into cache memory as well.

As for branch prediction...

A CPU tends to execute instructions one after another; it goes through a section of memory in sequence, picking out instructions as it goes along. Every once in a while, though, it has to choose between two paths of execution. In order to go down an alternate path of execution, it has to stop picking out instructions in sequence and start picking them from a completely different location in memory. This is "branching."

This causes a problem for a prefetch unit that's trying to get upcoming instructions into cache memory. If a branch occurs right in the middle of the instructions it currently has cached, then a lot of the instruction caching it's done is wasted effort. Not only that, but the exec. unit is suddenly demanding instructions that aren't in the cache. Suddenly, the exec. unit can run no faster than the memory's paltry 400MHz (or less).

The way a prefetch unit works around this is by taking an educated guess as to whether an instruction sequence will branch. Based on its guess, it will try to cache the instructions it thinks are coming next. It doesn't always guess right, but it guesses correctly often enough to avoid a lot of performance hits. This is "branch prediction."
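One classic way to make that educated guess is a 2-bit saturating counter per branch: two wrong guesses in a row are needed to flip the prediction, so a single loop exit doesn't wreck the guess. This Python sketch is the textbook mechanism, not a claim about what the Athlon or any specific chip actually implements:

```python
# 2-bit saturating counter: states 0..3, predict "taken" when state >= 2.
def predict(history):
    state, correct = 3, 0                 # start strongly "taken"
    for taken in history:
        correct += (state >= 2) == taken  # did the guess match reality?
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(history)

# A 10-iteration loop run 10 times: back-branch taken 9x, then falls through.
loop = ([True] * 9 + [False]) * 10
print(predict(loop))   # 0.9 -- wrong only once per loop exit
```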

Even in the best of times, a prefetch unit can't always keep up with the exec. unit; when it doesn't get instructions or data into the cache fast enough for the exec. unit, the exec. unit has to sit and wait. This is a "cache miss."

Basically, the faster a prefetch unit is, and the better its branch prediction is, the better it can keep up with the demands of the exec. unit. Having a good prefetch unit means that slow or high-latency memory doesn't hurt overall performance as much; it also means that the CPU itself is better able to saturate its memory bandwidth. This means that clock-for-clock, a CPU with a better prefetch unit will probably get more benefit out of DDR memory.

As for what the PR bunnies are calling a "hardware prefetch"...all I can make of that comment is that the prefetch unit will actually be fully hard-wired.

Something that a lot of people don't know about x86 processors is that they're actually not fully "hard-wired;" x86 instructions are not executed directly in the core but are instead broken down into smaller instructions called <i>micro-ops.</i> Essentially there's a little bit of software (or rather firmware) inside of the CPU itself that handles this translation. This isn't an ideal situation for performance, of course; the translation from instructions to micro-ops incurs some latency in the exec. unit. But with an instruction set as complex as the x86 instruction set, this is the only practical way to handle things.
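Purely as an illustration (these encodings are made up, not Intel's or AMD's actual micro-ops), the translation is essentially a lookup from one complex instruction to a short sequence of simple ones:

```python
# Hypothetical decode table: CISC-style instructions -> RISC-like micro-ops.
DECODE = {
    "add [mem], reg": ["load  tmp, [mem]",    # fetch the memory operand
                       "add   tmp, reg",      # do the arithmetic in the core
                       "store [mem], tmp"],   # write the result back
    "mov reg, reg":   ["mov reg, reg"],       # simple ops map 1:1
}

def decode(instr):
    return DECODE[instr]

print(decode("add [mem], reg"))   # three micro-ops for one x86 instruction
```

That extra decode step is the translation latency mentioned above; simple instructions that map 1:1 pay almost nothing for it.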

The prefetch unit's job is relatively simple, though, so I suppose it's possible to make it completely hard-wired. Maybe this is what AMD is doing with the Palomino?

As for whether to wait or not...well, really cool things are always just around the corner. The Palomino sounds like it's going to be a lot more than just a MHz increase; I'd say it ranks right up there with the transition from Slot A's to T-birds, and it won't even require you to change your mobo.

I got impatient and ordered an AXIA T-bird with some DDR memory...I figure as cheap as good CPUs are right now, I might as well splurge a bit :wink:

Kelledin
<A HREF="http://kelledin.tripod.com/scovsms.jpg" target="_new">http://kelledin.tripod.com/scovsms.jpg</A>
 

ajmcgarry

Distinguished
Dec 31, 2007
I thought micro-ops only first appeared in the PII. It was a step towards a hybrid processor: neither a pure CISC nor a pure RISC chip, it had the speed benefit of a RISC instruction set (although you could never actually program in it) behind the complexity of a CISC instruction set interface. For each CISC instruction a corresponding set of "micro-ops" existed; nothing to do with a compilation process, but rather a straight conversion.
It would have been nice if CISC processors could be dumped altogether in favour of RISC, but what happens to all your old programs when you do that?

Question. Do you think that in the future with the rate of speed increases in processors, that CISC and RISC really matter?

<font color=red>Why don't you ever see the headline "Psychic Wins Lottery"?</font color=red>
 

Pettytheft

Distinguished
Mar 5, 2001
Thanks for the info!! I've always wanted to know exactly what some of that stuff did. After reading it 3 times I think I understand.....I think.



<i>If you take a truth and follow it blindly, it will become a Falsehood and you a Fanatic.</i>
 

Kelledin

Distinguished
Mar 1, 2001
The x86 chips were never fully hard-wired. Unless a chip <i>is</i> hard-wired, the micro-ops are going on at some level--not always visible, though, and not always possible to optimize for. The x86 CPU was moving towards RISC by the time the 486 was out; the simple, common instructions were getting handled by a RISC-like core and usually completed in one or two clock cycles. The Pentium classic went so far as to have rudimentary pipelining. I remember first seeing Intel's micro-op tables in the PPro manuals...

As for CISC vs. RISC, I still see Alphas, the ultimate RISC chips, whipping P4's and T-birds clocked to twice the MHz. I've heard similar claims of instruction-per-clock performance from the Mac crowd, though I've never confirmed it for myself. I think the whole debate would matter a great deal more if Compaq kept the R&D in gear on the Alpha chips...

Kelledin
<A HREF="http://kelledin.tripod.com/scovsms.jpg" target="_new">http://kelledin.tripod.com/scovsms.jpg</A>
 
Guest
Micro-code has existed since the days of the 386. Intel used it, but the Cyrix 386 did not, which made it much faster. However, at today's high speeds, it would be impossible to build a chip without it.

~ I'm not AMD biased, I just think their chips are better. ~