Phenom 9700, AMD's 1st Quad-Core CPU

Technology I - Advanced Memory Prefetcher, SSE4a

As our avid readers will undoubtedly remember, Intel introduced the first SIMD extensions to the x86 ISA in the shape of the MMX instruction set. As a countermove, AMD implemented the 3DNow! extensions in its own processors. As a result, software did not get the same performance boost on both companies' processors, since it had to be optimized separately for each set of extensions. Thankfully, this kind of competition and incompatibility died down, and the SSE, SSE2 and SSE3 extensions used by AMD and Intel were identical. However, the two chipmakers are now parting ways once again, to the detriment of users and programmers. With the launch of the Penryn core, Intel introduced the SSE4.1 instruction set. AMD, meanwhile, is implementing SSE4a (formerly known as SSE128) in the new Stars core microarchitecture.

The Phenom's SSE unit is being widened to 128 bits, up from the Athlon 64's 64-bit unit. Additionally, AMD is adding four new instructions: EXTRQ/INSERTQ, which extract and insert bit fields within an SSE register, and MOVNTSD/MOVNTSS, which are scalar non-temporal (streaming) stores. Two more instructions, LZCNT and POPCNT, which count leading zero bits and set bits respectively and are primarily used in bit manipulation functions, are included as well.
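To make the two bit manipulation instructions more concrete, here is a minimal portable C sketch of the results LZCNT and POPCNT produce for a 32-bit operand. The function names `popcount32` and `lzcnt32` are our own illustrative choices, not part of any AMD library, and the loops only model the results; the hardware instructions compute them in a single operation.

```c
#include <stdint.h>

/* Reference model of POPCNT: the number of bits set to 1 in x. */
static unsigned popcount32(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Reference model of LZCNT: the number of zero bits above the highest
   set bit. For an operand of 0 the result is the operand width (32). */
static unsigned lzcnt32(uint32_t x)
{
    unsigned count = 0;
    uint32_t mask = UINT32_C(1) << 31;   /* start at the top bit */
    while (mask != 0 && (x & mask) == 0) {
        ++count;
        mask >>= 1;
    }
    return count;
}
```

For example, `popcount32(0xF0F0)` yields 8 and `lzcnt32(1)` yields 31.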

Sadly, Intel's SSE4.1 and AMD's SSE4a are incompatible with one another - a fact that may soon cause problems for programmers and users alike.
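In practice, this incompatibility means software has to check at run time which extension a processor offers before choosing a code path. The sketch below shows one way this might look in C. It only decodes register values that the CPUID instruction is assumed to have returned already, so it stays portable; the `code_path` type and the `choose_code_path` function are hypothetical names introduced for illustration, while the bit positions follow the vendors' CPUID documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in the ECX register returned by CPUID. */
enum {
    BIT_SSE41  = 19, /* leaf 0x00000001, ECX: Intel SSE4.1 */
    BIT_POPCNT = 23, /* leaf 0x00000001, ECX: POPCNT       */
    BIT_LZCNT  = 5,  /* leaf 0x80000001, ECX: LZCNT (ABM)  */
    BIT_SSE4A  = 6   /* leaf 0x80000001, ECX: AMD SSE4a    */
};

/* Hypothetical set of code paths an application might ship. */
typedef enum { PATH_GENERIC, PATH_SSE41, PATH_SSE4A } code_path;

static bool has_bit(uint32_t reg, unsigned bit)
{
    return ((reg >> bit) & 1u) != 0;
}

/* leaf1_ecx: ECX from CPUID leaf 0x00000001
   ext1_ecx:  ECX from CPUID leaf 0x80000001 */
static code_path choose_code_path(uint32_t leaf1_ecx, uint32_t ext1_ecx)
{
    if (has_bit(leaf1_ecx, BIT_SSE41))
        return PATH_SSE41;
    if (has_bit(ext1_ecx, BIT_SSE4A))
        return PATH_SSE4A;
    return PATH_GENERIC;   /* fall back to code both vendors can run */
}
```

The cost for programmers is visible here: every optimized routine needs up to three variants plus the dispatch logic, instead of one shared implementation.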

The advanced memory prefetcher can load data directly from RAM into the core's L1 cache without taking a detour through the L2 cache first. Thus, the data reaches the processor with much lower latency. At the same time, this reduces the load on the L2 cache, which can instead buffer other data more efficiently, in turn translating into an overall performance boost.

Furthermore, the prefetcher identifies recurring data patterns and can pre-fetch them even before they are requested.
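AMD has not published the exact algorithm, so the following is only a toy model of the general idea of pattern detection, using the simplest possible pattern: a constant stride between consecutive addresses. The `stride_detector` type, the `observe` function and the rule of predicting once the same stride has been seen twice in a row are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stride detector; NOT AMD's actual prefetch logic. */
typedef struct {
    uint64_t last_addr;      /* most recent address observed        */
    int64_t  last_stride;    /* distance between the last two       */
    unsigned confirmations;  /* times the stride repeated in a row  */
    bool     primed;         /* true once one address has been seen */
} stride_detector;

/* Record one memory access. Returns true and writes the predicted next
   address once the same non-zero stride has occurred twice in a row. */
static bool observe(stride_detector *d, uint64_t addr, uint64_t *predicted)
{
    bool predict = false;

    if (d->primed) {
        int64_t stride = (int64_t)(addr - d->last_addr);

        if (stride != 0 && stride == d->last_stride)
            d->confirmations++;
        else
            d->confirmations = 0;   /* pattern broken, start over */

        d->last_stride = stride;

        if (d->confirmations >= 1) {
            *predicted = addr + (uint64_t)stride;
            predict = true;
        }
    }

    d->last_addr = addr;
    d->primed = true;
    return predict;
}
```

Starting from a zero-initialized detector, accesses to 0x1000, 0x1040 and 0x1080 establish a stride of 0x40, so the third call predicts 0x10C0 as the next address worth fetching ahead of time.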

x86 instructions are between 1 and 15 bytes long. Compared to the Athlon 64 core, the buffer for fetching instructions was increased to 32 bytes, allowing the core to process more instructions simultaneously. Thus, as you can see in our diagram, up to three instructions can be processed at the same time, depending on the length of the instructions.
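The dependence on instruction length can be illustrated with a small C sketch. It is a simplified model resting on our own assumptions: the fetch window starts at an instruction boundary, only complete instructions count, and the limit of three per cycle is taken from the text above. The function name `instructions_in_window` and the 16-byte comparison window are illustrative choices.

```c
#include <stddef.h>

#define MAX_PER_CYCLE 3   /* "up to three instructions" per the article */

/* Count how many complete instructions, taken in order from lengths[],
   fit into a fetch window of window_bytes, capped at MAX_PER_CYCLE. */
static size_t instructions_in_window(const unsigned char *lengths,
                                     size_t n, size_t window_bytes)
{
    size_t used = 0;    /* bytes consumed so far   */
    size_t count = 0;   /* instructions that fit   */

    while (count < n && count < MAX_PER_CYCLE &&
           used + lengths[count] <= window_bytes) {
        used += lengths[count];
        ++count;
    }
    return count;
}
```

With instruction lengths of 7, 8 and 10 bytes, a 16-byte window holds only the first two, while a 32-byte window holds all three. Three maximum-length 15-byte instructions would still not fit into 32 bytes together, which is why the count depends on the instruction mix.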

Tom's Hardware News Team


  • spearhead
    Good review, but you should have included more results from the overclocked Phenom. I just want to know how much juice you can push out of it. For me it is a must that it beats the 6400+, otherwise it's not worth purchasing in my opinion; it just has to beat its older generation when it's running at the same clocks. That is why AMD has to work on its clock speed and cache. Hopefully Deneb will be out soon. I would also really appreciate seeing a review of the Phenom 9850 Black Edition compared against the 6000+, 6400+, Q6600 and Q9300, and maybe some E8xxx model, with overclocked results, pushed to the maximum. Would be really cool, hehe :)
  • haifen
    The SB700 does indeed support at least one PATA port as my motherboard has an IDE connector and I can use it with the ATIIXP PATA driver.