12 MB L2 Cache And SSE4
At the same clock speed, the new Penryn is faster than its 65 nm predecessor Conroe. The two most important innovations, which are directly responsible for this performance boost, are an L2 cache that has grown from 2 x 4 MB to 2 x 6 MB, and the introduction of the new SSE4 instruction set.
As a result of the larger L2 cache, several applications can reduce the number of times they need to access the comparatively slow RAM, allowing them to experience a performance boost of up to 27%.
The SSE4 instruction set comprises 54 new instructions, most of which were developed to speed up video editing tasks. However, Intel has not implemented the full instruction set yet: Penryn supports only 47 of the 54 instructions. That is why this SSE extension is also called SSE4.1 (version 1). The second version, SSE4.2, will add the remaining seven instructions to complete the set, and will be implemented in Penryn's successor, Nehalem.
This diagram shows a short history of the SSE extensions, from their first generation until today.
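Because SSE4.1 and SSE4.2 arrive in different processor generations, software normally has to check at run time which of the two a CPU offers. The following is a minimal sketch of such a check, assuming the caller has already executed CPUID leaf 1 and holds the resulting ECX value; the helper names `has_sse41` and `has_sse42` are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature flags reported in ECX by CPUID leaf 1. */
#define CPUID1_ECX_SSE41 (UINT32_C(1) << 19) /* bit 19: SSE4.1 */
#define CPUID1_ECX_SSE42 (UINT32_C(1) << 20) /* bit 20: SSE4.2 */

/* Hypothetical helper: true if the ECX value from CPUID leaf 1
   reports SSE4.1 support. */
bool has_sse41(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_SSE41) != 0;
}

/* Hypothetical helper: true if the ECX value from CPUID leaf 1
   reports SSE4.2 support. */
bool has_sse42(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_SSE42) != 0;
}
```

On GCC or Clang, the ECX value could be obtained with `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. A Penryn-class processor would be expected to pass the first check but not the second.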
How Does SSE4 Accelerate Video Rendering?
Until now, a block of code dealing with motion detection looked like this:
Now, with the help of SSE4, that entire block can be replaced completely by the following instruction:
MPSADBW xmm0, xmm1, 0
This not only saves time during the programming phase, it also significantly accelerates the execution of the program.
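To give a rough idea of the work that MPSADBW takes over, here is a scalar C sketch of what the instruction computes when its immediate is 0. It is an illustrative model, not the code from the listing referred to above, and the function names `sad4` and `mpsadbw_imm0_model` are hypothetical. With an immediate of 0, the instruction compares the first four bytes of the second operand against eight overlapping 4-byte windows of the first operand and produces eight 16-bit sums of absolute differences, which is the basic building block of a motion search.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences (SAD) between a 4-byte reference block
   and the 4 bytes starting at cand. */
uint16_t sad4(const uint8_t *cand, const uint8_t *block)
{
    uint16_t sum = 0;
    for (int j = 0; j < 4; ++j)
        sum += (uint16_t)abs((int)cand[j] - (int)block[j]);
    return sum;
}

/* Scalar model of "MPSADBW window, block, 0" (hypothetical helper).
   Only window[0..10] and block[0..3] are read. out[i] receives the SAD
   of block[0..3] against window[i..i+3] for i = 0..7. */
void mpsadbw_imm0_model(const uint8_t window[16],
                        const uint8_t block[16],
                        uint16_t out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = sad4(window + i, block);
}
```

The scalar version needs 32 subtractions, 32 absolute values and 24 additions per call, whereas the hardware produces all eight sums with one instruction. From C, the instruction is exposed as the `_mm_mpsadbw_epu8` intrinsic in `<smmintrin.h>`.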
When encoding video, the codec (e.g. DivX 6.6.1 or later) is not the only important factor; the encoding program that is used matters as well. For example, the current version of VirtualDub already supports SSE4, while TMpegEnc and Adobe Premiere will get an update in November and by the end of the year, respectively.
We were able to test a pre-release version of TMpegEnc with SSE4 support. These were our results:
Intel gives a live demonstration of the SSE4.1 performance in Adobe Premiere CS3.
Intel wants to complete its transition to exclusive 45 nm production very quickly.