IDF Spring 2006: Will Intel's Core Architecture Close the Technology Gap?

Smart Memory Access

Advanced Prefetch

After developing a clearly more efficient processing architecture and a powerful L2 cache, Intel wanted to make sure that these units are used as efficiently as possible. Each dual-core Core processor comes with a total of eight prefetcher units: two data prefetchers and one instruction prefetcher per core, plus two prefetchers as part of the shared L2 cache. Intel says they can be fine-tuned for each of the Core processor models (Merom/Conroe/Woodcrest) to prefetch data differently, depending on whether the chip targets mobile, desktop or server usage models.

A prefetcher speculatively moves data into a higher-level unit of the memory hierarchy, closer to the execution cores, before that data is actually requested. It is designed to provide data that is very likely to be requested soon, which can reduce latency and increase efficiency. The memory prefetchers constantly monitor memory access patterns, trying to predict whether there is something they could move into the L2 cache - just in case that data is requested next. At the same time, the prefetchers are tuned to watch for demand traffic, which can be a sequential data flow that the core is already requesting. In this case, prefetching would not make much sense.
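
To make the idea of pattern watching more concrete, here is a minimal Python sketch of a stride-based prefetcher. It is a toy model for illustration only, not a description of Intel's actual algorithms, which the company did not detail. The class name `StridePrefetcher`, the helper `run_trace`, the 64-byte line size and the rule "prefetch after seeing the same stride twice in a row" are all assumptions made for this example. The cache is unbounded, and backing off under heavy demand traffic is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional

LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


@dataclass
class StridePrefetcher:
    """Toy model: once the same line stride is seen twice in a row,
    speculatively load the next line in that direction."""
    cache: set = field(default_factory=set)   # lines currently cached (unbounded)
    last_line: Optional[int] = None
    last_stride: Optional[int] = None
    hits: int = 0
    misses: int = 0
    prefetches: int = 0

    def access(self, address: int) -> bool:
        """Simulate one demand access; return True on a cache hit."""
        line = address // LINE_SIZE
        hit = line in self.cache
        if hit:
            self.hits += 1
        else:
            self.misses += 1
            self.cache.add(line)  # demand fill

        if self.last_line is not None:
            stride = line - self.last_line
            # A repeated, non-zero stride counts as a predictable pattern.
            if stride != 0 and stride == self.last_stride:
                predicted = line + stride
                if predicted not in self.cache:
                    self.cache.add(predicted)  # speculative fill
                    self.prefetches += 1
            self.last_stride = stride
        self.last_line = line
        return hit


def run_trace(lines):
    """Replay a list of cache-line indices; return (hits, misses, prefetches)."""
    p = StridePrefetcher()
    for line in lines:
        p.access(line * LINE_SIZE)
    return p.hits, p.misses, p.prefetches


print(run_trace(range(16)))                    # (13, 3, 14)
print(run_trace([0, 7, 3, 12, 5, 9, 1, 14]))   # (0, 8, 0)
```

In the sequential trace, the model needs three misses to establish the pattern; after that, every access hits a line that was fetched speculatively. In the irregular trace no stride ever repeats, so nothing is prefetched and every access misses.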