Tom's Hardware > Forum > CPU & Components > CPUs > Montecito number and feature

Montecito number and feature


The 2005 ISSCC advance program[1] offers quite a few interesting details about Intel's upcoming dual-core Itanium processor, Montecito. Here's a breakdown of the papers and the information that has been disclosed:


--------------------------------------------------------------------------------


The Implementation of a 2-core Multi-Threaded Itanium Family Processor

"The next generation in the Itanium processor family, code named Montecito, is introduced. Implemented in a 90nm 7M process, the processor has two dual-threaded cores integrated with 26.5MB of cache. Of the total of 1.72B transistors, 64M are dedicated to logic and the rest to cache. With both cores operating at full speed, the chip consumes 100W."

Clock Distribution on a Dual-core Multi-threaded Itanium-Family Processor

"Clock distribution on the 90nm Itanium processor is detailed. A region-based active de-skew system reduces the PVT sources of skew across the entire die during normal operation. Clock vernier devices inserted at each local clock buffer allow up to a 10% clock-cycle adjustment via firmware or scan. The system supports a constantly varying frequency and consumes < 25W from PLL to latch while providing < 10ps of skew across PVT."

A 90nm Variable-Frequency Clock System for a Power-Managed Itanium-Family Processor

"A clock-generation system delivers fixed- and variable-frequency clocks for adaptive power control on a 1.7B-transistor dual-core CPU. Frequency synthesizers digitally divide a fixed-frequency PLL clock in 1/64th cycle steps using programmable voltage-frequency-converter loops. 1-cycle loop response tracks supply transients with adaptive modulation, improving CPU performance by over 10% compared to a fixed-frequency design."

Power and Temperature Control on a 90nm Itanium-Family Processor

"This paper describes the embedded feedback and control system on a 90nm Itanium-family processor, code-named Montecito, that maximizes performance while staying within a target power and temperature (PT) envelope. This system utilizes on-chip sensors and an embedded micro-controller to measure PT and modulate voltage and frequency to meet PT constraints."

The Multi-threaded Parity-Protected 128-Word Register Files on a Dual-Core Itanium-Family Processor

"The dual-thread 18-port 128w x 82b FPU register file, and the 22-port 128w x 65b integer register file of the microprocessor is described. Parity embedded into each register provides soft error detection. The design integrates a charge-compensated thread switch and power-saving features to operate at 1.1V consuming 400mW at maximum frequency."


The Asynchronous 24MB On-Chip Level-3 Cache for a Dual-Core Itanium-Family Processor

"The 24MB level-3 cache on a dual-core Itanium processor has more than 1.47G transistors. The cache uses an asynchronous design to reduce latency and power, and it includes other power saving and reliability improvement features. The 5-cycle array operates above 2GHz at 0.8V and 85°C while consuming less than 4.2W."



--------------------------------------------------------------------------------


Issues: If Montecito has 64M logic transistors, Intel is spending 32M transistors per core, about 60% more than Itanium 2 if you include the x86 unit[2], and a whopping 130% more if you exclude it. Given that its L1 caches have not grown[3], the core is clearly becoming more complex, contrary to what some have predicted in the past. Whatever savings in electrical and logical complexity were achieved by eliminating the x86 unit have been lost elsewhere. Montecito's process complexity is certainly higher too, as the number of metal layers has increased from 6 to 7.
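As a quick consistency check, the percentages above can be inverted to see what Itanium 2 logic counts they imply. This is a minimal Python sketch under the same assumption the argument makes, namely that all 64M logic transistors belong to the two cores; the Itanium 2 figures it prints are back-calculated from the 60% and 130% claims, not published counts.

```python
# Back-of-the-envelope check of the per-core logic budget discussed above.
# Assumption: all 64M logic transistors sit in the two cores (uncore ignored).
# The Itanium 2 numbers are implied by the quoted percentages, not published data.

MONTECITO_LOGIC = 64e6  # logic transistors, from the ISSCC abstract
CORES = 2


def implied_baseline(per_core: float, percent_more: float) -> float:
    """Return the baseline x such that per_core is `percent_more` % larger than x."""
    return per_core / (1 + percent_more / 100)


per_core = MONTECITO_LOGIC / CORES                        # 32M per core
itanium2_with_x86 = implied_baseline(per_core, 60)        # ~20.0M
itanium2_without_x86 = implied_baseline(per_core, 130)    # ~13.9M
implied_x86_unit = itanium2_with_x86 - itanium2_without_x86  # ~6.1M

print(f"per core:              {per_core / 1e6:.1f}M")
print(f"Itanium 2 incl. x86:   {itanium2_with_x86 / 1e6:.1f}M")
print(f"Itanium 2 excl. x86:   {itanium2_without_x86 / 1e6:.1f}M")
print(f"implied x86 unit size: {implied_x86_unit / 1e6:.1f}M")
```

If the two percentages are accurate, they imply an x86 unit of roughly 6M transistors per Itanium 2 core.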

The first paper also says that the "chip consumes 100W" "with both cores operating at full speed." This seems odd, since Montecito's predecessors in the Itanium family operated in a 130W power envelope. If Montecito uses only 100W at "full speed," then its maximum frequency is limited by circuit delays rather than by power. If the core operates at 1.1V, it would seem wise to raise it to 1.2V to increase FET drain currents and circuit performance, since at a fixed frequency this would raise dynamic power by only about 20% ((1.2/1.1)^2 ≈ 1.19). It's theoretically possible that any voltage above 1.1V would reduce device reliability to unacceptable levels, but that seems an implausible scenario. The simplest conclusion is that the abstract refers to a particular test chip and not to the entire line of Montecito chips that will be produced and sold. Alternatively, the 100W figure may not include the additional power budget for chips operating with "Foxton technology" (more below).
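The voltage argument rests on the usual first-order model of CMOS switching power, P_dyn proportional to C * V^2 * f. The Python sketch below assumes capacitance stays fixed and ignores leakage (which also rises with voltage), and shows what the 1.1V to 1.2V step would cost at a fixed clock and, as a crude upper case, if frequency were scaled up in proportion to voltage.

```python
def relative_dynamic_power(v_new: float, v_old: float, f_ratio: float = 1.0) -> float:
    """Ratio of new to old dynamic power under P_dyn ~ C * V**2 * f.

    Capacitance is assumed unchanged; f_ratio is new_freq / old_freq.
    Leakage power is not modeled.
    """
    return (v_new / v_old) ** 2 * f_ratio


fixed_f = relative_dynamic_power(1.2, 1.1)              # ~1.19 at the same clock
scaled_f = relative_dynamic_power(1.2, 1.1, 1.2 / 1.1)  # ~1.30 if clock scales with V

print(f"1.1V -> 1.2V, same frequency:          +{(fixed_f - 1) * 100:.1f}%")
print(f"1.1V -> 1.2V, frequency scaled with V: +{(scaled_f - 1) * 100:.1f}%")
# Optimistic: treats all 100W as dynamic power.
print(f"100W chip at 1.2V, same clock: ~{100 * fixed_f:.0f}W vs. a 130W envelope")
```

Even the crude scaled case lands near 130W, which is why the 100W figure leaves apparent headroom.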

The second paper states that clock skew across the entire die is under 10ps. This is worse than Madison, which achieves 7ps of cross-die skew using a combination of a fuse-based clock de-skew circuit and a scan-based skew control mechanism[4]. At the same time, clock power has been reduced to less than 25W, or under 25% of the total power budget (assuming a 100W maximum). In contrast, Madison's clock network consumed 30% of total power[5] (39W, assuming a 130W Pmax).
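The clock-power comparison reduces to two ratios. A small Python sketch, assuming the 100W and 130W maximums used in the text; since the Montecito abstract gives only an upper bound ("< 25W"), its share is an upper bound as well.

```python
def clock_power_share(clock_w: float, chip_w: float) -> float:
    """Fraction of the chip power budget spent in the clock network."""
    return clock_w / chip_w


montecito = clock_power_share(25, 100)  # upper bound: "< 25W" of an assumed 100W max
madison = clock_power_share(39, 130)    # 39W of an assumed 130W Pmax

print(f"Montecito clock share: < {montecito:.0%}")
print(f"Madison clock share:     {madison:.0%}")
```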

The dynamic clock adjustment ("Foxton") range is outlined in the third and fourth papers. The closed-loop control system has a single-cycle response time, and it appears that it can vary the 'peak' clock speed by only ~10% from the 'normal' maximum core frequency.
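To make the closed-loop idea concrete, here is a toy Python model of a power-capping controller in the spirit of papers three and four. Everything in it is an assumption for illustration: the function name `foxton_like_controller`, the linear power models, the 105W budget, the step size of f_nominal/64 (loosely echoing the "1/64th cycle steps" in the abstract), and the ±10% clamp. The real system also modulates voltage and responds within a single cycle; this sketch adjusts frequency only, one step per iteration.

```python
from typing import Callable


def foxton_like_controller(
    power_at: Callable[[float], float],
    f_nominal: float,
    budget_w: float,
    margin_w: float = 2.0,
    max_swing: float = 0.10,
    iterations: int = 100,
) -> float:
    """Toy closed-loop clock controller (hypothetical, frequency only).

    Each iteration "measures" power at the current clock and moves the clock
    by one step of f_nominal / 64: down if over budget, up if under budget by
    more than margin_w. margin_w should exceed the power change caused by one
    step, otherwise the loop can oscillate around the budget. The clock is
    clamped to +/- max_swing around nominal.
    """
    step = f_nominal / 64
    f_min = f_nominal * (1 - max_swing)
    f_max = f_nominal * (1 + max_swing)
    f = f_nominal
    for _ in range(iterations):
        p = power_at(f)
        if p > budget_w:
            f = max(f - step, f_min)
        elif p < budget_w - margin_w:
            f = min(f + step, f_max)
    return f


# Hypothetical linear power models (watts vs. GHz), not Montecito data.
def heavy_load(f_ghz: float) -> float:
    return 30.0 + 50.0 * f_ghz  # 110W at 1.6GHz


def light_load(f_ghz: float) -> float:
    return 10.0 + 30.0 * f_ghz  # 58W at 1.6GHz


f_heavy = foxton_like_controller(heavy_load, f_nominal=1.6, budget_w=105.0)
f_light = foxton_like_controller(light_load, f_nominal=1.6, budget_w=105.0)
print(f"heavy load settles at {f_heavy:.3f} GHz ({heavy_load(f_heavy):.1f} W)")
print(f"light load boosts to  {f_light:.3f} GHz ({light_load(f_light):.1f} W)")
```

Under the heavy load the clock backs off until power drops below the budget; under the light load it climbs until it hits the +10% clamp. The `margin_w` dead band is a design choice that keeps the loop from bouncing one step above and below the budget.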

The abstract for the fifth paper gives the port counts of the floating-point and general-purpose register files. Interestingly, it explicitly describes the FP register file as "dual-thread," but says nothing of the sort about the GP register file. Given that Montecito is to have coarse-grained, non-simultaneous multithreading, what could "dual-thread" mean in the context of a register file? Whether this detail is intentional or merely poor writing is unclear. The paper also tells us that the core logic operates at 1.1V.
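The abstract says only that parity is "embedded into each register"; it does not say how many parity bits there are or what they cover. As a hedged illustration of why a single parity bit gives soft error detection but not correction, here is a Python sketch that assumes one even-parity bit over a whole 65-bit integer register value. The helper names `store` and `check` are hypothetical.

```python
WIDTH = 65  # integer register width from the abstract (128w x 65b)


def even_parity(word: int) -> int:
    """Return the parity bit that makes the total number of 1s even."""
    return bin(word).count("1") & 1


def store(word: int) -> tuple[int, int]:
    """Hypothetical register write: keep the value plus one parity bit."""
    return word, even_parity(word)


def check(entry: tuple[int, int]) -> bool:
    """Return True if the stored value still matches its parity bit."""
    word, parity = entry
    return even_parity(word) == parity


value = 0x0123_4567_89AB_CDEF
assert value < (1 << WIDTH)
entry = store(value)

one_flip = (value ^ (1 << 17), entry[1])               # single-bit soft error
two_flips = (value ^ (1 << 17) ^ (1 << 40), entry[1])  # double-bit error
print("clean entry passes:  ", check(entry))
print("single flip detected:", not check(one_flip))
print("double flip detected:", not check(two_flips))
```

A single flipped bit changes the parity and is caught; two flips cancel out and slip through, which is why parity suits detection of rare single-bit upsets rather than serving as full ECC.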

The last paper also hints at a probable maximum frequency for the product line on the current 90nm process: 2.0GHz, the speed at which the L3 array is reported to operate. A clock in that range seems likely given the "technical" performance Intel claimed for Montecito at the September 2004 IDF[6], where Intel stated that Montecito would achieve 1.9x the performance of McKinley on "technical" workloads. Since many of Montecito's architectural improvements aren't relevant to high-performance technical computing, Montecito must rely on frequency and main-memory bandwidth gains to perform better on this type of workload. A relative performance increase of 1.9x therefore supports the argument that Montecito will clock at ~1.9x McKinley's frequency, or ~1.9GHz.
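The frequency argument is a simple linear scaling. A Python sketch, assuming performance scales linearly with clock and taking McKinley's top 1.0GHz bin as the baseline; `per_clock_gain` is a hypothetical knob for any speedup not due to frequency (memory bandwidth, microarchitecture), and the 10% value below is illustrative only.

```python
def estimate_clock_ghz(perf_ratio: float, base_clock_ghz: float,
                       per_clock_gain: float = 1.0) -> float:
    """Clock needed to reach perf_ratio x the baseline, given a per-clock speedup.

    Assumes performance scales linearly with frequency; per_clock_gain lumps
    together everything else (memory bandwidth, microarchitecture).
    """
    return base_clock_ghz * perf_ratio / per_clock_gain


MCKINLEY_GHZ = 1.0  # top McKinley bin, used as the baseline

all_from_clock = estimate_clock_ghz(1.9, MCKINLEY_GHZ)
with_other_gains = estimate_clock_ghz(1.9, MCKINLEY_GHZ, per_clock_gain=1.1)  # hypothetical
print(f"all gain from clock:    {all_from_clock:.2f} GHz")
print(f"10% from other sources: {with_other_gains:.2f} GHz")
```

Any non-frequency gain lowers the clock needed, so ~1.9GHz is the estimate when frequency does all the work.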

The final paper's abstract also provides details about the L3 cache. Assuming conventional six-transistor SRAM cells, 24MB of data storage requires about 1.208B transistors (24 × 2^20 × 8 bits × 6), so the L3 in Montecito carries an overhead of at least [(1.47 - 1.208)/1.208] × 100 ≈ 21.7% relative to its data array. In other words, roughly 262M transistors, or about 17.8% of the L3's total, are dedicated to ECC, tags, redundancy, and supporting circuitry. Previous estimates of about 1.6B transistors for the L3[7] are therefore off by roughly 9%.
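The L3 arithmetic can be reproduced in a few lines of Python, assuming conventional six-transistor SRAM cells and treating "24MB" as 24 × 2^20 bytes. Note the two ways to express the same extra transistors: relative to the data array (about 21.7%) and as a share of the whole L3 (about 17.8%). Because the abstract says "more than 1.47G," both are lower bounds.

```python
BITS_PER_MB = 2**20 * 8
TRANSISTORS_PER_CELL = 6  # assumes conventional 6T SRAM cells

data_array = 24 * BITS_PER_MB * TRANSISTORS_PER_CELL  # 1,207,959,552
l3_total = 1.47e9                                      # "more than 1.47G": a lower bound
extra = l3_total - data_array                          # ECC, tags, redundancy, periphery

overhead_vs_data = extra / data_array                  # ~21.7%
share_of_l3 = extra / l3_total                         # ~17.8%
prior_estimate_error = (1.6e9 - l3_total) / l3_total   # ~8.8%

print(f"6T data array:          {data_array:,} transistors")
print(f"non-data transistors:   {extra / 1e6:.0f}M")
print(f"overhead vs data array: {overhead_vs_data:.1%}")
print(f"share of total L3:      {share_of_l3:.1%}")
print(f"1.6B estimate is high by {prior_estimate_error:.1%}")
```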


