
MaximumPC build a Nehalem

August 7, 2008 1:19:56 PM

I would think the reason for the bigger chip, if anything, is the huge L3 cache.

Plus, a bigger IHS means a bigger heat-dissipation area, which means aftermarket HSFs will probably be able to cool this chip more effectively.
August 7, 2008 1:29:11 PM

The bigger chip has to do with the on-board memory controller: more pins are needed for the memory management module.
August 7, 2008 1:30:28 PM

Wow..



Dat is quite a difference in size.

Kinda wonder if Intel fed it too much silicon... :oops: 
August 7, 2008 1:36:38 PM

^I think it got high in Hafnium :D 
August 7, 2008 1:41:08 PM

jimmysmitty said:
I would think the reason for the bigger chip, if anything, is the huge L3 cache.

Plus, a bigger IHS means a bigger heat-dissipation area, which means aftermarket HSFs will probably be able to cool this chip more effectively.


If memory serves me right, Nehalem comes with 8 MB L3 while Yorkfield came with 2x6 MB. Based on that I would speculate that the die size of a quad-core Nehalem, even with QuickPath and the new HT, is actually smaller than Yorkfield's.
As pointed out by Jdocs, the packaging is simply larger to accommodate the larger number of pins.
August 7, 2008 1:44:13 PM



Hmmm... a little bigger, but yeah, more pins, and more space taken up by 4 cores.
August 7, 2008 1:53:05 PM

Slobogob said:
If memory serves me right, Nehalem comes with 8 MB L3 while Yorkfield came with 2x6 MB. Based on that I would speculate that the die size of a quad-core Nehalem, even with QuickPath and the new HT, is actually smaller than Yorkfield's.
As pointed out by Jdocs, the packaging is simply larger to accommodate the larger number of pins.


Yeah, I forgot about the IMC. And the fact that a triple-channel one will take more pins, hence why there is Bloomfield with LGA1366 (3x memory controller) and Lynnfield with the other socket (2x memory controller).

But Penryn did not have L3; it had L2. Nehalem has both L2 and L3, so that adds some more to the size. Although Nehalem will come in at just a few million less than Penryn in transistor count, so.....
August 7, 2008 2:15:55 PM

Awesome article :) 

I will wait for the full benches to come out before I make up my mind about Neh though. Although it looks like a monster.
August 7, 2008 2:44:38 PM

jimmysmitty said:

But Penryn did not have L3; it had L2. Nehalem has both L2 and L3, so that adds some more to the size. Although Nehalem will come in at just a few million less than Penryn in transistor count, so.....

Indeed, but the difference between the L2 and L3 is architectural, not technical. It's the same kind of memory Intel is kind of famous for. I actually think it is a good thing Intel is "reducing" cache and increasing the logic part again. That's just my opinion though.
August 7, 2008 2:55:46 PM


Intel puts on cache because they have the manufacturing capabilities to do so. Cache is a simple solution to increase performance to a certain degree; at a certain point more cache won't increase performance anymore (at least not noticeably). AMD, for example, doesn't do much cache because their production capabilities are limited. If they had more fabs you would see AMD use more cache too. Actually, the reduction of the cache in their X2 lineup was due to 1. the transition to 65nm and 2. the need to increase the number of processors they can produce to meet demand. The smaller a processor gets, the more they can produce on a single wafer in a single fab. That's why the 45nm Phenoms will have 6MB L3.
Another benefit of large caches is that defective cells can be deactivated and the chip still works.
With the new QuickPath, insane caches should be less important too, since access to memory should be faster and even small caches should work more efficiently.
I'm curious about the cache size Intel will use on their 6-core Nehalems.
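That wafer-economics point is easy to sketch. A common first-order estimate follows (the die areas below are invented purely for illustration, not real figures for any of these chips):

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """First-order estimate: usable wafer area divided by die area,
    minus a correction for partial dies lost at the wafer edge."""
    d = wafer_diameter_mm
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

# Illustrative die areas only -- smaller dies mean many more chips per wafer.
for name, area_mm2 in [("small die, 100 mm^2", 100), ("big die, 285 mm^2", 285)]:
    print(f"{name}: ~{dies_per_wafer(300, area_mm2)} candidates per 300 mm wafer")
```

The exact constants vary by source and this ignores yield, but the trend is the point: shrinking the die (as AMD did with the 65nm X2s) directly multiplies output per fab.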


That's what Intel said about their P4: it would be the uber-chip once it reached 10 GHz in 2006...
August 7, 2008 3:14:07 PM

radnor said:
Kewl!! I hope history repeats itself.
Intel's Willamette on Socket 423 was HUUUUUUGE!!! and sucked.

Oh well.


Sockets used to be quite a bit larger than they are today before the CPU makers went from standard PGA to micro-PGA. Socket 370 and socket A chips are 2" by 2" while Socket 478/603/604 and socket 754/940/939/AM2/S1 are about 1.25" by 1.25". AMD's Socket F/LGA1207 processors may be just a touch bigger but I've never personally installed one, so I can't tell you. That LGA1366 chip in the picture looks to be roughly the size of a Socket 7 processor, which would still make it a little smaller than 370/A or 423 IIRC.

@Garmin: The 45 nm AMD quad-core "Deneb" chips have 4x128 KB L1, 4x512 KB L2, and 6 MB L3 cache, giving them 8.5 MB of total cache size. Nehalem's 4x64 KB L1, 4x256 KB L2, and 8 MB L3 is 9.25 MB, which is only 768 KB more cache than what an AMD quad made on the same manufacturing node possesses. Perhaps the chip's performance would suffer without so much cache, perhaps not. We have seen some preliminary benches of AMD's 45 nm Deneb Phenoms with 8.5 MB cache vs. the 65 nm Agena Phenoms with 4.5 MB cache and the Denebs are at best 10% faster clock-for-clock in the benchmarks run thus far. I'd be willing to bet that Intel's Nehalem won't be too much different due to a rather similar design.
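Those cache totals are easy to double-check:

```python
MB = 1024  # all sizes below are in KB

def total_cache_kb(l1_per_core_kb, l2_per_core_kb, l3_shared_kb, cores=4):
    """Per-core L1 and L2 summed across cores, plus the shared L3."""
    return cores * (l1_per_core_kb + l2_per_core_kb) + l3_shared_kb

deneb_kb = total_cache_kb(128, 512, 6 * MB)    # 45 nm AMD quad
nehalem_kb = total_cache_kb(64, 256, 8 * MB)   # quad-core Nehalem

print(deneb_kb / MB, nehalem_kb / MB, nehalem_kb - deneb_kb)  # 8.5 9.25 768
```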

@jimmysmitty: The package may have been larger for any number of reasons. I doubt that it is bigger just to accommodate more contacts as AMD makes Socket F Opterons that are roughly the same size as LGA775 and AM2 processors yet have 1207 contacts on the bottom. Intel could have done the same and ended up with a package that is not much larger than LGA775, but they chose to have the same kind of no-land section directly beneath the chip as they have done with LGA775 rather than completely covering the bottom of the package with lands as in Socket F. Perhaps using such an arrangement makes it easier to assemble the CPU to the package, but Intel most certainly could have gotten away with a smaller package if they really wanted to. Just look at their chipsets- they have 1200-1400+ contacts on the bottom but the package isn't much larger than that of a CPU. Granted, the chipsets are all BGA and BGA is the most efficient way to cram the largest number of contacts in the smallest space, but LGA isn't that far off in contact density. I have a hunch that Intel also wanted a bigger IHS and the ability to easily use larger heatsinks with larger, quieter fans and not have an overly-tall heatsink. There is no better way to enforce a keep-out zone for a larger heatsink than to have a bigger socket occupying that region.
August 7, 2008 3:30:48 PM

MU_Engineer said:

The package may have been larger for any number of reasons. I doubt that it is bigger just to accommodate more contacts as AMD makes Socket F Opterons that are roughly the same size as LGA775 and AM2 processors yet have 1207 contacts on the bottom. Intel could have done the same and ended up with a package that is not much larger than LGA775, but they chose to have the same kind of no-land section directly beneath the chip as they have done with LGA775 rather than completely covering the bottom of the package with lands as in Socket F. Perhaps using such an arrangement makes it easier to assemble the CPU to the package, but Intel most certainly could have gotten away with a smaller package if they really wanted to. Just look at their chipsets- they have 1200-1400+ contacts on the bottom but the package isn't much larger than that of a CPU. Granted, the chipsets are all BGA and BGA is the most efficient way to cram the largest number of contacts in the smallest space, but LGA isn't that far off in contact density. I have a hunch that Intel also wanted a bigger IHS and the ability to easily use larger heatsinks with larger, quieter fans and not have an overly-tall heatsink. There is no better way to enforce a keep-out zone for a larger heatsink than to have a bigger socket occupying that region.


Maybe they had larger, future processors in mind when they designed it. I imagine that the die will grow quite a bit before Intel transitions to 32nm. Maybe they planned to put some GPU features on it too, which would take some room. I guess we will see.
August 7, 2008 4:42:40 PM

They use that much cache because it's essentially like having 2MB of L3 cache per core, with the ability for each core to use all 8MB. The reason they use this much cache is that if you can keep the process on the CPU, it will be completed faster, instead of having to go to system memory to grab the required items to complete the process. While this helps more with an FSB than with an IMC, it's still good practice, since it looks like Intel has gotten their cache latencies pretty low.
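The on-die vs. system-memory point boils down to a simple average-latency model (the nanosecond figures here are invented round numbers for illustration, not measurements of either chip):

```python
def avg_access_ns(hit_rate, cache_ns, mem_ns):
    """Average access time: hits are served on-die, misses pay the DRAM trip."""
    return hit_rate * cache_ns + (1 - hit_rate) * mem_ns

# Invented round numbers: 10 ns for an L3 hit vs. 100 ns out to system memory.
for hit_rate in (0.90, 0.95, 0.99):
    print(f"hit rate {hit_rate:.0%}: avg {avg_access_ns(hit_rate, 10, 100):.1f} ns")
```

Every extra point of hit rate that a bigger L3 buys shaves the average directly, which is why the cache matters even more on an FSB design where the memory trip is longer.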

From what I have seen Phenom @ 3GHz still has the same performance.

Barcy is different as that is from the server market and I am sure that at 3GHz it should perform much better.
August 7, 2008 11:22:54 PM

jimmysmitty said:

From what I have seen Phenom @ 3GHz still has the same performance.

Barcy is different as that is from the server market and I am sure that at 3GHz it should perform much better.


Why?
August 7, 2008 11:43:37 PM

Who cares how big it is? It means more CPU cooler for your money.

I bet it's so that the socket has room for 16 cores.

Think how long LGA775 has lasted; Intel doesn't like to change sockets (or at least not recently), and with a bit of luck this one should last a while, which is a good thing in my mind.
August 7, 2008 11:46:42 PM

MU_Engineer said:
Sockets used to be quite a bit larger than they are today before the CPU makers went from standard PGA to micro-PGA. Socket 370 and socket A chips are 2" by 2" while Socket 478/603/604 and socket 754/940/939/AM2/S1 are about 1.25" by 1.25". AMD's Socket F/LGA1207 processors may be just a touch bigger but I've never personally installed one, so I can't tell you. That LGA1366 chip in the picture looks to be roughly the size of a Socket 7 processor, which would still make it a little smaller than 370/A or 423 IIRC.

@Garmin: The 45 nm AMD quad-core "Deneb" chips have 4x128 KB L1, 4x512 KB L2, and 6 MB L3 cache, giving them 8.5 MB of total cache size. Nehalem's 4x64 KB L1, 4x256 KB L2, and 8 MB L3 is 9.25 MB, which is only 768 KB more cache than what an AMD quad made on the same manufacturing node possesses. Perhaps the chip's performance would suffer without so much cache, perhaps not. We have seen some preliminary benches of AMD's 45 nm Deneb Phenoms with 8.5 MB cache vs. the 65 nm Agena Phenoms with 4.5 MB cache and the Denebs are at best 10% faster clock-for-clock in the benchmarks run thus far. I'd be willing to bet that Intel's Nehalem won't be too much different due to a rather similar design.

@jimmysmitty: The package may have been larger for any number of reasons. I doubt that it is bigger just to accommodate more contacts as AMD makes Socket F Opterons that are roughly the same size as LGA775 and AM2 processors yet have 1207 contacts on the bottom. Intel could have done the same and ended up with a package that is not much larger than LGA775, but they chose to have the same kind of no-land section directly beneath the chip as they have done with LGA775 rather than completely covering the bottom of the package with lands as in Socket F. Perhaps using such an arrangement makes it easier to assemble the CPU to the package, but Intel most certainly could have gotten away with a smaller package if they really wanted to. Just look at their chipsets- they have 1200-1400+ contacts on the bottom but the package isn't much larger than that of a CPU. Granted, the chipsets are all BGA and BGA is the most efficient way to cram the largest number of contacts in the smallest space, but LGA isn't that far off in contact density. I have a hunch that Intel also wanted a bigger IHS and the ability to easily use larger heatsinks with larger, quieter fans and not have an overly-tall heatsink. There is no better way to enforce a keep-out zone for a larger heatsink than to have a bigger socket occupying that region.



Add to that the very nasty rumour (from 'reliable' sources) that both manufacturers have been seriously considering going back to a slot mount to deal with the eventual shift to 4+ core CPUs and the increased size requirements those will create, and both Intel's new socket and AMD's much-bragged-about AM3 will be worthless in the not-too-distant future.
August 7, 2008 11:55:06 PM

Slobogob said:
Maybe they had larger, future processors in mind when they designed it. I imagine that the die will grow quite a bit before Intel transitions to 32nm. Maybe they planned to put some GPU features on it too, which would take some room. I guess we will see.


They are supposed to be releasing an 8-core Nehalem, so they may have gauged the 4-core size off that.
August 8, 2008 12:06:48 AM

turpit said:
Add to that the very nasty rumour (from 'reliable' sources) that both manufacturers have been seriously considering going back to a slot mount to deal with the eventual shift to 4+ core CPUs and the increased size requirements those will create, and both Intel's new socket and AMD's much-bragged-about AM3 will be worthless in the not-too-distant future.



That would be interesting to see. I wonder how they could do it without increasing the latency.





Do you think I'll be able to put a Pentium II into a Nehalem slot? :pt1cable: 
August 8, 2008 12:16:09 AM

^LMAO!!!!!!!!!!

I always thought of Nintendo and the SNES when I saw the old slot-style Pentium IIs and early slot-style Pentium IIIs.

I doubt they would go back, though. Unless it can magically give the same if not better performance, a slot style would not be the most preferred method.
August 8, 2008 12:19:47 AM

It's either that, go to multiple sockets, or make the mobos bigger to support gargantuan 6~8-inch-square sockets. The lithographic nodes are rapidly approaching the point of 'no-more-shrink-shrink', and we're not going to see the prophesied 10GHz P4s/huge increase in clock speed; the only way to go is to increase cores, which are going to take up more real estate.

The slot gives the CPUs room to grow (literally) without having to change an onboard socket, as well as the possibility of mounting other CPU-specific system hardware, plus aligning all the slots for improved airflow/cooling... latency is really not too much of an issue, and ultimately it's a good solution for both the CPU and motherboard manufacturers, albeit frustrating for the DIYer consumer...
August 8, 2008 3:22:14 AM

turpit said:
Add to that the very nasty rumour (from 'reliable' sources) that both manufacturers have been seriously considering going back to a slot mount to deal with the eventual shift to 4+ core CPUs and the increased size requirements those will create, and both Intel's new socket and AMD's much-bragged-about AM3 will be worthless in the not-too-distant future.


Really? That seems unusual. The reason we moved away from slots and cartridges was that we needed more contacts, better ventilation, and the ability to mount bigger and heavier heatsinks. You have a minimum size for slot edge connectors (about 2 mm per connection) that limits the number of connections you can have before you get hugely long slots. You also have the issue of the heatsink hanging perpendicularly to the board, putting a lot of mechanical stress on the cartridge and slot connector.

Of course those can be worked around, but we'd end up with something that's more like a high-end multi-slot GPU rather than an old SC242 Slot 1/Slot A cartridge. The number of connections from the CPU to the motherboard would have to be minimized to make it work. The big users of pins on the CPU are power/ground and the memory controller. You would have the VRMs on the daughter card and just use cables and connectors to feed +12VDC through PCIe-type connectors, eliminating hundreds of connections to the MB. The RAM could be on the opposite surface of the card to the CPU, minimizing trace length and also minimizing the number of contacts to the rest of the motherboard by 480-960, depending on the number of channels. The edge connector would pretty much just have to handle CPU -> chipset connection through HyperTransport or CSI, which does not take very many data lines. IIRC HT takes about forty lines per link, which makes a quad-CPU-capable unit with four links take up 160 lines...small potatoes. The card would have multiple screw-to-the-case brackets to stabilize the card's weight and minimize the strain on the card and connector. So it's definitely doable, but the exact implementation would be interesting to see.
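Using the rough 2 mm-per-contact figure above, the arithmetic shows why the edge connector would have to carry as few lines as possible (double-sided contacts assumed; the pitch is the approximate figure from the post, not a measured spec):

```python
def slot_length_mm(lines, pitch_mm=2.0, sides=2):
    """Approximate edge-connector length for a given line count,
    assuming contacts on both sides of the card at pitch_mm spacing."""
    return lines / sides * pitch_mm

# A full LGA1366-style pin-out on a slot vs. just the CPU -> chipset HT links.
for lines in (1366, 160):
    print(f"{lines} lines -> ~{slot_length_mm(lines):.0f} mm of edge connector")
```

Routing the full socket pin-out through a slot would give an absurdly long connector, while the 160-line chipset link alone stays in the range of a long PCIe slot, which is exactly why the power and memory connections would have to move onto the daughter card.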
August 8, 2008 4:00:32 AM

The mechanical problems are simple to solve... always have been. The old slots were antiquated when they were implemented in the first place, so going off those designs is not a route to success. The geometry is easy enough as well, but the cost... there's the battle. No doubt whichever route is cheapest/most cost-effective for them is the one they'll take... regardless of the 'face' they lose in the process.

I just find it amusing that, with all the hype (both AMD's and the fanboys') over AM2/AM2+/AM3, AMD would even consider such a thing. But then, if they survive and their plans see fruition, AM3 will have to go away soon enough... regardless of how they choose to implement the interface.
August 8, 2008 10:04:01 AM

turpit said:
Add to that the very nasty rumour (from 'reliable' sources) that both manufacturers have been seriously considering going back to a slot mount to deal with the eventual shift to 4+ core CPUs and the increased size requirements those will create, and both Intel's new socket and AMD's much-bragged-about AM3 will be worthless in the not-too-distant future.


Let us hope that is not the case. In Nehalem's case, it's got Hyper-Threading pushing the number of logical cores. While I believe (and it is fairly obvious) that dual-core and quad-core stirred the industry and bumped performance a bit, going to an octo-core (or many-core, for that matter) would reveal itself to be quite a disaster. Don't forget a slot CPU will increase BOM and transportation costs and will increase manufacturing times... well, if you think quite hard about it, it will be quite a disaster. That is not a path they will choose for mainstream, nor for blade servers if you think about it. You can't crunch a 4x slot CPU into 1U or 2U. That would be funny.

I can only recall one chipset that managed SMP perfectly, and it was an old dual-NB (S370) from ServerWorks. Parallel programming isn't quite that easy to achieve in a many-core architecture. I don't know if anybody here has a background in programming, but in developing an application you always count the development time. That is what defines the TCO of a piece of software.
From what I have "tinkered" with CUDA, the learning curve is pretty non-existent. It is really that simple to code in CUDA.

Parallel programming for the old SMP models was quite hard to do. It was... done. Not very well, I'm afraid.
Parallel programming for the old HT models was simple to execute. Due to the SMP legacy, efficient code was made.
This was for 2 or 4 CPUs, or if you wish, from 2 to 8 logical CPUs. I guess the step to 8 or more logical CPUs, although very good from a hardware POV, will be impractical. It is just too fast for the world to adapt. I think I can safely call an Itanium here.

For example:

64-bit CPUs have been here for some time now. It is a major breakthrough in processing; it basically doubles the reading power of a CPU. It is only NOW being adopted by the mainstream software industry.

Multi-core gaming (apart from some games) is still not the reality we all wish it was. I remember many games with frame-skipping problems on X2s and Pentium Ds. Many of those took their sweet time being patched.

HT and SMP paved the way a bit for multi-core. But the technologies are too different to be fully (or at least largely) compatible when we reach the many-core scale.

Nehalem will be an excellent CPU for workstation/server applications, which with each version might be coded and adapted to the new hardware. The kernels of those platforms (and I'm talking Linux, Solaris and other OSs, not Windows) already have a heavy legacy on multi-core.

In the mainstream I believe it will be an awful CPU, because mainstream apps (and users) just won't take advantage of it.
Or do you believe installing Windows Vista Ultimate will do the trick? No way, mate. That kernel is a broadsword, and there is no way it can be optimized so fast for a solution so radical. Take the example of CUDA, for instance. GPGPU has already been here a few years, and there are several (professional) applications already highly optimized for GPGPUs. But the magic there isn't in the kernel, because it doesn't need to be in the kernel. It needs to be in the application.

For the magic to happen in an 8-to-32-core (logical or not) solution, the magic itself must be in the kernel. The management of apps, threads, and locks can't be application-controlled. And for you and me, mate, who use Windows to game for example, we can forget about that world.
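The scaling ceiling described above is basically Amdahl's law. A quick sketch (the 80%-parallel figure is an assumption chosen for illustration, generous for a mainstream app):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Assume an unusually well-threaded mainstream app: 80% of the work is parallel.
for cores in (2, 4, 8, 32):
    print(f"{cores:2d} cores -> {amdahl_speedup(0.8, cores):.2f}x at best")
```

Even with 80% of the work parallel, 32 cores top out under 5x, which is why the kernel- and application-level threading matters far more than raw core count.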

Sorry about the long post, Turpit; it was an agreement, and a rant at the hype Nehalem is causing. The size of the die is just another problem we must add to the equation, because the equation becomes rather large when we scale it to the 80 cores they want to make.

The equation becomes rather silly in a small time frame.
August 8, 2008 12:52:41 PM

Does the link not work for anyone else?