Intel Patents Redundant Cores In a Many-Core Processor
Intel has just been granted a patent that claims the rights to the concept of using initially inactive processing cores to replace failing cores.
According to the patent, increasingly complex processors with a greater number of cores, referred to as many-core processors by the company, will see higher failure rates than single- or dual-core processors. In fact, the patent states that the lifetime of a core may "shorten from generation to generation." The reasons include electromigration, stress migration, time-dependent dielectric breakdown, negative bias temperature instability (NBTI), and thermal cycling.
To alleviate failure concerns, the patent covers an approach to core management that is heavily focused on temperature monitoring of the individual cores: "Because many semiconductor failure mechanisms are expressed at elevated temperatures, temperature thus has a direct bearing on core MTTF [mean time to failure] and many-core reliability," the patent document explains. If the temperature cannot be decreased, a many-core processor would activate spare cores to protect both the possibly failing core and its neighboring cores. Both failed and spare cores are described as able to "absorb heat generated by active cores, driving the temperatures on the active cores down."
In an allocation/reallocation scenario, Intel says that the temperatures of cores can be drastically reduced.
There is no indication when Intel will actually use such a technology, but the examples in the patent start with at least 32 cores total, which use 16 active and 16 spare cores.
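As a rough illustration of the allocation/reallocation idea described above, here is a minimal Python sketch. It is not taken from the patent: the `Core` class, the `reallocate` function, the 85 °C limit, and the "pick the coolest healthy spare" rule are all assumptions made for the example. It only shows the general shape of swapping an overheating active core for a spare.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    active: bool
    temp_c: float = 40.0   # hypothetical sensor reading in degrees Celsius
    failed: bool = False


def reallocate(cores, limit_c=85.0):
    """Swap each active core at or above limit_c for the coolest usable spare.

    The threshold and the selection rule are assumptions for this sketch.
    Returns a list of (retired_core_id, activated_core_id) pairs.
    """
    swaps = []
    hot_cores = [c for c in cores if c.active and c.temp_c >= limit_c]
    for hot in hot_cores:
        # A usable spare is inactive, not failed, and below the limit itself.
        spares = [
            c for c in cores
            if not c.active and not c.failed and c.temp_c < limit_c
        ]
        if not spares:
            break  # nothing left to swap in
        spare = min(spares, key=lambda c: c.temp_c)
        hot.active = False
        spare.active = True
        swaps.append((hot.core_id, spare.core_id))
    return swaps


# 32 cores in total, 16 active and 16 spare, as in the patent's example.
cores = [Core(i, active=i < 16) for i in range(32)]
cores[3].temp_c = 95.0    # one active core runs hot
cores[20].temp_c = 30.0   # one spare happens to be the coolest
print(reallocate(cores))
```

The number of active cores stays at 16 after the swap; only which physical cores are doing the work changes.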
extend the lifetime and reliability of the whole endeavor without having to have wholly separate computers.
As I imagine activating the inactive cores will give little to no benefit.
Although I don't really see how this is patentable. Having redundant hardware to click in in case of failure?
I guess you should be able to get round it by having all cores active when needed, and sleeping when not (as is possible today), and making it so a processor can carry on if a core dies mid-process.
Why not ship the CPU with all cores active and give it a 'soft fail' feature for failing cores?
By 'soft fail' I mean that a failing core could be dynamically deactivated while allowing all other cores to function normally.
This would allow you to have higher initial performance and give you uninterrupted computing in the case of a core failure.
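A minimal sketch of what such a 'soft fail' could look like in software, assuming a simple round-robin task placement. The function `run_with_soft_fail` and the `is_failing` callback are hypothetical names made up for this example, not anything from the patent or a real scheduler.

```python
def run_with_soft_fail(tasks, core_ids, is_failing):
    """Place tasks round-robin, dropping any core that reports a failure.

    is_failing is a hypothetical callback: core id -> True if the core is bad.
    Returns (placement, online): task -> core id, and the cores still in use.
    """
    online = list(core_ids)
    placement = {}
    for n, task in enumerate(tasks):
        while online:
            core = online[n % len(online)]
            if is_failing(core):
                online.remove(core)  # soft fail: take the core out of service
                continue             # retry the same task on another core
            placement[task] = core
            break
        else:
            raise RuntimeError("no working cores left")
    return placement, online


# Core 1 fails; the remaining three cores carry on with all four tasks.
placement, online = run_with_soft_fail(
    ["a", "b", "c", "d"], [0, 1, 2, 3], lambda core: core == 1
)
print(placement, online)
```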
Standard fail-over systems used on spacecraft, and by every single serious server provider in the world, are completely separate systems that step in if one system fails. That is completely unrelated to this patent both in spirit and in practice, as it is much safer to switch to a different system altogether instead of relying on fail-over on the same silicon that is failing in the first place. No serious business would ever rely on that.
This would mean that it would only get hotter faster, and possibly heat up all the cores to a point of deactivation. At that point you would technically have no CPU anymore.
Not really sure how
As long as you have a properly designed cooling system, you should have no issues at all.
When a defective core is disabled (presumably by power gating the affected area of the chip) overall power consumption (and therefore heat production) will be reduced.
This will lead to a cooler system as it ages, not a hotter one...
Intel is talking about a processor that is super mission critical. Think space, nuclear, or some other place that man can't go to repair a computer. It sounds to me that Intel may be gearing up to knock out IBM/Motorola in the industrial compute department.
I see it this way: if you have all your cores running at 100%, they will all get hot quickly. Since they will all have a similar temperature, they may all reach the critical temperature at the same point, which would signal them to shut off. Since Intel is working on tri-gate transistors, I believe they mean they will stack cores on top of each other, making it harder for even the best coolers to cool them. With this method of disabling cores and keeping inactive ones, they can better compensate for that.