Tabris DarkPeace said:
^ No it wouldn't.
- The analogy isn't outright wrong, but it does not factor in most common sense aspects of processor design.
- Instructions per clock cycle does not scale linearly with transistor count of the core (inclusive of L1 cache or not).
Transistor count these days is usually more than 70% L2 and L3 cache, and the more transistors are dedicated to cores the more cache is required as the interface to memory becomes a massive bottleneck.
- Doubling the transistor count by adding more cache won't double performance, it just decreases the cache miss rate.
- If the cache miss rate lowers from 50% to 0% as a result (highly unlikely) then, technically, it might.
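To put rough numbers on the cache point, here is a minimal sketch using the textbook average memory access time formula (hit time + miss rate x miss penalty). The hit time, miss penalty and miss rates are made-up illustrative values, not measurements of any real CPU, and this only models memory access time, not overall performance.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


def speedup(hit_time, miss_penalty, old_miss_rate, new_miss_rate):
    """How much faster the average memory access gets when the miss rate drops."""
    old = average_access_time(hit_time, old_miss_rate, miss_penalty)
    new = average_access_time(hit_time, new_miss_rate, miss_penalty)
    return old / new


# Hypothetical values (in clock cycles), chosen only for illustration.
HIT_TIME = 1.0
MISS_PENALTY = 100.0

# A plausible gain from a bigger cache: miss rate drops from 4% to 3%.
print(speedup(HIT_TIME, MISS_PENALTY, 0.04, 0.03))

# The extreme case from the post: miss rate drops from 50% to 0%.
print(speedup(HIT_TIME, MISS_PENALTY, 0.50, 0.00))
```

With these assumed numbers the realistic case buys a modest improvement in memory access time, and only the unrealistic 50% to 0% case produces a dramatic one.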
You can halve the transistor count of the 'raw core' (excluding L1 cache) without halving performance!
- There are many processors that perform fine with just 3.3 to 10.0 million transistors, and taking those designs to 6.6 to 20.0 million transistors will not magically double the performance!
Consider that a 733MHz Pentium III with a 133FSB outperforms a 750MHz Pentium III with a 100FSB.
It takes 1.8 million transistors to have just 32KB of L1 cache last time I checked.
- L2 and L3 caches may require fewer transistors per bit by using a hybrid, SRAM-like design.
- I say this as you're likely to have an extra 4,732 bytes per 32KB of L1 cache for management of the cache statistics, etc.
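The 4,732 byte figure can be reproduced with simple arithmetic if you assume a classic 6-transistor (6T) SRAM cell. That cell type is an assumption here (the post does not name it), and the function name below is just for illustration.

```python
def sram_overhead_bytes(total_transistors, data_bytes, transistors_per_bit=6):
    """Bytes of storage left over after the data array, given a transistor budget.

    transistors_per_bit=6 assumes a classic 6T SRAM cell (an assumption,
    not something stated in the post).
    """
    total_bits = total_transistors // transistors_per_bit
    overhead_bits = total_bits - data_bytes * 8
    return overhead_bits // 8


# 1.8 million transistors for a 32KB L1 cache, as quoted in the post.
print(sram_overhead_bytes(1_800_000, 32 * 1024))  # prints 4732
```

So 1.8 million transistors at 6 per bit is 300,000 bits, of which 262,144 hold the 32KB of data and the remaining 37,856 bits (4,732 bytes) would be available for tags and bookkeeping.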
Generally speaking, if you double the voltage on given silicon you can increase the clock speed by +41.42136% (that is, by a factor of the square root of 2).
- This is only true if the die is the same size, the transistor count is the same, and the surface area to contact ratio permits the processor(s) to be cooled effectively.
- The heat output and leakage may rise more than linearly (compared to your starting voltage) and the CPU may become hotter per cubic mm than a nuclear reactor!
- The power efficiency will be the reciprocal of this; as the voltage doubles, the performance per watt will decrease by 29.28932%.
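The percentages above look like they come from a square-root rule of thumb: clock speed scaling with the square root of the voltage ratio, and performance per watt scaling with its reciprocal. That reading is an assumption based on the numbers (1 - 1/sqrt(2) is about 29.29%), and real silicon will not follow it exactly. A small sketch:

```python
import math


def clock_gain(voltage_ratio):
    """Fractional clock increase, assuming clock scales with sqrt(voltage ratio)."""
    return math.sqrt(voltage_ratio) - 1.0


def perf_per_watt_change(voltage_ratio):
    """Fractional change in performance per watt, taken as the reciprocal rule."""
    return 1.0 / math.sqrt(voltage_ratio) - 1.0


# Doubling the voltage under this assumed rule of thumb.
print(f"clock: {clock_gain(2.0):+.5%}")
print(f"perf/watt: {perf_per_watt_change(2.0):+.5%}")
```

Under this rule, doubling the voltage gives sqrt(2) - 1, or roughly +41.42% clock, and roughly -29.29% performance per watt.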
Another simple comparison is the Pentium II to the Pentium III:
- 7.5 million transistors for P-II core --- Deschutes (80523)
- 9.5 million transistors for P-III core --- Katmai (80525)
- Over +25% more transistors per core, and the Pentium III is not +25% faster at executing x86-32bit code!
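The transistor arithmetic, as a quick sketch. The `perf_ratio` of 1.0 below is a hypothetical "same speed on non-SSE code at the same clock" assumption for illustration, not a benchmark result.

```python
P2_TRANSISTORS = 7.5e6  # Deschutes core, figure from the post
P3_TRANSISTORS = 9.5e6  # Katmai core, figure from the post


def relative_increase(old, new):
    """Fractional increase going from old to new."""
    return (new - old) / old


def perf_per_transistor_ratio(perf_ratio, old_transistors, new_transistors):
    """New performance per transistor relative to old, for a given performance ratio."""
    return perf_ratio * old_transistors / new_transistors


print(f"{relative_increase(P2_TRANSISTORS, P3_TRANSISTORS):.1%} more transistors")

# Hypothetical: identical performance (perf_ratio = 1.0) on code that ignores SSE.
print(f"{perf_per_transistor_ratio(1.0, P2_TRANSISTORS, P3_TRANSISTORS):.3f}")
```

If the two chips ran a given workload equally fast, the larger core would be delivering only about 79% of the performance per transistor.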
These two processors are available with very similar specs:
- 250nm fabrication (so the voltage and die size would be equal, all else being equal)
- They both have 16 + 16 kB (Data + Instructions) L1 caches, using a Harvard or Hybrid Harvard Superscalar Architecture.
- They both have 512KB of L2 cache external to the processor
- The L2 cache is clocked at half the core frequency in both cases
- They are both available at 450 MHz (important for equal comparison)
- They both have a 100MHz FSB (less important for equal 'core' comparison, but important due to the availability of data).
- They both support MMX
- They both have a 2.0 Volt Vcore.
- Only one of them supports SSE
They do not have equal performance 'per transistor'; how far apart they are depends on the workload.
Heck, if anything your argument is an argument for a reduction in transistor counts of processor cores and having 64 CPU cores with 16MB (or more) of on-die cache shared between all the CPU cores intelligently, as some cores may not be executing code that benefits from L2 cache at all, while others may benefit from 8MB, or more, of L2 cache alone.
- I am in no way against this idea
- Add a 32MB to 256MB L3 cache, and Quad-channel DDR4-SDRAM (which would be so fast as to be considered a L5 cache instead of 'System RAM').
- I think we'd all be very happy indeed with such a 2 billion transistor core processor (and associated motherboard chipset).
Noting that:
- The transistors for L2 and L3 cache (per KB or MB) can differ as there are different ways of making cache. (Faster with more transistors per bit, or slower and larger with fewer transistors per bit; some of these methods may have patents protecting them. Thankfully we have patents or there would be no technological improvement :-) ).
PS: There used to be an OpenSPARC CPU design website, and I put forward ideas there about pre-fetching and building systems that did not require Rambus style memory interfaces to scale. I think they're gone now but the 4KB prefetching was definitely used in something. (Obvious enough not to be patented and makes systems far more economical per dollar while keeping performance pretty darn close to just having gigabytes of SRAM or hyper Rambus implementations).
COULD YOU HAVE MADE THIS ANY MORE CONFUSING!!! AT ALL!!!! IF THAT IS EVEN POSSIBLE???? I THOUGHT I WAS JUST STARTING TO UNDERSTAND IT A LITTLE TILL YOU CHIMED IN WITH THE L2 CACHE THIS AND L3 CACHE THAT. I GET THE FACT THAT YOU ARE REALLY REALLY SMART AND KNOW WHAT YOU ARE SAYING, BUT SOMETIMES, ACTUALLY MOST TIMES, YOU GUYS OVER-EXPLAIN THINGS AND MAKE PEOPLE JUST LEARNING NOT ONLY CONFUSED, BUT YOU OUTRIGHT SCARE US FROM WANTING TO LEARN ANY MORE. AFTER READING ALL THIS, NOW I AM MORE CONFUSED THAN EVER AND NOW I HAVE A HEADACHE. SOMETIMES IT IS JUST BETTER TO KEEP IT AS SIMPLE AS YOU CAN. K THANKS