
Intel's Knights Corner: 50+ Core 22nm Co-processor

November 16, 2011 12:12:22 AM

Lol wow 50 cores. Guess that makes AMD's 16-core reveal another flop.
Score
-21
November 16, 2011 12:19:03 AM

Makes me wonder just what we will have in ten years from now... Especially for personal computers.
Score
25
November 16, 2011 12:21:49 AM

gmcizzle: Lol wow 50 cores. Guess that makes AMD's 16-core reveal another flop.

It's a co-processor, an accelerator, not a main CPU. They are not the same thing.
Score
43
November 16, 2011 12:26:28 AM

I wonder when the "co" will get separated from the processor and move into motherboard sockets instead of PCI Express cards.
Score
12
November 16, 2011 12:27:41 AM

What about GPU processing? Isn't that what CUDA is for? After all, my GTX 460 has 336 cores.
Score
5
November 16, 2011 12:28:36 AM

gmcizzle: Lol wow 50 cores. Guess that makes AMD's 16-core reveal another flop.

Not an absolute flop (it does provide a good price/performance ratio) but not good either. There's just no getting around the inherent flaws in the current revision of the Bulldozer architecture, even in the highly parallel workloads found in the server/workstation market:

http://www.anandtech.com/show/5058/amds-opteron-interlagos-6200

And Knights Corner isn't serving the same market as Interlagos, so they're not really directly comparable.
Score
18
November 16, 2011 12:28:37 AM

Hmm.
You know what this is?

I think this is Intel's answer to ARM's server bids.

Think about it.
50+ cores at 1.2 GHz? That sounds a lot like what ARM will be promising in the near future.

Except that everyone who wants to go the low-power route needs to rewrite their programs for the ARM instruction set. With this, they don't have to. The tools for Xeon optimization are also the same.

So you can keep a powerful 4/6/8/10-core Xeon processor (that you probably already own) and bolt this on; combined with Intel's advancements in power consumption (Sandy Bridge already gets very good idle battery life in notebooks), that should make a changeover to ARM technology a hard sell.
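
To illustrate (just a sketch; the file name, function name, and build line here are made up): the same plain C that runs on a Xeon would, in principle, only need a recompile for an x86 co-processor, while an ARM changeover means a new instruction set and new tools.

/* saxpy.c -- hypothetical example; build with something like: cc -O2 -fopenmp saxpy.c */
#include <stddef.h>

/* Classic SAXPY: y = a*x + y. The OpenMP pragma spreads the
   iterations across however many cores the part offers. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

Nothing in that source cares whether it lands on 4 Xeon cores or 50 co-processor cores; that's the portability argument in a nutshell.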
Score
21
November 16, 2011 12:29:47 AM

This isn't a CPU; it functions more like a GPU used for parallel processing.
Score
14
November 16, 2011 12:36:39 AM

So, "it won't be used to play Crysis", what about Battlefield 4? Patiently waiting.....
Score
-13
November 16, 2011 12:50:14 AM

So, Knights Corner is Larrabee?
Score
20
November 16, 2011 12:51:13 AM

oparadoxical_: Makes me wonder just what we will have in ten years from now... Especially for personal computers.


GPUs like the 6970 have around 2500 vector cores. Like FPUs in the OP, they can't do the full spectrum of x86 instructions and can only do a specialized subset for one task.

Likewise, we have growing numbers of do-everything cores on a die.

One important abstraction is that "cores" are just an FPU, SPU, TLB, etc., all on a die. A 4-core chip is basically 4 processors on one piece of silicon with one bus. A GPU is 2500 VPUs with shared memory, shared FPUs, and a shared bus and output.

The end game is that we have processor chips with specialized parts doing different, specialized tasks, all on one die. Sandy Bridge's integrated graphics is just a fancy abstraction for throwing a bunch of VPUs on the die that the CPU cores can access with their own bit of L3.

In a decade, expect processor chips to have much more cache, and a collection of VPUs / SPUs / etc on top of some registers and TLBs representing the limits of parallelism.

You merge the cores and get processors of, say, 256 cores, where 64 of them are general-purpose TLBs / register sets and 192 are mixed FPUs / VPUs doing hard computations for the general cores. If you add some floats, that work would be sent to an FPU; if you had a bunch of float math in parallel, the process would have each operation delegated to an FPU.

That's in mainstream computing, I think. Server markets are going towards specialized subset-instruction-set hardware that can't do normal computing tasks, but doesn't need to and actually shouldn't, to save power. Every instruction you throw on the CPU pile means more transistors dedicated to operation decoding that you could be using for more FPUs and such.
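
A rough sketch of that delegation idea in C (hypothetical, not how any real scheduler is wired):

#include <stddef.h>

/* One float add: a single FPU handles this. */
float add_one(float a, float b) { return a + b; }

/* A bunch of float math in parallel: every iteration is independent,
   so each operation can be farmed out to a separate FPU/VPU lane. */
void add_many(size_t n, const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}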
Score
18
November 16, 2011 12:51:53 AM

Do not fall for AMD's marketing. They do not have a 16-core chip. They have an 8-"module" chip that has 16 integer processors, but does NOT have 16 full cores.
Score
-5
November 16, 2011 12:52:24 AM

gmcizzle: Lol wow 50 cores. Guess that makes AMD's 16-core reveal another flop.

This is a co-processor. The 6990 does 1.37 TFLOPs Double Precision and 5.40 TFLOPs Single Precision. The 7990 and above should be much higher.
Score
6
November 16, 2011 12:55:16 AM

A multicore processor in a PCIe slot...
doesn't that sound like a GPU?
Score
21
November 16, 2011 12:56:37 AM

danwat1234: So, Knights Corner is Larrabee?


More or less, this took the design elements from the scrapped Larrabee project.
Score
23
November 16, 2011 12:59:39 AM

oparadoxical_: Makes me wonder just what we will have in ten years from now... Especially for personal computers.

Why wait 10 years... take a look at the mid-range graphics cards these days; they have at least 200+ cores. Intel and AMD are a little behind.
Score
-10
November 16, 2011 1:04:30 AM

Interesting, but will Knights Corner be able to run LabWindows/CVI, LabVIEW, LTspice, Cadence SPB/OrCAD, or Solid Edge?
Score
4
November 16, 2011 1:06:37 AM

Wow, that's a lot of co-processing power; it will be very useful for engineers and graphic designers. Maybe 6 years from now.
Score
6
November 16, 2011 1:11:23 AM

Could this be used to make the A.I. in PC games smarter?
Score
1
November 16, 2011 1:20:23 AM

Brute force!
Score
8
Anonymous
November 16, 2011 1:25:49 AM

Comparing AMD's chip to this is comparing apples to oranges. I've always wondered if they would go this route, because to me it always seemed inefficient to stack more and more expensive, powerful cores together compared to pairing a couple of powerful cores with lots of slow cores. Since it can be coded by anyone, the possibilities are endless.
Score
3
November 16, 2011 1:40:41 AM

But Bulldozer hit 8.59 GHz with liquid helium cooling.
Score
-7
November 16, 2011 1:44:56 AM

I resent any processor that is not made to benchmark Crysis.
Score
22
November 16, 2011 2:02:13 AM

Wow, haven't seen a co-processor in a while (looks at his 8087 IC); thought this was the age of integration...
Score
6
November 16, 2011 2:07:16 AM

sinfulpotato: This isn't a CPU; it functions more like a GPU used for parallel processing.

This is very true, but GPU processing is much more advanced. I am referring to CUDA processing, not Stream processing.
Score
-7
November 16, 2011 2:14:33 AM

I remember when they added the math coprocessor to the 486...... Seems similar. :D  :D  :D 
Score
11
November 16, 2011 3:01:41 AM

22nm die. hmmmmm they seem to get smaller and smaller. just like children.
Score
5
November 16, 2011 3:57:08 AM

wondering which instruction set it will use?
Score
0
November 16, 2011 4:10:48 AM

acadia11: But Bulldozer hit 8.59 GHz with liquid helium cooling.


Who cares? It's not like that's useful. More importantly, Bulldozer hit 8.83 GHz on Pluto. AMD is pretty sure they can beat that on the dark side of Mercury, so don't be surprised if you see a new record of 8.926 set soon, although that's just a rough estimate.
Score
4
November 16, 2011 4:27:54 AM

This is what is left of Intel's bid to compete in the GPU market, Larrabee. A co-processor with very lackluster compute performance compared to today's GPU cards; really, it can do neither GPU work nor CPU work, so one has to wonder why Intel even bothers. Seems like a desperate, and failed, last attempt at trying to keep up with ARM.
Score
-1
November 16, 2011 4:29:10 AM

Yep, this looks like the current incarnation of Larrabee. Bets are open on whether this will actually make it to market; Intel's trying to push for outright dominance in a field they've been outside of the whole time, while nVidia and AMD have years and years of experience under their respective belts.

Since this is an external co-processor run through PCI Express, it's no different at all from nVidia's or AMD's GPGPU solutions... And that means Intel's badly beaten with only 50 cores. Even assuming that the 1 TFLOP number is actually double-precision, that makes this still-not-officially-benchmarked-or-released chip only perhaps in the same LEAGUE as existing, in-market stuff. How does Intel expect this to compete when AMD *already* has a 676-GigaFLOP (over 2/3 the power) card available, RIGHT NOW, for under $400 US? By the time Knights Corner could release, AMD will have their Tahiti 7970 out, which will likely clock in at 2+ TeraFLOPs for a single GPU, at the same sub-$400 US price point.

soccerdocks: Do not fall for AMD's marketing. They do not have a 16-core chip. They have an 8-"module" chip that has 16 integer processors, but does NOT have 16 full cores.

If it's 8 cores, then each core has FPU power that badly embarrasses Sandy Bridge... A single Sandy Bridge core's FPU only allows for 128 bits of FPU data (either 1x128-bit x87, 4x32-bit SSE, or 2x64-bit AVX) per clock, while each Bulldozer module allows for double that: 2x128-bit under x87, and either 8x32-bit or 4x64-bit using AVX.
Score
11
November 16, 2011 4:50:26 AM

Intel's version of a GPU without the frame buffer, translation units, and other visual components. Just the raw VPUs, local memory, and some sort of I/O controller. Might be interesting as a "drop-in" style card.

For instruction set, it'd have to be SSE / AVX style SIMD instructions.
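
For reference, this is what AVX-style SIMD looks like from the programmer's side (just an illustration of the model being speculated about, not Knights Corner's actual, unannounced instruction set; assumes n is a multiple of 8 to keep the sketch short):

#include <immintrin.h>

/* out += a * b, 8 single-precision floats per step via 256-bit AVX. */
void vmac(int n, const float *a, const float *b, float *out)
{
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);   /* load 8 floats */
        __m256 vb = _mm256_loadu_ps(&b[i]);
        __m256 vo = _mm256_loadu_ps(&out[i]);
        vo = _mm256_add_ps(vo, _mm256_mul_ps(va, vb));
        _mm256_storeu_ps(&out[i], vo);        /* store 8 results */
    }
}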
Score
3
November 16, 2011 5:26:38 AM

If it's a co-processor, what's the other processor? An i7?
Score
2
November 16, 2011 6:00:51 AM

oparadoxical_: Makes me wonder just what we will have in ten years from now... Especially for personal computers.

I don't know about 10 years' time, but I want 20+ cores in a desktop processor in 2013 or sooner. Ivy Bridge needs to support at least 8 cores (16 threads on i7 models), and before you say "lol moar cores" - I would utilize every single core rather heavily.
Score
-3
November 16, 2011 7:18:31 AM

gmcizzle: Lol wow 50 cores. Guess that makes AMD's 16-core reveal another flop.

Sorry to say it, but is this site full of retards? Unless Intel sells that 50-core chip at the same price as AMD's 16-core (yeah, right, in a parallel universe), you just look like a retard who bashes AMD for absolutely no reason at all.
Score
9
November 16, 2011 7:20:22 AM

This is a parallel CPU; it's more akin to graphics processors, which are already nearing the TFLOP mark. AMD already has that in their HD 7000 GPU. Don't get excited, kids; this is a specialized CPU made for special tasks. It is NOT for your desktop or mobile PCs.
Score
5
November 16, 2011 8:09:33 AM

Pherule: I don't know about 10 years' time, but I want 20+ cores in a desktop processor in 2013 or sooner. Ivy Bridge needs to support at least 8 cores (16 threads on i7 models), and before you say "lol moar cores" - I would utilize every single core rather heavily.

Not gonna happen. Following Intel's "tick-tock" strategy, 2012 sees our next die shrink (to 22nm from 32nm), then 2013 sees the next-generation architecture, Haswell, get produced. Intel only does "tick" die shrinks in even-numbered years.

Intel's 32nm fits up to 6 cores (as seen on both Sandy Bridge-E and the prior-gen Gulftown). A die shrink will roughly double the effective usable die area... so at MOST you'd get 12 cores per die. However, Intel's strategy has focused a LOT more on the "uncore" part of the chip, including the integrated graphics (which appear even in the i7s). Chances are good more development will occur there, and along with intentions to yield still-better per-clock performance than Sandy Bridge, core size will go up as well. 10 cores is a POSSIBILITY for Ivy Bridge-E; I think 8 is more likely. So they could do 16-core stuff, but dual-die CPUs are so large and unwieldy they only go in server sockets, so we're talking a Xeon, NOT an i7.

The reason AMD's CPUs have more cores comes down to design emphasis; Intel's focusing more on the uncore, as well as individually beefier cores, than on core count. So I expect AMD to retain the most cores per die in this regard, provided they keep up in terms of manufacturing processes.
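
Back-of-the-envelope check on the die-shrink math (ideal scaling only; real designs never scale this cleanly):

#include <stdio.h>

/* Linear features shrink from 32 nm to 22 nm, so transistor density
   rises by roughly (32/22)^2 under ideal scaling. */
int main(void)
{
    double gain = (32.0 / 22.0) * (32.0 / 22.0);  /* about 2.12x */
    printf("Density gain: %.2fx -> room for ~%d cores where 6 fit\n",
           gain, (int)(6 * gain));                /* prints ~12 */
    return 0;
}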
Score
0
November 16, 2011 8:44:39 AM

Isn't it analogous to a GPU?
Score
0
November 16, 2011 8:49:21 AM

Intel isn't being fair. They're just beating up on AMD to make a point.
Score
-3
November 16, 2011 9:12:18 AM

I'm not impressed at all. Don't GPUs perform over 2 teraflops by now? Also, "ASCI Red" from 1997 was NOT the first system capable of over 1 TFLOP. The first known system was the Cray T3E that came out by the end of 1995, and it was advertised to deliver "over" 1.6 TFLOPS.
Score
2
November 16, 2011 9:38:11 AM

A GPU version of this 50-core thingie might make a capable IGP for Haswell. Just wondering.
Score
0
November 16, 2011 10:41:38 AM

nottheking: If it's 8 cores, then each core has FPU power that badly embarrasses Sandy Bridge... A single Sandy Bridge core's FPU only allows for 128 bits of FPU data (either 1x128-bit x87, 4x32-bit SSE, or 2x64-bit AVX) per clock, while each Bulldozer module allows for double that: 2x128-bit under x87, and either 8x32-bit or 4x64-bit using AVX.

Well, then they have managed to bring a better CPU to the market, which is what R&D is all about, after all. But, alas, it's still a mere 8 cores (with an enhanced variant of hyperthreading/SMT) no matter how you paint it, and not the 16 they are falsely advertising.

But comparing AMD's Bulldozer chip with Knights Corner is like comparing apples and oranges. It would be a lot fairer to compare it to AMD's Southern or Northern Islands chips or nVidia's GPU chips.
Score
3
November 16, 2011 11:57:13 AM

g00ey: I'm not impressed at all. Don't GPUs perform over 2 teraflops by now? Also, "ASCI Red" from 1997 was NOT the first system capable of over 1 TFLOP. The first known system was the Cray T3E that came out by the end of 1995, and it was advertised to deliver "over" 1.6 TFLOPS.

Actually, as I mentioned, the highest-performing single-GPU card only does about 675 GigaFLOPS. Keep in mind that the numbers are different depending on the level of precision: supercomputers are measured using DOUBLE-precision floating point, aka 64-bit FP. This level of precision is what's needed for scientific and engineering tasks. Meanwhile, standard 3D rendering, gaming, and media tasks are fine using 32-bit single-precision FP. Hence, a lot of consumer-targeted equipment is measured using single precision; the teraflop figures from AMD (as well as the entirely made-up teraflop figures for the consoles) refer to single-precision power.

Depending on the architecture, single-precision FP units can be used to produce double-precision results; the double-precision rate ranges from as much as half the single-precision rate (some types of units, such as x86 FPUs that use AVX, namely Sandy Bridge and Bulldozer, as well as the PowerXCell 8i), to a quarter (Radeon 6000-series GPUs), to a fifth (Radeon 5000-series GPUs, newer nVidia GPUs), to as low as a tenth (older nVidia GPUs, the PS3's Cell).
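
Worked numbers for those ratios (back-of-the-envelope; the single-precision figures below are rough, assumed inputs, not official specs):

#include <stdio.h>

/* Estimated double-precision throughput = single-precision * SP:DP ratio. */
int main(void)
{
    struct { const char *part; double sp_gflops, dp_ratio; } parts[] = {
        { "Radeon 6900-class (1/4 rate)", 2700.0, 1.0 / 4.0  },
        { "Radeon 5800-class (1/5 rate)", 2720.0, 1.0 / 5.0  },
        { "older nVidia GPU (1/10 rate)", 1000.0, 1.0 / 10.0 },
    };
    for (int i = 0; i < 3; i++)
        printf("%-30s ~%4.0f SP GFLOPS -> ~%3.0f DP GFLOPS\n",
               parts[i].part, parts[i].sp_gflops,
               parts[i].sp_gflops * parts[i].dp_ratio);
    return 0;
}

Run that and the quarter-rate row lands right around the ~675 GigaFLOPS figure mentioned above.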

And no, ASCI Red was the first computer to actually pass 1 teraFLOP. Just because Cray advertised the performance doesn't mean everyone was getting it; supercomputer performance on the TOP500 list isn't measured off theoretical peak numbers, but actual, real-world benchmark results. This allows for measurements of just how PRACTICAL those math units are, and what they ACTUALLY can achieve. (As a note, currently Intel CPUs tend to get slightly closer to their theoretical numbers than AMD CPUs do, and GPGPUs don't get anywhere near close.) The first Cray T3E that passed 1 TFLOP wasn't built until 1998.

g00ey: Well, then they have managed to bring a better CPU to the market, which is what R&D is all about, after all. But, alas, it's still a mere 8 cores (with an enhanced variant of hyperthreading/SMT) no matter how you paint it, and not the 16 they are falsely advertising.

It's hardly so clear-cut. Yes, it does blur the line on what level of parallelism is being done here, but they're very much cores, given that each module has complete hardware capability to run two threads, and doesn't merely "virtualize" two threads akin to Hyper-Threading. Each module has TWO sets of L1 data cache, and is capable of running two floating-point threads in hardware (with the FPU being the contentious point here). The only exception, technically, is AVX, but Sandy Bridge can't truly support a full thread of AVX in a single core either.

And of course, keep in mind that the concept of a "core" isn't quite fully defined either; it just kind of emerged as an alternative to "extra CPU" around 2005, when the Pentium D and Athlon 64 X2 came out... And there was bickering there, too. Yet no one deemed that a chip failed to be "dual-core" if it merged or removed redundant parts, such as the memory interface or I/O, or shared a pool of cache. Looking at that philosophy, Bulldozer's modules are the next logical step in the evolution of sets of 2 cores.
Score
6
Anonymous
November 16, 2011 12:08:43 PM

Tegra, Cell, ARM, OpenCL, CUDA... done right, the Intel way. If I understand this correctly, programs can use this parallel computing power much more easily; it's seen as a network with render nodes?
Score
0
November 16, 2011 12:21:31 PM

nottheking: ... It's hardly so clear-cut. ...

Yes, it is, at least to me. Putting two ALUs into each core doesn't double the core count by itself. Yes, the design of each core (which is now called a "module" by AMD) is improved, but the Bulldozer CPU that they claim is 16-core is in fact only 8-core.

I think they are really making fools of themselves by calling it a 16-core. It's as if I worked at Harvard for a couple of years as a research assistant and then walked around saying that I got a PhD at Stanford and worked there as a professor.

I respect you for your post, but when it comes to the core count, you don't win me over, unfortunately.

In Short: AMD's marketing = EPIC FAIL! *lol*

Score
-3
November 16, 2011 12:43:15 PM

nottheking: And no, ASCI Red was the first computer to actually pass 1 teraFLOP. Just because Cray advertised the performance doesn't mean everyone was getting it; supercomputer performance on the TOP500 list isn't measured off theoretical peak numbers, but actual, real-world benchmark results. This allows for measurements of just how PRACTICAL those math units are, and what they ACTUALLY can achieve. (As a note, currently Intel CPUs tend to get slightly closer to their theoretical numbers than AMD CPUs do, and GPGPUs don't get anywhere near close.) The first Cray T3E that passed 1 TFLOP wasn't built until 1998.

Exactly. The Fujitsu K supercomputer made it to the top of the November 2011 TOP500 list (see: http://www.tomshardware.com/news/supercomputer-top500-p...) because the system was actually built and tested. The K supercomputer is the first to break the 10-petaflop barrier. Fujitsu has already announced the PRIMEHPC FX10 supercomputer, which is scalable to 23.2 petaflops; however, nobody has built or even ordered one of those yet.
Score
2
November 16, 2011 12:54:30 PM

nottheking said:
If it's 8 cores, then each core has FPU power that badly embarrasses Sandy Bridge... A single Sandy Bridge core's FPU only allows for 128 bits of FPU data (either 1x128-bit x87, 4x32-bit SSE, or 2x64-bit AVX) per clock, while each Bulldozer module allows for double that: 2x128-bit under x87, and either 8x32-bit or 4x64-bit using AVX.


Not quite true. According to http://www.realworldtech.com/page.cfm?ArticleID=RWT0918...:

Quote:
The execution units in Sandy Bridge were reworked to double the FP performance for vectorizable workloads by efficiently executing 256-bit AVX instructions. Almost all 256-bit AVX instructions are decoded into and execute as a single uop – in contrast to AMD’s more cautious embrace of AVX, which will crack 256-bit instructions into two 128-bit operations on Bulldozer.

Sandy Bridge can sustain a full 16 single precision FLOP/cycle or 8 double precision FLOP/cycle – double the capabilities of Nehalem. This guarantees that software which uses AVX will actually see a substantial performance advantage on Sandy Bridge and should spur faster adoption.

Sandy Bridge can execute a 256-bit FP multiply, a 256-bit FP add and a 256-bit shuffle every cycle. However, the floating point data paths were not expanded and are still 128-bits wide; instead the SIMD integer data paths are enlisted to assist with AVX operations.


This is why for desktop apps, the 4-module Bulldozer 8150 gets beaten by the 4-core 2600K: http://www.tomshardware.com/reviews/fx-8150-zambezi-bul...

Quote:
Integer and floating-point math are both improved in the Bulldozer architecture, allowing the FX-8150 to place second behind Intel’s Core i7-2600K.

Exceptional integer SSE2 performance catapults FX-8150 ahead of Intel’s lineup in Sandra’s Multimedia metric. Shared floating-point units aren’t able to achieve the same results, though FX-8150 nearly matches Intel’s Core i7-2600K.
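
For concreteness, the peak throughput implied by that quote, taking the i7-2600K's nominal 3.4 GHz and 4 cores as the assumed inputs (turbo ignored):

#include <stdio.h>

/* Peak GFLOPS = FLOP/cycle * GHz * cores, per the RWT figures above. */
int main(void)
{
    double ghz = 3.4, cores = 4.0;
    printf("Peak SP: %.1f GFLOPS\n", 16.0 * ghz * cores);  /* 217.6 */
    printf("Peak DP: %.1f GFLOPS\n",  8.0 * ghz * cores);  /* 108.8 */
    return 0;
}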


Score
1
November 16, 2011 1:57:54 PM

g00ey: In Short: AMD's marketing = EPIC FAIL! *lol*

I would've dignified you with more of a response (which would've just really been a rehash of what I've already told you twice), but this line here sorta indicated that it'd just go over your head, too.

aldaia: Exactly. The Fujitsu K supercomputer made it to the top of the November 2011 TOP500 list because the system was actually built and tested... The PRIMEHPC FX10 supercomputer is scalable to 23.2 petaflops; however, nobody has built or even ordered one of those yet.

Yep, you pretty much managed to sum it all up right there.

fazers_on_stun: Not quite true. According to http://www.realworldtech.com/page. [...] 91937&p=6: This is why for desktop apps, the 4-module Bulldozer 8150 gets beaten by the 4-core 2600K: http://www.tomshardware.com/review [...] 43-14.html

I could be mistaken, then; I had recalled that Sandy Bridge's FPUs allowed it to have 256-bit-wide AVX registers, but that actual execution/retirement of the instructions was basically "pipelined." This shows that instead it "steals" capability from the integer units to achieve the full width. Of course, this is probably a better solution, as it winds up taking resources that are less likely to be in use (as opposed to what would basically be its twin's FP SSE unit).
Score
0
November 16, 2011 2:29:55 PM

nottheking: I would've dignified you with more of a response ...

I take it that you, in some way or other, are working for AMD or even representing AMD. Stealth marketing, where people tout their own products, is kind of common in forums like these.

It's nothing personal; I'm just being honest with my opinions. We have a market today where marketing buzzwords such as Ultra, Mega, Power, Super, and all possible combinations thereof are met with strong disbelief. Pursuing a dishonest marketing strategy, where you claim that a product has a feature it actually lacks, will only backfire in the end.

Try selling, say, 1 kg of washing powder of a known brand as 2 kg in the supermarket, charging the 2 kg price. It doesn't even matter if the powder is twice as good as "normal" washing powder; people will get upset, and the manufacturer will try to sue you for damaging their brand.

So I think that companies such as AMD and Intel will gain a lot more in the long run by sticking to fair and honest marketing. But it is the quality and performance the CPU delivers that matter in the end, so if AMD manages to impress in the benchmark tests and wins over consumers while selling at a competitive price point, AMD may get away with their misleading advertising anyway.
Score
-5