Intel has announced the discontinuation of its 2nd Generation Xeon Scalable (Cascade Lake) processors. Cascade Lake is two generations behind Intel's current 4th Generation Xeon Scalable (Sapphire Rapids) processors, so it's remarkable that it survived this long.
Cascade Lake arrived in 2019, when Intel released it to replace the long-lived Skylake microarchitecture. Poor Cascade Lake has gone through a lot. Under constant pressure from AMD's 7nm EPYC Rome chips, Intel was forced to discontinue some of its Cascade Lake Xeon SKUs abruptly while lowering prices on the surviving models. Shortly after, Cascade Lake went through a refresh, introducing the Xeon Cascade Lake Refresh parts with price reductions of up to 60% per core.
Anyone familiar with Intel knows the chipmaker isn't in the habit of slashing prices on its processors, much less launching a refresh in such a short time. Cascade Lake's extreme feature segmentation contributed to the microarchitecture's downfall. For example, not all Cascade Lake chips supported the same amount of memory or Optane DC Persistent Memory DIMMs. AMD's EPYC Rome lineup, on the other hand, offered customers the same feature set across all its SKUs.
Intel terminated its Cascade Lake-X (HEDT) and Cascade Lake-W (workstation) processors in July of this year. As expected, the Xeon chips are next on the chopping block. The product discontinuation applies to both tray and boxed Cascade Lake Xeon processors; Intel listed 68 Cascade Lake products in its PCN (Product Change Notification) document. The chipmaker's customers have until April 26, 2024, to place final orders with their local Intel representative. Intel has committed to shipping the last Cascade Lake Xeon orders by October 23, 2026, so Cascade Lake won't completely disappear from the shelves for at least another couple of years.
In another PCN document, Intel noted that there is no change for its embedded customers, making the 22 embedded Cascade Lake processors the last living members of the Cascade Lake lineage. Embedded products have a longer life span than their standard counterparts, so it's unsurprising that the embedded Cascade Lake SKUs will be around a bit longer. Finally, Intel has transferred its embedded Cascade Lake chips over to the Intel Embedded Architecture for continued support, effective after October 23, 2026.
Zhiye Liu is a news editor and memory reviewer at Tom’s Hardware. Although he loves everything that’s hardware, he has a soft spot for CPUs, GPUs, and RAM.
-
emike09 I have absolutely loved my Cascade Lake-X i9-10920X. Bought it when it came out, still rocking it. It's been running almost 24/7 since 2019 at 4.8GHz all-core on water. Excellent at production projects, excellent at gaming, and I love all those PCI-e lanes. What an incredible CPU and platform. A legend of CPUs. It's getting old, and I'm thinking of replacing it in a year or two if a decent replacement ever arrives. -
bit_user
emike09 said: "I have absolutely loved my Cascade Lake-X i9-10920X. Bought it when it came out, still rocking it. It's been running almost 24/7 since 2019 at 4.8GHz all-core on water. Excellent at production projects, excellent at gaming, and I love all those PCI-e lanes. What an incredible CPU and platform. A legend of CPUs. It's getting old, and I'm thinking of replacing it in a year or two if a decent replacement ever arrives."
The main thing that tarnished Cascade Lake was AMD's Rome (7002-series Epyc) and Threadripper 3000 series, both of which launched in 2019. You talk about PCIe lanes, but TR had 64 @ PCIe 4.0 and between 24 and 64 cores. Cascade Lake might've had an edge on single-thread performance, but it couldn't match TR's all-thread performance. Had it not been for an unprecedented surge in demand, the situation could've been a lot worse for Intel. As it was, they were forced into the somewhat unprecedented move of price-cutting:
https://www.anandtech.com/show/15542/intel-updates-2nd-gen-xeon-scalable-family-with-new-skus
The second big problem Cascade Lake faced was AVX-512-induced clock throttling, which could be quite severe. Fortunately, it's not as bad in Ice Lake and almost gone in Sapphire Rapids.
One thing I don't understand about Cascade Lake is why Intel took 2 years between Skylake-SP and Cascade Lake. Was it meant to be only one? Or was Cascade Lake only done to plug a gap left by a cancelled 10 nm predecessor to Ice Lake (i.e. maybe a Cannon Lake-SP)? It does seem like there should've been something in that socket after Skylake. Intel's normal cadence for server CPUs is two generations per socket - just like consumer. So, if not Cascade Lake, then something else? -
Tech0000 And I absolutely love my Core i9-10980XE on the X299 platform!! After some careful OC (which is just pure pleasure), the two AVX-512 units per core sing at 4.5 GHz while fully saturating the memory hierarchy - and how well they sing! With all due respect, who the hell runs a 10980XE at stock? Throttling never happened to me lol. Anyway, after 4 years of 24/7 OC and never a hiccup, they sing as clearly as day 1. The point of getting the 18-core 10980XE was the package: the 48 PCIe lanes, the VROC (yes, it works), the 256GB of RAM at XMP 2.0 OC, and the amazing power of 36 AVX-512 units resulting in 1/2-clock vfmadd213ps throughput per core. That is just amazing!
and for the record, for people concerned with the recently published AVX Gather instruction security issues, the key is to organize data structures to avoid using AVX gather instructions (e.g. vgatherdps/pd, etc.) in the first place (gathers/scatters are horribly inefficient and blow up all levels of cache if you use large data sets, so I design all my data structures aligned so you can stream reads and writes to them, avoiding gather and scatter instructions altogether).
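The layout idea being described - keep each field contiguous so it can be streamed instead of gathered - can be sketched in plain Python (the field names and sizes here are illustrative, not from any real project):

```python
from array import array

# Interleaved "array-of-structs" buffer vs. separate "struct-of-arrays"
# buffers. Reading one field from the interleaved layout is a strided
# walk (the access pattern that tempts a compiler into vgather); the
# per-field array is read with plain sequential, unit-stride loads.
n = 8

aos = array("f")
for i in range(n):
    aos.extend([float(i), 0.0, 0.0])   # x, y, z interleaved per element

soa_x = array("f", (float(i) for i in range(n)))  # x alone, contiguous

x_from_aos = aos[0::3]   # stride-3 extraction of the x field
print(sum(x_from_aos) == sum(soa_x))   # True - same data, different layout
```

Same values either way; the point is purely which memory access pattern the layout forces on the hot loop.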
I must say, my 10980XE/X299 system has served me extremely well, and I'm looking forward to upgrading to something like Granite Rapids next year (pending pricing) to get more PCIe lanes, more cores, etc., plus AMX BF16, FP16, and the complex (FP16, i FP16) data type - WOW, just WOW - the opportunities this opens up! -
bit_user
Tech0000 said: "With all due respect, who the hell runs a 10980XE at stock? Throttling never happened to me lol."
That's because you're not using a Xeon. You can't overclock those Xeon SP models, and they will throttle aggressively to stay within their power limits. It would cause all sorts of problems and blow apart TCO calculations if server CPUs used more power and generated more heat than they're supposed to.
Tech0000 said: "the VROC (yes, it works),"
I never got the point of VROC. I use software RAID on Linux. It's free, fast, and works with any CPU!
Tech0000 said: "the amazing power of 36 AVX-512 units"
No, it's only 1 per core - unless you're talking specifically about the FMAs. Those are indeed 2 per core on the higher-end models like yours.
Tech0000 said: "for people concerned with the recently published AVX Gather instruction security issues, the key is to organize data structures to avoid using AVX gather instructions (e.g. vgatherdps/pd, etc.) in the first place"
Obviously, but people typically use scatter/gather in cases where you simply can't avoid random access.
Tech0000 said: "(gathers/scatters are horribly inefficient and blow up all levels of cache if you use large data sets, so I design all my data structures aligned so you can stream reads and writes to them, avoiding gather and scatter instructions altogether)."
Right. That's why scatter/gather is usually a last resort. I will say I once wrote an optimized image transpose using SSE, and it sure was annoying to handle all of the alignment issues and ensure coherent data access patterns. I can see people being tempted to use scatter/gather in such cases.
Tech0000 said: "I'm looking forward to upgrading to something like Granite Rapids next year (pending pricing) to get more PCIe lanes, more cores, etc., plus AMX BF16, FP16, and the complex (FP16, i FP16) data type - WOW, just WOW - the opportunities this opens up!"
That's cool. I like hardware, but it's been a while since I've found a use that excited me this much. I guess that's why it's taken me about 10 years to upgrade.
The thing that makes me sad is how expensive even the Xeon W-2400 series is. My old workstation is a Xeon EP with quad-channel memory, but I cannot justify spending what any of the Xeon W CPUs or W790 motherboards cost, these days. It's a good thing I don't really need one. I did like having plenty of PCIe lanes, but I figure I can get by with a decent W680 board. -
Tech0000
bit_user said: "That's because you're not using a Xeon. You can't overclock those Xeon SP models, and they will throttle aggressively to stay within their power limits. It would cause all sorts of problems and blow apart TCO calculations if server CPUs used more power and generated more heat than they're supposed to."
This is the enthusiast HEDT platform I am talking about, not the server platform.
I was making the case for my 10980XE, not any server SKU. Intel is notoriously conservative; I can assure you that HEDT SKUs like the Core i9-10980XE work superbly with AVX-512 OC'd on all cores at 4.5 GHz, 24/7, for 4 years and counting. Server SKUs not allowing OC is a different issue, but I never claimed they were OC-able.
bit_user said: "I never got the point of VROC. I use software RAID on Linux. It's free, fast, and works with any CPU!"
VROC allowed me (and still allows me) to run 4 Intel Optane drives, each saturating PCIe 3.0 x4, in RAID 0, with 10 DWPD endurance per drive. Outstanding in 2019. Of course you can do better today, but 10 DWPD endurance - yeah, that is pretty secure... Even after 4 years of work, I still have 100% life left in them (SMART readout).
bit_user said: "No, it's only 1 per core - unless you're talking specifically about the FMAs. Those are indeed 2 per core on the higher-end models like yours."
Of course I was talking about AVX-512 (FMA) units - that is what I wrote - and there are 2 AVX-512 FMA units per core, so 36 in total for a 10980XE. Most AVX-512 instructions execute in parallel across both units per core, producing a throughput of 1/2 clock cycle or faster, even for fused multiply-add instructions on 16 single-precision floats (or 8 double-precision floats); i.e., each core delivers 32 single-precision fused multiply-adds per clock cycle. The constraint, it turns out, is not the CPU but memory. Dude, I do this work every day. I am not some influencer, journalist, or marketer.
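The arithmetic behind that claim can be sketched in a few lines (core count and clock are taken from the discussion above; this is theoretical peak, not a measured result):

```python
# Back-of-the-envelope peak AVX-512 FMA throughput for an 18-core
# Cascade Lake-X part like the i9-10980XE discussed in this thread.
CORES = 18
FMA_UNITS_PER_CORE = 2      # higher-end SKUs enable both AVX-512 FMA ports
SP_LANES = 512 // 32        # 16 single-precision floats per 512-bit register
CLOCK_GHZ = 4.5             # the all-core OC cited above; stock clocks are lower

fmas_per_core_per_clock = FMA_UNITS_PER_CORE * SP_LANES   # 32 FMAs/core/clock
flops_per_core_per_clock = fmas_per_core_per_clock * 2    # each FMA = mul + add
peak_gflops = CORES * flops_per_core_per_clock * CLOCK_GHZ

print(fmas_per_core_per_clock)   # 32
print(peak_gflops)               # 5184.0 (theoretical GFLOP/s)
```

As the comment says, sustaining anywhere near that number depends on the memory hierarchy keeping the FMA units fed, not on the ALUs themselves.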
bit_user said: "Obviously, but people typically use scatter/gather in cases where you simply can't avoid random access."
Yeah, what can I say? There are a lot of bad/unqualified programmers who do not have the necessary theoretical background or the right incentives to design and write good systems. I recommend they (not you) go back to school and take some good classes to further their knowledge - and no, I do not mean the elementary undergrad data structures and algorithms classes. I also recommend companies reward designing and writing efficient code - sarcasm - yeah, that will not happen, but it was fun to tell the truth for once... lol
bit_user said: "Right. That's why scatter/gather is usually a last resort. I will say I once wrote an optimized image transpose using SSE, and it sure was annoying to handle all of the alignment issues and ensure coherent data access patterns. I can see people being tempted to use scatter/gather in such cases."
Same comment as above.
bit_user said: "That's cool. I like hardware, but it's been a while since I've found a use that excited me this much. I guess that's why it's taken me about 10 years to upgrade."
My previous system before the 10980XE was actually an Intel Xeon W3690/X58-based system with 6 cores, and I still use that machine for common tasks (not dev); it has been running stable for more than 12 years now. I just upgrade the GPU every now and then. In those days you could put a Xeon W3690 into an X58 consumer board - amazing platform! OC stability has been 100% for 12 years. Honestly, you do not really need more than that for common everyday tasks if you have a good GPU.
bit_user said: "The thing that makes me sad is how expensive even the Xeon W-2400 series is. My old workstation is a Xeon EP with quad-channel memory, but I cannot justify spending what any of the Xeon W CPUs or W790 motherboards cost these days. It's a good thing I don't really need one. I did like having plenty of PCIe lanes, but I figure I can get by with a decent W680 board."
I agree the W790 platform is an insult to enthusiast HEDT users like me, and I think you too. W2400 and W3400 series alike are not priced for the enthusiast market, only for the enterprise market. And even then, putting on my enterprise hat, I would not be able to justify the price-per-core versus performance. The step up from a 10980XE, Ice Lake, or AMD's current offerings is just not big enough - W2400/W3400 pricing per core is insane.
So I am hoping Granite Rapids will give Intel the opportunity to produce an enthusiast version. Sapphire and Emerald are out of reach. Holding out 1 more year to see what comes... -
WallyJames I think Intel is a great option for now, but these days we see that Apple's M2 Pro processor is really powerful: in 1 chip we have the video card, the processor, and a lot of features. In the future we will see gaming on Apple laptops. -
emike09
bit_user said: "The main thing that tarnished Cascade Lake was AMD's Rome (7002-series Epyc) and Threadripper 3000 series, both of which launched in 2019. You talk about PCIe lanes, but TR had 64 @ PCIe 4.0 and between 24 and 64 cores. Cascade Lake might've had an edge on single-thread performance, but it couldn't match TR's all-thread performance."
Agreed, TR was a much better platform for all-thread performance. I considered the i9-10980XE for the extra cores, and even TR, but as it's my home production workstation and gaming rig all in one, I needed the right balance of multi-threaded and single-threaded performance. The i9-10920X can be overclocked much higher than the 10980XE, giving me the single-thread performance I often need. The TR 3000 series didn't win any gaming benchmarks, not by a long shot.
bit_user said: "The second big problem Cascade Lake faced was AVX-512-induced clock throttling, which could be quite severe. Fortunately, it's not as bad in Ice Lake and almost gone in Sapphire Rapids."
I know most professionals using a chip like this wouldn't game with it, but I do. It feeds the 4090 nicely and is never a bottleneck; PCIe 3.0 has a tiny performance impact, but it's negligible. I also run Hyper-V with several VMs, the Adobe production suite, Cinema 4D, Unreal 5, 3ds Max, a NAS, etc. It's an enthusiast high-end desktop chip. If it were mission-critical, I'd run server hardware. I just love having one system to rule them all at home.
48 PCIe lanes was and is sufficient for me, for now. I can't imagine going back to 20 or 24 found on consumer platforms. I'd actually consider something like the i9-13900K if it had more lanes. AFAIK, nothing I use utilizes AVX-512, so I don't feel like I'm missing out much there. -
bit_user
Tech0000 said: "VROC allowed me (and still allows me) to run 4 Intel Optane drives, each saturating PCIe 3.0 x4, in RAID 0, with 10 DWPD endurance per drive. Outstanding in 2019. Of course you can do better today, but 10 DWPD endurance - yeah, that is pretty secure... Even after 4 years of work, I still have 100% life left in them (SMART readout)."
Yeah, the endurance of Optane is pretty nuts. IIRC, the last-gen DC drives had endurance of like 100 DWPD? They're really aimed at applications like a caching tier or journaling for a much larger flash tier.
It's hard for an end user to do anything with them that truly exploits their performance characteristics. For normal desktop usage, the vast majority of reads are serviced by the page cache or kernel read-ahead. Writes are again buffered by the kernel unless an app goes out of its way to call fsync(). So, other than cold boot times, it's hard to really "see" the benefits of Optane versus a fast NAND-based NVMe drive.
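A minimal sketch of that buffering behavior, using only the standard library (the temp file and payload are throwaway examples):

```python
import os
import tempfile

# Writes normally land in the kernel page cache and get flushed to the
# device later; an application only forces them to stable storage by
# calling fsync() on the file descriptor.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello optane")
    os.fsync(fd)              # block until the data has reached the device
    with open(path, "rb") as f:
        readback = f.read()
    print(readback)           # b'hello optane'
finally:
    os.close(fd)
    os.remove(path)
```

Without the fsync() call, a benchmark is often measuring the page cache rather than the drive, which is exactly why a fast NVMe drive and an Optane drive can feel identical in desktop use.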
Funny thing is that I had always aspired to own an Optane drive. When Intel announced it was being discontinued, I panicked and bought a 400 GB P5800X drive. I still haven't put it into service, however. I wonder if it might appreciate in price, once the supply finally dries up. If not, I'll probably go ahead and use it as a boot drive.
In the meantime, I bought an Intel/Solidigm P5520 to use for that, which I expect will be more than enough performance for me. It cost less than 1/3 the price and is nearly 10x the capacity. Both drives are PCIe 4.0 and the P5520's 1 DWPD endurance is more than enough for my use cases. Heh, when you do the math, 1 DWPD @ 10x the capacity is equivalent to doing 10 DWPD on the Optane drive!
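That back-of-the-envelope math checks out; here's a quick sketch (the 4 TB NAND capacity is an illustrative stand-in for a roughly 10x-capacity drive):

```python
# Endurance comparison in absolute terms: DWPD is relative to capacity,
# so 1 DWPD on a ~10x-larger drive allows the same bytes written per day
# as 10 DWPD on the small Optane drive.
optane_tb, optane_dwpd = 0.4, 10    # 400 GB Optane-class drive
nand_tb, nand_dwpd = 4.0, 1         # ~10x-capacity NAND drive (assumed size)

optane_tb_per_day = optane_tb * optane_dwpd   # 4.0 TB written per day
nand_tb_per_day = nand_tb * nand_dwpd         # 4.0 TB written per day
print(optane_tb_per_day == nand_tb_per_day)   # True
```

The absolute write budget is the number that matters for a given workload, which is why the capacity ratio cancels out the DWPD ratio here.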
A good place I found to buy these drives (both Optane and Solidigm) is Provantage:
https://www.provantage.com/service/searchsvcs?QUERY=P5520
They don't currently stock all models of Solidigm drives, however. There are higher-endurance models I don't see on there, in case you ever need that. The one I bought is for mixed read/write workloads - they have a lower tier that's QLC-based, for read-oriented usage.
Tech0000 said: "Of course I was talking about AVX-512 (FMA) units - that is what I wrote - and there are 2 AVX-512 FMA units per core, so 36 in total for a 10980XE."
See, it's the talk of "units" which threw me off; AVX-512 is more than just FMAs. What varies between one and two, depending on the Skylake/Cascade Lake model, is the number of FMA ports. From Ice Lake SP onward, Intel always enables both FMAs per core.
The Sunny Cove and Willow Cove cores found in the laptop Ice Lake and Tiger Lake SoCs physically have only one FMA per core. You can find die shot comparisons of the laptop & server Sunny Coves which show where the second FMA was bolted on. In contrast, the client Skylake cores physically lack any AVX-512 - again, confirmed by die-shot comparisons. Finally, the difference between client & server cores was carried through even to Golden Cove, where the client physically has a single AVX-512 FMA port, even though AVX-512 is disabled on them.
Tech0000 said: "yeah, what can I say? There are a lot of bad/unqualified programmers who do not have the necessary theoretical background or the right incentives to design and write good systems. I recommend they (not you) go back to school and take some good classes to further their knowledge."
I think people tend to be rather self-selecting. Either you're interested in milking the performance out of your hardware, and willing to do what it takes to accomplish that, or you're not. Ignorance is part of it, but these days the information is so readily available that ignorance primarily comes from a lack of interest/curiosity/drive/etc.
On the bright side, we can think of it as job security. We're probably not far off from the age of a billion people on the planet with some idea about how to write code. Then, there's AI-driven code generation, as well. So, having detailed hardware knowledge and a good track record will be real selling-points, in the job market. I hope.
Tech0000 said: "My previous system before the 10980XE was actually an Intel Xeon W3690/X58-based system with 6 cores, and I still use that machine for common tasks (not dev); it has been running stable for more than 12 years now. I just upgrade the GPU every now and then. In those days you could put a Xeon W3690 into an X58 consumer board - amazing platform! Honestly, you do not really need more than that for common everyday tasks if you have a good GPU."
Yeah, I had one of those at work for a long time. Just this year, the PSU probably blew a capacitor, because I smelled some smoke and noticed the machine was off and would no longer power on. I hadn't really been using it since the pandemic, but it made a decent Linux desktop. Years ago, I'd upgraded the graphics card to a GTX 1050 Ti, which drove my 4K monitor perfectly (for my needs, at least).
Tech0000 said: "I agree the W790 platform is an insult to enthusiast HEDT users like me, and I think you too. W2400 and W3400 series alike are not priced for the enthusiast market, only for the enterprise market. And even then, putting on my enterprise hat, I would not be able to justify the price-per-core versus performance."
They are hoping you need AVX-512 and/or AMX. That's their main value-add for the lower-end W2400 models. Maybe also higher memory capacities, since it supports RDIMMs. And I guess PCIe lanes, for people running multi-GPU setups.
Tech0000 said: "So I am hoping Granite Rapids will give Intel the opportunity to produce an enthusiast version. Sapphire and Emerald are out of reach. Holding out 1 more year to see what comes..."
In the next generation, Intel is going back to having 3 different sockets. However, even the smallest socket will be a similar size to the current one. So, I'm not really hopeful that they will ever go back to having a HEDT platform within my price range. I've made my peace with that, given how far the client platform has come.
I just wish the DMI connection had been upgraded to PCIe 5.0, because that would offer a lot more expandability. In contrast, I don't care at all that the x16 slot is PCIe 5.0. There are no PCIe 5.0 graphics cards, I don't need one, and I don't need a PCIe 5.0 SSD. The only place PCIe 5.0 would've been useful to me is the one place they didn't put it - the DMI connection! -
bit_user
WallyJames said: "I think Intel is a great option for now, but these days we see that Apple's M2 Pro processor is really powerful: in 1 chip we have the video card, the processor, and a lot of features. In the future we will see gaming on Apple laptops."
The only way Apple belongs in the same sentence as Cascade Lake is in reference to their old Mac Pro, which was based on special SKUs of Cascade Lake that supported 64 PCIe 3.0 lanes. And because those topped out at 28 cores, the new Mac Pro could pretty easily beat it on multithreaded performance (it has only 24 cores, but 16 of them are significantly faster). However, where the new Mac Pro really falls down is on memory capacity and PCIe connectivity.
Apple has a long way to go, before they catch Sapphire Rapids or Genoa. I expect they're going to add a bunch of PCIe/CXL lanes, in the next generation, and then point to CXL.mem as a way to scale up memory capacity. And, if they scale up to 4 CPU tiles, then they could get themselves back in the core-count race. So, it's quite conceivable that Apple gets back in the high-end workstation game, but certainly not a given. -
bit_user
emike09 said: "48 PCIe lanes was and is sufficient for me, for now. I can't imagine going back to the 20 or 24 found on consumer platforms. I'd actually consider something like the i9-13900K if it had more lanes. AFAIK, nothing I use utilizes AVX-512, so I don't feel like I'm missing out much there."
Although I wish they had gone to PCIe 5.0, I would point out that the DMI connection is x8 PCIe 4.0. So there's the equivalent of PCIe 3.0 x16 in bandwidth for the chipset to fan out. That helps, IMO.