Intel's Sapphire Rapids Had 500 Bugs, Launch Window Slips Further

Sapphire Rapids
(Image credit: Intel)

Intel has delayed the release of its 4th Generation Xeon Scalable "Sapphire Rapids" processor a number of times without disclosing its reasons. Last week the company admitted that it had to revise Sapphire Rapids because of a security bug, but it appears that the problem is bigger than Intel says. According to Igor's Lab, Sapphire Rapids had about 500 bugs, which took the company 12 steppings to fix. 

Intel's fourth Gen Xeon Scalable "Sapphire Rapids" processor will not only increase the core count to as many as 60, but will also bring numerous new features, including Advanced Matrix Extensions (AMX), Data Streaming Accelerator (DSA), CXL 1.1 protocol, DDR5 and HBM2E memory support, PCIe Gen 5 interface, and many more. But the host of additional features increases the probability of hardware bugs, so Intel had to fix almost 500 of them, Igor's Lab reports. 

So far, Intel has released A0, A1, B0, C0, C1, C2, D0, E0, E2, E3, E4, and E5 steppings of the Sapphire Rapids processor to fix nearly 500 bugs. Given that modern processors integrate tens of billions of transistors, it is inevitable that they have a certain number of bugs. These bugs are called errata and are mitigated with microcode or even software updates. But 500 errata seems overwhelming, as do 12 respins, considering that a single respin costs tens of millions of dollars. 

Although building new respins is expensive, the more pressing issue is that Intel has to delay the release of its next-generation datacenter CPUs. Right now, Intel targets a launch window of calendar weeks 6 to 9 of 2023 (Feb. 6, 2023 to March 3, 2023) for high-volume Sapphire Rapids processors. Meanwhile, some SPR products may launch in calendar weeks 42 and 45 of 2022. 
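The launch windows above are given as calendar weeks rather than dates. The sketch below shows one way to convert them, assuming ISO 8601 week numbering (weeks start on Monday; week 1 contains the year's first Thursday) and a Monday-to-Friday work week. Intel's internal work-week calendar is not documented here and may differ, so treat the results as approximate. Under these assumptions the 2023 window reproduces the Feb. 6 to March 3 dates quoted above. The function name `work_week_range` is hypothetical.

```python
from __future__ import annotations

from datetime import date


def work_week_range(year: int, first_week: int, last_week: int) -> tuple[date, date]:
    """Return (Monday of first_week, Friday of last_week) in the given year.

    Assumption: ISO 8601 week numbering and a Monday-to-Friday work week.
    Intel's own work-week convention may differ from this.
    """
    start = date.fromisocalendar(year, first_week, 1)  # ISO weekday 1 = Monday
    end = date.fromisocalendar(year, last_week, 5)     # ISO weekday 5 = Friday
    return start, end


if __name__ == "__main__":
    # High-volume window: 2023 weeks 6 to 9.
    s, e = work_week_range(2023, 6, 9)
    print(s.isoformat(), "to", e.isoformat())  # 2023-02-06 to 2023-03-03
    # Early SPR products: 2022 weeks 42 and 45, each treated as one work week.
    for wk in (42, 45):
        s, e = work_week_range(2022, wk, wk)
        print(s.isoformat(), "to", e.isoformat())
```

`date.fromisocalendar` (Python 3.8+) does the week arithmetic directly, which avoids hand-counting days from January 1, whose weekday changes every year.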

For Intel, the Sapphire Rapids processor and the Eagle Stream platform are crucially important products. Not only are they expected to improve Intel's competitive position in the datacenter market, but they will also pave the way for the company's next-generation product, the processor codenamed Emerald Rapids, due in 2023.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • dehjomz
    Given that Sapphire Rapids has Golden Cove cores, just like Alder Lake also has Golden Cove cores... if there are indeed 500 (known) bugs in the Sapphire silicon, are any of those bugs present in Alder Lake's version of Golden Cove?

    Any way to test to see whether the bugs in Sapphire Rapids are indeed present in Alder Lake and the upcoming Raptor Lake?

    Sapphire's volume ramp has been anything but Rapid.
  • Alvar "Miles" Udell
    Well, it sounds catastrophic, but when you're talking server space where systems are measured in millions of dollars, squashing as many bugs as possible pre-release is a far better option than releasing a product and then facing the prospect of giving up performance to mitigate them.
  • -Fran-
    But hey, it's on track because Intel hasn't officially said anything yet, right?

    Oh wait...

    Regards :D
  • Samipini
    dehjomz said:
    Given that Sapphire Rapids has Golden Cove cores, just like Alder Lake also has Golden Cove cores... if there are indeed 500 (known) bugs in the Sapphire silicon, are any of those bugs present in Alder Lake's version of Golden Cove?

    Any way to test to see whether the bugs in Sapphire Rapids are indeed present in Alder Lake and the upcoming Raptor Lake?

    Sapphire's volume ramp has been anything but Rapid.

    Bugs may be for features specific to data centers: Optane DIMMs, ECC, and some accelerators that make good selling points. Consumers don't care about them.
  • Samipini
    Alvar Miles Udell said:
    Well it sounds catastrophic, but when you're talking server space where systems are measured in millions of dollars, squashing as many bugs pre-release is a far better option than releasing a product and then face having to possibly remove performance to mitigate.
    So is delaying your product for years. Intel lost so many customers as they waited endlessly for something new.
  • escksu
    dehjomz said:
    Given that Sapphire Rapids has Golden Cove cores, just like Alder Lake also has Golden Cove cores... if there are indeed 500 (known) bugs in the Sapphire silicon, are any of those bugs present in Alder Lake's version of Golden Cove?

    Any way to test to see whether the bugs in Sapphire Rapids are indeed present in Alder Lake and the upcoming Raptor Lake?

    Sapphire's volume ramp has been anything but Rapid.

    Definitely. Bugs exist in all CPUs out there, whether past, present or even future. No CPU is perfect. Bugs exist in GPUs too.

    You just need to download their technical documents to see what the bugs are. Both Intel and AMD have such documents available on their websites. They are commonly referred to as errata.

    Some can be corrected via new steppings, some via microcode (delivered through BIOS updates). Some cannot be corrected, so programmers must avoid them. Sometimes a workaround is also given in these documents.
  • escksu
    Samipini said:
    So is delaying your product for years. Intel lost so many customers as they waited endlessly for something new.

    Not really. Most of these customers are corporates instead of end-users. They don't wait. They simply buy what's available in the market. Only very niche products like supercomputers will need to wait because they are designed around the processor.

    Also, Sapphire Rapids chips are just "optimized" processors that integrate many features into a single processor. Current CPU/GPU combos can also perform the same tasks, just not as efficiently.
  • kjfatl
    For a project of this size, 500 'bugs' is no surprise. There is a good chance that 80% of the bugs are performance or manufacturing related. Most of the bugs are probably related to new logic and instructions. Of course, even a single bug in the wrong place can be a huge disaster.
  • bit_user
    This comes at a horrible time for Intel, because their current product (Ice Lake SP) was barely competitive at launch (which was years late) and has no chance against AMD's Genoa.

    dehjomz said:
    Given that Sapphire Rapids has Golden Cove cores, just like Alder Lake also has Golden Cove cores... if there are indeed 500 (known) bugs in the Sapphire silicon, are any of those bugs present in Alder Lake's version of Golden Cove?

    Precisely because Golden Cove is already on the market, I think we can say most of the show-stopper bugs aren't in the cores themselves. And the ones in the cores would be related to features disabled in consumer versions anyhow (like AVX-512, perhaps).

    dehjomz said:
    Sapphire's volume ramp has been anything but Rapid.

    🤣I like it! Intel really opened themselves up to that one!

    Samipini said:
    Bugs may be for features specific to data centers: Optane DIMMs, ECC, and some accelerators that make good selling points. Consumers don't care about them.

    Agreed... except not really about ECC, because Alder Lake supports ECC when used in a motherboard with a W680 chipset. That said, servers typically feature advanced ECC modes you don't get in entry-level workstations.

    As for datacenter features, the article correctly notes these CPUs introduce AMX, DSA, and CXL. These are each big, new, and complex features. To pick on AMX, Intel's software support for it was so late that I could imagine they were delayed in getting their hardware validation tests in place for the initial steppings of the CPU.

    CXL is probably not easy to test, due to the lack of CXL devices on the market.

    escksu said:
    Not really. Most of these customers are corporates instead of end-users. They don't wait. They simply buy what's available in the market.

    I don't know about that. "corporates" is a very broad category. Many business customers will have some leeway in deciding when to decommission and replace old machines.

    escksu said:
    Also, Sapphire Rapids chips are just "optimized" processors that integrate many features into a single processor. Current CPU/GPU combos can also perform the same tasks, just not as efficiently.

    Uh, that's probably oversimplifying it. They actually lack some features of the consumer CPUs, such as the iGPU, media block, display controller, GNA, and E-cores. On the other hand, they have AVX-512 (i.e. supported, tested, and working; Intel said AVX-512 in Alder Lake hadn't been tested, even though it seemed to work with some BIOSes) and certain RAS (Reliability, Availability, and Serviceability) features, a different interconnect for enabling cores to communicate with each other, a different last-level cache, an inter-processor communications link, plus the DSA (Data Streaming Accelerator) and AMX (Advanced Matrix eXtensions) that I already mentioned. While there's some functional overlap between AMX and GPUs, the way AMX works is very different. DSA is a programmable engine for moving and manipulating data streams, which does in hardware things that a consumer CPU would have to do in software.

    In short, these are a different animal. They're not simply a gluing together of CPUs and GPUs, but rather a complex system of richly-featured cores and special-function units that need to interact in complex ways to satisfy all the needs and demands of modern server users. And that might be their undoing - trying to build one CPU architecture that can be everything to everyone.
  • hannibal
    Every product has bugs. It is just a question of how hard it is to "fix" them with firmware updates and how serious those bugs are. But there has not been a single CPU release by any company that did not have hardware bugs.