Intel Stratix 10 DX Introduces PCIe 4.0, Cache-Coherency via UPI

(Image credit: Intel)

Intel today announced it has begun shipping a new product series in its 14nm line-up of FPGAs, the Stratix 10 DX. It brings PCIe 4.0 to the Stratix 10 series, and to Intel’s product portfolio in general, and also supports Optane DC Persistent Memory connectivity and cache coherency. The latter feature is enabled via Intel’s Ultra Path Interconnect (UPI). In this way, Intel hopes to accelerate the development of coherent workloads ahead of the availability of Compute Express Link (CXL) in 2021 with Agilex.

When Intel started pursuing its data-centric strategy a couple of years ago as a reaction to the data explosion, it saw that CPUs alone wouldn’t suffice to satisfy the diverse workloads in the cloud, the network and at the edge. FPGAs, with their low latency and high bandwidth capabilities, became one such accelerator in Intel’s heterogeneous strategy, for example for offloading certain tasks to free up CPU resources. Within Intel’s FPGA business, the company has a number of programmable acceleration cards (PACs) from edge to cloud, such as its second-generation PAC based on Stratix 10, the N3000 PAC for 5G networking it announced at MWC this year, and an Arria 10 card for AI inference with OpenVINO.

Intel today is introducing the Stratix 10 DX as a coherent FPGA. In this new model, the Xeon processor and Stratix 10 DX FPGA both have access to a coherent system memory pool. This pool can consist of DDR memory as well as persistent memory via Optane DC Persistent Memory and Optane DC SSDs. This effectively creates a new memory tiering with DRAM up to Optane SSDs all available to the CPU and FPGA accelerator, and 3D NAND and HDDs serving as storage.

For interfacing with the Optane DIMMs, the Stratix 10 DX has a new optimized FPGA memory controller. This memory controller supports up to eight Optane DIMMs per FPGA, good for up to 4TB of non-volatile memory. This is actually a soft IP memory controller and will be available in a future release of Quartus Prime.
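
As a quick sanity check on the capacity figure above, the sketch below divides the quoted 4TB maximum across the eight supported DIMMs. The variable names are illustrative, and the use of binary units (1TB = 1024GB) is an assumption rather than something Intel stated.

```python
# Figures quoted in the article.
MAX_DIMMS_PER_FPGA = 8
MAX_CAPACITY_TB = 4

# Assumption: binary units (1 TB = 1024 GB), as is usual for memory modules.
GB_PER_TB = 1024

# Capacity each DIMM must provide to reach the quoted maximum.
per_dimm_gb = MAX_CAPACITY_TB * GB_PER_TB / MAX_DIMMS_PER_FPGA
print(f"Implied capacity per DIMM: {per_dimm_gb:.0f} GB")
```

In other words, the 4TB figure implies that all eight slots are populated with 512GB modules.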

Moreover, the CPU and FPGA do not just share a coherent memory pool: the FPGA is also coherently connected to the Xeon processor via a UPI link, Intel’s low-latency coherent interconnect for multiprocessor systems. Stratix 10 DX devices provide up to three UPI ports, the same number as a Xeon Scalable CPU.

This is not the first time Intel has talked about a coherent attach of an FPGA to the Xeon processor: it was touted as one of the key features when Agilex was introduced in April, with support in the I- and M-series. At the time, it was said that Agilex was the first FPGA to feature this coherent attach and that it would leverage the newly announced CXL link based on the PCIe 5.0 physical layer. With today’s announcement, Intel is bringing some of those capabilities to Stratix 10 via UPI. (We are not sure what happened to the Xeon Scalable processor with integrated Arria 10 FPGA that was announced last year and was also said to feature a cache-coherent interface via UPI.)

Of course, leveraging UPI as a stopgap until CXL arrives raises the question of why someone would invest in a dead-end ecosystem that will soon be replaced by an open standard. After all, it is not clear if Agilex will even support UPI. To address this, Intel says there will be a migration path from UPI to CXL that will be akin to a port. Although some re-coding might be required, Intel claims this should (largely) preserve companies’ R&D investments.

The benefit of this is that the Stratix 10 DX should speed up the ecosystem development for coherent workloads, as Intel reiterated that the first CXL devices would be available in 2021 (as part of the Sapphire Rapids platform, per a leaked roadmap from earlier this year). So Intel’s approach here, in essence, is to promote its coherent Xeon accelerator roadmap with a “start today with UPI, seamlessly move to CXL in the next generation” pitch. Intel isn’t entirely clear on which processors will support the UPI attach, though, saying only that it is a “future select Xeon Scalable Processor.” This should refer to next year’s Cooper Lake-SP and/or Ice Lake-SP.

The third new feature of Stratix 10 DX is non-coherent PCIe 4.0 x16 support. It is the first product that Intel has announced with such support. The interface is fully certified (PCI-SIG compliant), which Intel implied, without saying so outright, is unlike Xilinx’s Versal series. However, PCIe Gen4 support raises the question of how this will be leveraged, because Intel does not yet have any CPUs that support it. Intel notes that the interface is aligned to “future select Intel Xeon Scalable Processors.” According to the same leaked roadmap that listed Sapphire Rapids, the upcoming Cooper Lake-SP will be limited to PCIe Gen3, indicating that Ice Lake-SP is the target platform for PCIe 4.0.

In terms of performance, Intel claims UPI offers 37% lower latency compared to a PCIe roundtrip in a read transaction. PCIe 4.0 delivers 2x the bandwidth of PCIe 3.0. The UPI interface has a theoretical peak transfer rate of 28GB/s via 20 lanes at 11.2Gbps each. With UPI also taken into account, Intel says the Stratix 10 DX has 2.6x higher total bandwidth.
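
The UPI figure above can be reproduced with simple arithmetic, and the same approach shows where the 2x PCIe claim comes from. The sketch below is a back-of-the-envelope calculation, not Intel's methodology: the lane count and UPI rate are taken from the article, while the PCIe per-lane rates (8 GT/s for Gen3, 16 GT/s for Gen4) and the 128b/130b encoding come from the PCIe specifications rather than from the article. The function names are illustrative.

```python
def upi_peak_gb_per_s(lanes: int = 20, gbps_per_lane: float = 11.2) -> float:
    """Raw peak rate of a UPI link in GB/s (8 bits per byte, no protocol overhead)."""
    return lanes * gbps_per_lane / 8


def pcie_peak_gb_per_s(gt_per_s: float, lanes: int = 16) -> float:
    """Per-direction peak of a PCIe Gen3/Gen4 link in GB/s after 128b/130b encoding."""
    return gt_per_s * lanes * (128 / 130) / 8


upi = upi_peak_gb_per_s()
gen3 = pcie_peak_gb_per_s(8.0)    # PCIe 3.0 x16
gen4 = pcie_peak_gb_per_s(16.0)   # PCIe 4.0 x16

print(f"UPI:          {upi:.1f} GB/s")
print(f"PCIe 3.0 x16: {gen3:.2f} GB/s")
print(f"PCIe 4.0 x16: {gen4:.2f} GB/s ({gen4 / gen3:.0f}x Gen3)")
```

Nothing here attempts to reproduce the 2.6x total-bandwidth figure, since the article does not say which baseline it is measured against.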

These improvements have multiple benefits. Lower latency is important at the edge, where real-time response is required. Higher bandwidth allows support for up to 400Gbps Ethernet. Intel claims the coherent attachment ensures better multi-node compute efficiency. Lastly, the memory expansion via Optane persistent memory enables larger datasets at the edge and in the data center. Other existing Stratix 10 capabilities also remain available, such as up to 2.7M logic elements (LEs), up to 8GB HBM2 memory, up to 58G transceivers and a quad-core ARM Cortex-A53 subsystem, making it the most advanced FPGA in the Stratix 10 family.

To place this in the context of the Stratix 10 architecture, the Stratix 10 DX exists by virtue of a fifth chiplet that has been added to the ecosystem: the P-Tile, which incorporates the PCIe Gen4 and UPI interface blocks. All three devices in the family also have an E-Tile for 58G transceiver support. The ARM subsystem and HBM2 memory aren’t available simultaneously in any of the three configurations, though.

Use cases for the FPGA include network acceleration (smartNIC), data center disaggregation (memory expansion) and compute acceleration. Intel and VMware disclosed that they are collaborating to develop coherent FPGA and CPU solutions. CESNET said that the Stratix 10 DX would double the throughput of its FPGA-based smartNIC by leveraging PCIe 4.0.

The Stratix 10 DX is shipping for early access, and Intel says that design tools, software and development boards for the new FPGA are all available now. As we noted, though, neither the UPI cache-coherent interconnect nor PCIe 4.0 is supported yet in the current Cascade Lake generation of Xeon Scalable processors, but this might not be a problem as Cooper Lake-SP and Ice Lake-SP are also sampling. Volume production is slated for 2020.

  • JayNor
    nice article.
    I believe the intel/mobileye eyeq5 chip also has pcie4 interface. Perhaps it was their first sampled chip with that.
    https://www.mobileye.com/our-technology/evolution-eyeq-chip/
    It isn't clear to me how this UPI is different from UPI already used for the multi-socket server boards. Also, one of the other articles I read said the UPI is available now.

    The ice lake server chip is already sampling, and includes pcie4.
  • bit_user
    “PCIe Gen4 support raises the question how this will be leveraged because Intel does not have any CPUs yet that support it”
    I'll bet AMD and IBM will sell a few more CPUs to host these, in the meantime. As long as you don't need Optane memory and just want to use the PCIe card form factor (i.e. no UPI), Epyc is now the go-to for PCIe.

    “the ARM subsystem and HBM2 memory aren’t available simultaneously in any of the three configurations”
    So, it's a tile? Or just conflicts in other ways? I was thinking it must annoy Intel to see those ARM cores in there, but their customers probably wouldn't have it if Intel tried to swap them for, say, Tremont cores. However, if they're demoted to a tile, that would let Intel start edging them out.
  • bit_user
    JayNor said:
    I believe the intel/mobileye eyeq5 chip also has pcie4 interface. Perhaps it was their first sampled chip with that.
    https://www.mobileye.com/our-technology/evolution-eyeq-chip/
    Well, that is an embedded SoC, and they note that PCIe can be used to link multiple EyeQ5's. So, it lacks the incongruity of a PCIe 4-enabled server add-in.
    “EyeQ®5 implements two PCIe Gen4 ports for inter-processor communication, which could enable system expansion with multiple EyeQ®5 devices or for connectivity with an application processor.”
    Also, the EyeQ5 is slated for a 2020 launch, according to that.

    JayNor said:
    It isn't clear to me how this UPI is different from UPI already used for the multi-socket server boards.
    But that's the point - it is the same. The point is that you could put one (or more) of these FPGAs on a server board and link it to the CPU(s) via UPI. Furthermore, that's the only way to make it cache-coherent, in this generation.
  • JayNor
    eyeq5 block diagram shows pcie4 being used for peripherals, as well as a cache coherent interface. Eyeq5 has been in OEM partners' hands since Dec 2018.

    https://www.eetimes.com/mobileyes-new-eyeq5-how-open-is-open/#