Intel's Project Endgame Cloud-Based GPU Is On Hold

Intel Project Endgame roadmap
(Image credit: Intel)

Intel's Project Endgame, a network-based solution that aimed to bring the processing power of cloud-based Intel ARC GPUs to local systems, appears to be on indefinite hold. The news, announced by Intel on Twitter, means the timeline for users hoping to boost their available GPU grunt through the cloud has become even murkier than it already was. Hat-tip to Bionic_Squash.

Project Endgame was meant to be available in beta form by the end of 2022, but that deadline has come and gone, and Intel's accelerated computing dreams appear to have faded. Unfortunately, Intel's tweet didn't provide any details on the how or the why of Project Endgame's apparent hold.

Project Endgame was announced by Intel back in 2022, when Raja Koduri still served as Intel's Senior Vice President and General Manager of the Intel Graphics group (Koduri himself having been poached by Intel from a similar position at AMD). During the announcement, Koduri showcased the potential of the service by piloting a low-power laptop running Epic Games' impressive "Matrix Awakens" demo.

Given that demo's performance demands, however, frame rates on the low-power local machine were low and choppy - a situation rapidly corrected by activating Endgame and its "continuous compute" option, which let Koduri's laptop leverage a network-connected ARC GPU to accelerate the workload.

While Intel never confirmed the scenarios in which Endgame would be a feasible way to access more performance, the company is believed to have wanted to extend it beyond general computing to edge computing and IoT - allowing even low-power, remote installations to handle graphics and compute workloads (gaming, generative AI, video encoding, and the like) so long as their network had the necessary bandwidth.

Endgame could be a truly invisible performance-boost option: unlike services such as GeForce Now, which spin up a cloud-based OS instance where all the processing happens, Endgame would let users bring ARC's performance to their own installations. The project's cryptic delay is unfortunate, but considering its potential, it's likely that Intel will eventually push forward with the project - whether it remains an Endgame or not.

Francisco Pires
Freelance News Writer

Francisco Pires is a freelance news writer for Tom's Hardware with a soft side for quantum computing.

  • bit_user
    The idea that it splits work between the client machine and the cloud just seems so weird and wrong. We know you need at least PCIe 3.0 x16 for good performance on a high-end dGPU, so putting the GPU in the cloud would mean cutting back from 16 GB/s to like 0.13 GB/s over a gigabit internet connection (if you even get that much throughput, which you usually don't). Maybe you could use some tricks to cut it down by one order of magnitude, but certainly not two. So, I remain skeptical that's really how it's meant to work. (Rough numbers in the sketch after this comment.)

    Instead, I wonder if it's more like having your game running in a container and transparently migrating that container to a cloud instance. The cloud would need to have a cached copy of the game's disk image, too. Somehow, that doesn't seem terribly workable, either.

    In UNIX (X11, to be precise), you can actually run OpenGL programs on a remote machine and have the graphics rendered by your local GPU, which ties in with the idea of forwarding the rendering commands. Except, I think that stopped working after like OpenGL 1.x. As I was alluding to above, complex geometry and big textures would tank performance rather quickly.

    Finally, with this just getting cancelled now, I have to wonder just how many resources they wasted on it, that they could've put towards improving their beleaguered drivers.
    Reply
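
A quick back-of-the-envelope sketch of the link-rate gap bit_user describes, using nominal figures (real-world throughput is lower than all of these):

```python
# Nominal link rates for the comparison above; real throughput is lower.
links_gb_per_s = {
    "PCIe 3.0 x16": 15.75,      # ~0.985 GB/s per lane x 16 lanes
    "Gigabit Ethernet": 0.125,  # 1 Gb/s = 0.125 GB/s before protocol overhead
}

pcie = links_gb_per_s["PCIe 3.0 x16"]
eth = links_gb_per_s["Gigabit Ethernet"]

print(f"PCIe 3.0 x16:     {pcie:6.3f} GB/s")
print(f"Gigabit Ethernet: {eth:6.3f} GB/s")
print(f"Gap: ~{pcie / eth:.0f}x - roughly two orders of magnitude")
```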
  • -Fran-
    The copium on some people will show rather soon.

    Regards.
    Reply
  • bit_user
    -Fran- said:
    The copium on some people will show rather soon.
    What do you mean?
    Reply
  • dalek1234
    "on hold" is Intel's euphemism for "cancelled, just like all the other cancellation we've been doing"
    Reply
  • derekullo
    Google Stadia went so well!
    It seems Intel also wants to throw some money into the cash incinerator!

    It could work, you just need to lower the bar ... way down.
    For comparison, AGP 2.0 has a max speed of 1066 megabytes a second.
    So with a 10 gigabit connection (1250 megabytes a second) you could theoretically recreate the bandwidth of a GPU from the late 2000s! (Quick math after this comment.)
    Reply
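
For what it's worth, derekullo's numbers check out at nominal rates; a minimal sketch of the arithmetic:

```python
# Quick check of the AGP 2.0 vs. 10 GbE comparison above (nominal rates, no overhead).
agp_2_0_mb_s = 1066           # AGP 2.0 (4x) peak transfer rate per spec
ten_gbe_mb_s = 10_000 / 8     # 10 Gb/s = 1250 MB/s before protocol overhead

print(f"AGP 2.0 (4x): ~{agp_2_0_mb_s:.0f} MB/s")
print(f"10 GbE:       ~{ten_gbe_mb_s:.0f} MB/s")
print(f"10 GbE offers ~{ten_gbe_mb_s / agp_2_0_mb_s:.2f}x the raw bandwidth of AGP 2.0")
```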
  • InvalidError
    bit_user said:
    The idea that it splits work between the client machine and the cloud just seems so weird and wrong. We know you need at least PCIe 3.0 x16 for good performance on a high-end dGPU
    How much of it is actually used though? My RX6600 doesn't appear to have a PCIe activity% stat in its SMB status. IIRC, on my GTX1050, this was under 3% most of the time while playing games. If you had driver-level integration to parse APIs on the client side and cache stuff on the server side to eliminate unnecessary chatter between the two and hide latency from software, I can imagine the traffic getting reduced even further.

    This could be great if it allowed a PC with an A750+ to share its GPU with other devices on the LAN. If Intel wants this to be hosted in datacenters though, the roundtrip latency is going to kill it for most of the market just like it has most other similar services.
    Reply
  • bit_user
    InvalidError said:
    How much of it is actually used though?
    Okay, fair point. I don't actually know how much of that PCIe bandwidth matters for latency versus actual throughput.

    Regarding throughput, one aspect I didn't really delve into is that most internet connections are asymmetric. With cable internet, I think it's not uncommon to have 10:1 download:upload ratio. That might be alright if you're just uploading events from an input device, like mouse, joystick, etc. (i.e. the Google Stadia model), but not if you're uploading textures and geometry that need to get rendered, compressed, and turned around within tens of milliseconds. (Rough upload-time math after this comment.)
    Reply
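
To put rough numbers on the asymmetric-uplink concern above: the plan speed and asset size below are purely illustrative assumptions, not measurements.

```python
# Illustrative only: time to push a single asset upstream vs. a 60 fps frame budget.
upload_mbit_s = 50        # e.g. a 500/50 Mb/s cable plan (the 10:1 ratio mentioned above)
asset_mb = 8              # one hypothetical 8 MB texture

upload_ms = (asset_mb * 8) / upload_mbit_s * 1000
frame_budget_ms = 1000 / 60

print(f"Uploading {asset_mb} MB at {upload_mbit_s} Mb/s takes ~{upload_ms:.0f} ms")
print(f"That's ~{upload_ms / frame_budget_ms:.0f} frame budgets at 60 fps "
      f"({frame_budget_ms:.1f} ms each), before any render/encode/return time")
```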
  • InvalidError
    bit_user said:
    Regarding throughput, one aspect I didn't really delve into is that most internet connections are asymmetric. With cable internet, I think it's not uncommon to have 10:1 download:upload ratio. That might be alright if you're just uploading events from an input device, like mouse, joystick, etc. (i.e. the Google Stadia model), but not if you're uploading textures and geometry that need to get rendered, compressed, and turned around within tens of milliseconds.
    Most textures and other assets are static. The local virtual GPU driver can hash whatever the game is pushing through the API and send the hash to the server; the server queries its local asset cache to see if the asset already exists and loads it from local storage at 10+ GB/s or a SAN at 10-25 Gbps. Only the first player to encounter a new asset would get the full new-asset upload hit. The local driver can also cache resource hashes to know which assets it doesn't need to wait for server confirmations on. (Toy sketch after this comment.)

    Though I bet this sort of advanced caching would trigger some copyright litigation, even though the assets are only being used in conjunction with the software they originally came from for their originally intended purpose.
    Reply
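
Below is a toy, in-process sketch of the content-addressed caching idea InvalidError outlines; the class names and flow are illustrative assumptions, not anything from Intel's actual design.

```python
# Toy model: send a hash first, upload the full asset only on a cache miss.
import hashlib

class ServerAssetCache:
    """Stands in for the remote side's content-addressed asset store."""
    def __init__(self):
        self._store = {}            # hash -> asset bytes

    def has(self, digest: str) -> bool:
        return digest in self._store

    def put(self, digest: str, data: bytes) -> None:
        self._store[digest] = data

class ClientDriver:
    """Stands in for the local 'virtual GPU driver' side."""
    def __init__(self, server: ServerAssetCache):
        self.server = server
        self.known_hashes = set()   # hashes the server has already confirmed
        self.bytes_uploaded = 0

    def push_asset(self, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.known_hashes:
            return                  # skip the round trip entirely
        if not self.server.has(digest):
            self.server.put(digest, data)    # cache miss: pay the full upload once
            self.bytes_uploaded += len(data)
        self.known_hashes.add(digest)

server = ServerAssetCache()
client = ClientDriver(server)
texture = b"\x00" * 4_000_000       # pretend 4 MB texture

for _ in range(3):                  # the same asset referenced on three frames
    client.push_asset(texture)

print(f"Uploaded {client.bytes_uploaded} bytes across 3 frames (full asset sent only once)")
```

In a real deployment the hash lookup itself would still cost a network round trip on first use, which is where the local hash cache InvalidError mentions would help.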
  • cryoburner
    bit_user said:
    Regarding throughput, one aspect I didn't really delve into is that most internet connections are asymmetric. With cable internet, I think it's not uncommon to have 10:1 download:upload ratio. That might be alright if you're just uploading events from an input device, like mouse, joystick, etc. (i.e. the Google Stadia model), but not if you're uploading textures and geometry that need to get rendered, compressed, and turned around within tens of milliseconds.
    The description of what this service was intended to be is rather vague, though I question whether a game would be running on the local system and sending graphics calls to a remote server. It wouldn't make much sense to rely on a low-end device's limited CPU and RAM capabilities while using the cloud-based server strictly for its graphics hardware, when it would undoubtedly produce better results simply having everything run on the server and sending back a compressed video feed, much like other game streaming services. There would likely be way too much latency for something like that to produce usable results, and the other limitations of the local device would greatly limit what you could run on it.

    Perhaps it could work better if the GPU were installed in a device on the local network, and used more like an external GPU shared between devices on the network. But I don't think that's what this was, considering Intel had slides describing Project Endgame as "XPU Compute as a Service".

    It's possible that this service wasn't even intended for gaming. Just because they showed off a GPU rendering a graphics tech demo remotely doesn't mean video games were the intended use case. They only described it as an "always-accessible, low-latency computing experience", and it was largely tech media suggesting it might be for running games.
    Reply
  • InvalidError
    cryoburner said:
    It wouldn't make much sense to rely on a low-end device's limited CPU and RAM capabilities while using the cloud-based server strictly for its graphics hardware, when it would undoubtedly produce better results simply having everything run on the server and sending back a compressed video feed, much like other game streaming services.
    I can imagine at least one scenario where it would make sense: a game console or similar device that lets you use another, more powerful device for graphics rendering when one is available, driving a higher-resolution output at enhanced graphics settings - preferably output to an attached display rather than returned as a stream.

    Though I suspect the sort of latency sensitivity and bandwidth this entails would only have a shot at viability on a GbE LAN.
    Reply