Intel releases new software for GPU-powered Project Battlematrix workstations — Arc Pro B-series GPUs get LLM Scaler 1.0 software to optimize performance in AI workloads
Project Battlematrix is almost ready for deployment

Intel has introduced the first update to the software side of Project Battlematrix. The company's new inference-optimized software stack streamlines AI workload orchestration on its Arc Pro B-series GPUs in multi-GPU workstations, and it includes a Linux-based LLM Scaler container for AI inference workflows.
Project Battlematrix is Intel's AI-focused initiative to bring highly capable Intel-powered AI workstations to market. The project combines Intel hardware and software into a cohesive workstation solution built around multiple Arc Pro B-series GPUs in a single system. Project Battlematrix workstations will come with Xeon CPUs, up to eight GPUs, and up to 192GB of total VRAM, with pricing ranging from $5,000 to $10,000.
Powering these systems is the Arc Pro B60, the workstation counterpart to Intel's Arc B580 with more memory and PCIe 5.0 support. The Pro B60 packs 20 Xe cores, 160 XMX engines, 24GB of GDDR6 memory, multi-GPU support, and a variable TDP ranging from 120 to 200 watts.
Supporting Project Battlematrix workstations is a validated full-stack containerized Linux solution, which will include everything needed to get the servers up and running quickly and easily. The LLM Scaler is just one of several containers Intel is developing for this stack.
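Intel hasn't published deployment steps in this announcement, but as a rough illustration of what running such a container could look like, the Python sketch below uses the Docker SDK for Python to start a hypothetical LLM Scaler image with the host's Intel GPU devices passed through. The image name, port, and environment variables are placeholders, not Intel's documented instructions.

```python
# Minimal sketch: launching a hypothetical LLM Scaler container with the
# Docker SDK for Python (docker-py). The image name, port, and environment
# variables below are placeholders, not Intel's published deployment steps.
import docker

client = docker.from_env()

container = client.containers.run(
    "example.registry/intel/llm-scaler:latest",  # hypothetical image name
    detach=True,
    # Pass the host's GPU device nodes into the container, a typical pattern
    # for Intel GPU containers on Linux.
    devices=["/dev/dri:/dev/dri"],
    shm_size="16g",                # generous shared memory for multi-GPU inference
    ports={"8000/tcp": 8000},      # assumed inference API port
    environment={"MODEL_ID": "example-model"},   # placeholder model identifier
)

print(container.id)
```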
The LLM Scaler release 1.0 on GitHub focuses on "early customer enablement" and includes optimizations for several AI model types, as well as new feature support such as speculative decoding and torch.compile. In all, ten optimizations and features have been incorporated into the release.
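For context, torch.compile is a standard PyTorch 2.x feature rather than anything LLM Scaler-specific: it traces a model and generates optimized kernels for it. The sketch below shows the generic API, not Intel's integration of it.

```python
# Generic illustration of torch.compile (PyTorch 2.x). This is the standard
# API the article refers to, not Intel's LLM Scaler code.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
).eval()

compiled_model = torch.compile(model)   # returns a compiled wrapper around the model

with torch.inference_mode():
    x = torch.randn(8, 1024)
    out = compiled_model(x)             # first call triggers compilation
    print(out.shape)                    # torch.Size([8, 1024])
```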
For long input lengths, TPOP is up to 1.8x faster at a 40K sequence length on 32B KPI models, and up to 4.2x faster at a 40K sequence length on 70B KPI models. A roughly 10% output throughput improvement has also been added for 8B to 32B KPI models.
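To put those multipliers in context, here is a quick back-of-the-envelope calculation. The baseline figures are purely hypothetical; only the 1.8x, 4.2x, and 10% factors come from Intel's release notes.

```python
# Back-of-the-envelope: what the quoted speedups would mean in practice.
# The baseline values are hypothetical; only the 1.8x, 4.2x, and 10% factors
# come from Intel's release notes.
baseline_tpop_32b_ms = 100.0   # assumed time per output token, 32B model at 40K context
baseline_tpop_70b_ms = 250.0   # assumed time per output token, 70B model at 40K context

print(f"32B: {baseline_tpop_32b_ms:.0f} ms -> {baseline_tpop_32b_ms / 1.8:.0f} ms per token")
print(f"70B: {baseline_tpop_70b_ms:.0f} ms -> {baseline_tpop_70b_ms / 4.2:.0f} ms per token")

baseline_throughput_tps = 400.0  # assumed aggregate output tokens/s, 8B-32B models
print(f"Throughput: {baseline_throughput_tps:.0f} -> {baseline_throughput_tps * 1.10:.0f} tokens/s")
```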
By-layer online quantization has been added to reduce GPU memory requirements for LLMs. Support for embedding and rerank models, enhanced multi-modal model support, maximum-length auto-detection, and data parallelism have also been added, alongside the aforementioned speculative decoding and torch.compile.
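Intel hasn't detailed how its by-layer online quantization works, but the general idea of quantizing weights one layer at a time so a full-precision copy never has to stay resident can be sketched roughly as follows. This is a generic int8 illustration under that assumption, not LLM Scaler's actual implementation.

```python
# Rough sketch of the idea behind layer-by-layer (online) weight quantization:
# convert each layer's weights to int8 with a per-layer scale as the model is
# prepared, so memory holds 1-byte weights instead of 2-byte fp16 ones.
# Generic illustration only, not Intel's LLM Scaler implementation.
import torch
import torch.nn as nn

def quantize_layer_int8(weight: torch.Tensor):
    """Symmetric per-tensor int8 quantization for one layer's weight matrix."""
    scale = weight.abs().max() / 127.0
    q_weight = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q_weight, scale  # dequantize later as q_weight.float() * scale

model = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096))

quantized = {}
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        q_weight, scale = quantize_layer_int8(module.weight.data)
        quantized[name] = (q_weight, scale)
        # Drop the full-precision copy immediately -- the layer-at-a-time
        # approach is what keeps peak memory low.
        module.weight.data = torch.empty(0)

fp16_bytes = 4096 * 4096 * 2 * len(quantized)
int8_bytes = 4096 * 4096 * 1 * len(quantized)
print(f"Weights: {fp16_bytes / 1e6:.0f} MB in fp16 vs {int8_bytes / 1e6:.0f} MB in int8")
```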
Intel has also enabled the oneCCL benchmark tool in release 1.0, and included XPU Manager, which provides firmware update functionality, GPU power and memory bandwidth monitoring, and GPU diagnostic capabilities.
Intel also announced that a hardened version of its LLM Scaler with even more functionality will be released by the end of Q3, while a full feature set release is scheduled for Q4.
The LLM Scaler has technically been released ahead of schedule; Intel previously promised that its first container deployments would be coming in Q3, not Q2. Still, Intel's developers are far from finished: the company has promised further functionality, including SR-IOV, VDI, and manageability software deployment, coming in Q4.

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.