Intel releases new software for GPU-powered Project Battlematrix workstations — Arc Pro B-series GPUs get LLM Scaler 1.0 software to optimize performance in AI workloads
Project Battlematrix is almost ready for deployment

Intel has introduced the first update to the software side of Project Battlematrix. The company's new inference-optimized software stack streamlines AI workload orchestration on its Arc Pro B-series GPUs in multi-GPU workstations, and it includes a Linux-based LLM Scaler container for AI inference workflows.
Project Battlematrix is Intel's AI-focused initiative to bring highly capable Intel-powered AI workstations to market. The project combines Intel hardware and software into a cohesive workstation solution built around multiple Arc Pro B-series GPUs in a single system. Project Battlematrix workstations will come with Xeon CPUs, up to eight GPUs, and up to 192GB of total VRAM, with pricing ranging from $5,000 to $10,000.
Powering these systems is the Arc Pro B60, the workstation counterpart to Intel's Arc B580 with more memory and PCIe 5.0 support. The Pro B60 packs 20 Xe cores, 160 XMX engines, 24GB of GDDR6 memory, multi-GPU support, and a variable TDP ranging from 120 to 200 watts.
Supporting Project Battlematrix workstations is a validated full-stack containerized Linux solution, which will include everything needed to get the servers up and running quickly and easily. The LLM Scaler is just one of several containers Intel is developing for this stack.
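Intel hasn't published deployment steps in this announcement, but as a rough illustration of what running such a container could look like, the Python sketch below uses the Docker SDK for Python to start a hypothetical LLM Scaler image with the host's Intel GPU devices passed through. The image name, port, and environment variables are placeholders, not Intel's documented instructions.

```python
# Minimal sketch: launching a hypothetical LLM Scaler container with the
# Docker SDK for Python (docker-py). The image name, port, and environment
# variables below are placeholders, not Intel's published deployment steps.
import docker

client = docker.from_env()

container = client.containers.run(
    "example.registry/intel/llm-scaler:latest",  # hypothetical image name
    detach=True,
    # Pass the host's GPU device nodes into the container, a typical pattern
    # for Intel GPU containers on Linux.
    devices=["/dev/dri:/dev/dri"],
    shm_size="16g",                # generous shared memory for multi-GPU inference
    ports={"8000/tcp": 8000},      # assumed inference API port
    environment={"MODEL_ID": "example-model"},   # placeholder model identifier
)

print(container.id)
```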
The LLM Scaler release 1.0 on GitHub focuses on "early customer enablement" and includes optimizations for several AI model types, as well as new feature support such as speculative decoding and torch.compile. In all, ten optimizations and features have been incorporated into the release.
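For context, torch.compile is a standard PyTorch 2.x feature rather than anything LLM Scaler-specific: it traces a model and generates optimized kernels for it. The sketch below shows the generic API, not Intel's integration of it.

```python
# Generic illustration of torch.compile (PyTorch 2.x). This is the standard
# API the article refers to, not Intel's LLM Scaler code.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
).eval()

compiled_model = torch.compile(model)   # returns a compiled wrapper around the model

with torch.inference_mode():
    x = torch.randn(8, 1024)
    out = compiled_model(x)             # first call triggers compilation
    print(out.shape)                    # torch.Size([8, 1024])
```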
For long input lengths, TPOP is up to 1.8x faster at a 40K sequence length on 32B KPI models, and up to 4.2x faster at a 40K sequence length on 70B KPI models. A roughly 10% output throughput improvement has also been added for 8B to 32B KPI models.
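To put those multipliers in context, here is a quick back-of-the-envelope calculation. The baseline figures are purely hypothetical; only the 1.8x, 4.2x, and 10% factors come from Intel's release notes.

```python
# Back-of-the-envelope: what the quoted speedups would mean in practice.
# The baseline values are hypothetical; only the 1.8x, 4.2x, and 10% factors
# come from Intel's release notes.
baseline_tpop_32b_ms = 100.0   # assumed time per output token, 32B model at 40K context
baseline_tpop_70b_ms = 250.0   # assumed time per output token, 70B model at 40K context

print(f"32B: {baseline_tpop_32b_ms:.0f} ms -> {baseline_tpop_32b_ms / 1.8:.0f} ms per token")
print(f"70B: {baseline_tpop_70b_ms:.0f} ms -> {baseline_tpop_70b_ms / 4.2:.0f} ms per token")

baseline_throughput_tps = 400.0  # assumed aggregate output tokens/s, 8B-32B models
print(f"Throughput: {baseline_throughput_tps:.0f} -> {baseline_throughput_tps * 1.10:.0f} tokens/s")
```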
By-layer online quantization has been added to reduce GPU memory requirements for LLMs. Support for embedding and rerank models, enhanced multi-modal model support, maximum-length auto-detection, and data parallelism have also been added, alongside the aforementioned speculative decoding and torch.compile.
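Intel hasn't detailed how its by-layer online quantization works, but the general idea of quantizing weights one layer at a time so a full-precision copy never has to stay resident can be sketched roughly as follows. This is a generic int8 illustration under that assumption, not LLM Scaler's actual implementation.

```python
# Rough sketch of the idea behind layer-by-layer (online) weight quantization:
# convert each layer's weights to int8 with a per-layer scale as the model is
# prepared, so memory holds 1-byte weights instead of 2-byte fp16 ones.
# Generic illustration only, not Intel's LLM Scaler implementation.
import torch
import torch.nn as nn

def quantize_layer_int8(weight: torch.Tensor):
    """Symmetric per-tensor int8 quantization for one layer's weight matrix."""
    scale = weight.abs().max() / 127.0
    q_weight = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q_weight, scale  # dequantize later as q_weight.float() * scale

model = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096))

quantized = {}
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        q_weight, scale = quantize_layer_int8(module.weight.data)
        quantized[name] = (q_weight, scale)
        # Drop the full-precision copy immediately -- the layer-at-a-time
        # approach is what keeps peak memory low.
        module.weight.data = torch.empty(0)

fp16_bytes = 4096 * 4096 * 2 * len(quantized)
int8_bytes = 4096 * 4096 * 1 * len(quantized)
print(f"Weights: {fp16_bytes / 1e6:.0f} MB in fp16 vs {int8_bytes / 1e6:.0f} MB in int8")
```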
Intel has also enabled the oneCCL benchmark tool in release 1.0, and included XPU Manager, which provides firmware update functionality, GPU power and memory bandwidth monitoring, and GPU diagnostic capabilities.
Intel also announced that a hardened version of its LLM Scaler with even more functionality will be released by the end of Q3, while a full feature set release is scheduled for Q4.
The LLM Scaler has technically been released ahead of schedule; Intel previously promised that its first container deployments would be coming in Q3, not Q2. Still, Intel's developers are far from finished: the company has promised further functionality, including SR-IOV, VDI, and manageability software deployment, coming in Q4.

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.