Intel's Arc Alchemist GPUs can run large language models like Llama 2 thanks to the company's PyTorch extension, as demonstrated in a recent blog post. The Intel Extension for PyTorch, which works on both Windows and Linux, lets LLMs take advantage of the FP16 performance of Arc GPUs. However, Intel says you'll need 14GB of VRAM to run Llama 2 on its hardware, so you'll probably want an Arc A770 16GB card.

PyTorch is an open-source machine learning framework, originally developed by Meta, that can be used to build and run LLMs. While PyTorch works out of the box, it isn't coded by default to take full advantage of every piece of hardware, which is why Intel offers its PyTorch extension. The extension is designed to take advantage of the XMX cores inside Arc GPUs and saw its first release in January 2023. AMD and Nvidia likewise maintain their own hardware-specific optimizations for PyTorch.

In its blog post, Intel demonstrates the performance capabilities of the Arc A770 16GB in Llama 2 using the latest update to its PyTorch extension, which came out in December and specifically optimized FP16 performance. FP16, or half-precision floating point, trades precision for performance, which is often a good tradeoff for AI workloads.
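To get a feel for how much precision FP16 gives up, here is a minimal sketch that rounds ordinary Python floats to half precision and back using only the standard library. It is an illustration of the number format itself, not of Intel's extension, and the helper name `to_fp16` is made up for this example.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest FP16 value, then convert it back."""
    # The 'e' format code in the struct module is IEEE 754 binary16 (half precision).
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Small values keep about three significant decimal digits.
print(to_fp16(0.1))     # 0.0999755859375

# Near 1000, adjacent FP16 values are 0.5 apart, so the fraction is lost.
print(to_fp16(1000.1))  # 1000.0

# Values beyond the FP16 range cannot be stored at all.
try:
    to_fp16(70000.0)
except OverflowError as err:
    print("out of FP16 range:", err)
```

Neural network weights tend to tolerate this kind of rounding, which is why halving the storage and bandwidth per value is usually worth it.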

The demo puts questions to Llama 2 and the dialogue-focused Llama 2-Chat LLMs, such as "can deep learning have such generalization ability like humans do?" In response, the LLM was surprisingly humble and said deep learning wasn't on the same level as human intelligence. However, running LLMs like Llama 2 at FP16 precision requires 14GB of VRAM according to Intel, and the company didn't share any numbers on how quickly the model responded to inputs and queries.
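The 14GB figure lines up with simple arithmetic if you assume the 7-billion-parameter version of Llama 2, which the text above doesn't explicitly name. The sketch below is a back-of-the-envelope estimate for the weights alone; the function name is hypothetical, and real-world usage will be somewhat higher once activations and the key-value cache are included.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: the demo uses the 7B variant of Llama 2.
LLAMA2_7B_PARAMS = 7e9

fp16_gb = weight_memory_gb(LLAMA2_7B_PARAMS, 2)  # FP16 stores 2 bytes per weight
fp32_gb = weight_memory_gb(LLAMA2_7B_PARAMS, 4)  # FP32 stores 4 bytes per weight

print(f"FP16 weights: {fp16_gb:.0f} GB")  # FP16 weights: 14 GB
print(f"FP32 weights: {fp32_gb:.0f} GB")  # FP32 weights: 28 GB
```

Under that assumption, FP16 weights fit on a 16GB card with a little headroom, while the same model in FP32 would not fit at all.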

While this demo only showcases FP16 performance, Arc Alchemist also supports BF16, INT8, INT4, and INT2. Of these other data formats, BF16 is of particular note, as it's often considered even better suited to AI workloads thanks to its wider numerical range: it has eight exponent bits, on par with FP32, while FP16 has just five. Optimizing BF16 performance could be high up on Intel's list for its next PyTorch extension update.
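The effect of those exponent bits can be worked out directly from each format's bit layout. The sketch below is a rough illustration: the exponent widths come from the paragraph above, while the mantissa widths (10 bits for FP16, 7 for BF16, 23 for FP32) are the standard layouts and are supplied here as an addition. The function name is made up for the example.

```python
def max_finite(exp_bits: int, mantissa_bits: int) -> float:
    """Largest finite value of an IEEE-style binary floating point format."""
    bias = 2 ** (exp_bits - 1) - 1  # the largest usable exponent equals the bias
    return (2 - 2.0 ** -mantissa_bits) * 2.0 ** bias


# (exponent bits, mantissa bits) for each format
FORMATS = {"FP16": (5, 10), "BF16": (8, 7), "FP32": (8, 23)}

for name, (exp_bits, mantissa_bits) in FORMATS.items():
    largest = max_finite(exp_bits, mantissa_bits)
    step = 2.0 ** -mantissa_bits  # gap between 1.0 and the next representable value
    print(f"{name}: largest value ~{largest:.4g}, step near 1.0 = {step}")
```

FP16 tops out at 65504, whereas BF16 reaches roughly the same range as FP32. The price is a coarser step between neighboring values, since BF16 has fewer mantissa bits than FP16.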