AMD fires back at Nvidia and details how to run a local AI chatbot on Radeon and Ryzen — recommends using a third-party app
It's not complicated and also works with Intel CPUs and Nvidia GPUs.
With Nvidia and Intel having recently revealed their locally run AI chatbots, it seems AMD doesn't want to be left out and has published its own solution for owners of Ryzen processors and Radeon graphics cards. In five or six steps, users can start interacting with an AI chatbot that runs on their local hardware rather than in the cloud — no coding experience required.
AMD's guide requires users to have either a Ryzen AI PC chip or an RX 7000-series GPU. Today, Ryzen AI is only available on higher-end Ryzen APUs based on Phoenix and Hawk Point with Radeon 780M or 760M integrated graphics. That suggests that while the Ryzen 5 8600G is supported, the Ryzen 5 8500G may not work... except the application itself only lists "CPU with AVX2 support" as a requirement, which means it should work (perhaps very slowly) on a wide range of processors.
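If you're unsure whether an older chip meets that AVX2 requirement, it's easy to check. The snippet below is only a rough, Linux-only sketch (Windows users can consult a tool such as CPU-Z instead):

```python
# Rough sketch: check whether the CPU advertises AVX2 on Linux by reading
# /proc/cpuinfo; the "avx2" flag is what LM Studio's requirement refers to.
with open("/proc/cpuinfo") as cpuinfo:
    flags = cpuinfo.read()
print("AVX2 supported" if "avx2" in flags else "AVX2 not reported")
```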
Users need to download and install LM Studio, which has a ROCm version for RX 7000-series users — note again that the standard package also works with Intel CPUs and Nvidia GPUs. After installing and launching LM Studio, just search for the desired LLM, such as the chat-optimized Llama 2 7B. AMD recommends using models with the "Q4 K M" label (i.e., Q4_K_M), which refers to a specific level of quantization (4-bit) and other model characteristics. Ryzen CPU users can start chatting with the bot at this point — it's not clear whether the NPU is even utilized, but we'd guess it's not — while RX 7000-series GPU users will need to open the right-hand side panel, manually enable GPU offloading, and drag the offload slider to "max."
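For readers who would rather script against the model than use the chat window, LM Studio can also act as a local server that exposes an OpenAI-compatible API once a model is loaded. The snippet below is only a minimal sketch: it assumes the server has been started from LM Studio's UI on its default port (1234), and the prompt is just a placeholder.

```python
# Minimal sketch: query a model loaded in LM Studio through its local
# OpenAI-compatible server (default address http://localhost:1234/v1).
# Assumes LM Studio's local server has been started in the app.
import requests

response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Explain Q4_K_M quantization in one sentence."}
        ],
        "temperature": 0.7,
        "max_tokens": 256,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```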
AMD's tutorial means that there's at least one easy-to-use official method to run AI chatbots on all consumer hardware from AMD, Intel, and Nvidia. Unsurprisingly, Nvidia was first with its Chat with RTX app, which naturally only runs on Nvidia GPUs. Chat with RTX is arguably the most fleshed-out of the solutions, as it can analyze documents, videos, and other files. Plus, support for this Nvidia chatbot stretches back to the 30-series, and 20-series support may be on the table.
Meanwhile, Intel's solutions for its AI CPUs/NPUs and GPUs are more in the weeds. Instead of using an app to showcase a local AI chatbot, Intel demonstrated how you can use Python to code one. While the code users have to write isn't exactly long, having any coding involved at all is going to be a barrier for a lot of potential users. Additionally, chat responses are displayed in the command line, which doesn't exactly scream "cutting-edge AI." You could try LM Studio instead, though it doesn't appear to have Intel GPU or NPU support yet, so it would just use your CPU.
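To put the amount of coding in perspective, a bare-bones command-line chatbot in Python is only a couple of dozen lines. The sketch below is illustrative rather than Intel's actual sample code; it uses the Hugging Face transformers library and an arbitrarily chosen small chat model, but it shows the kind of prompt-and-respond loop such a demo boils down to.

```python
# Illustrative sketch of a command-line chatbot (not Intel's actual demo code).
# The model choice (TinyLlama 1.1B Chat) is an arbitrary small example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"quit", "exit"}:
        break
    # Build a single-turn prompt using the model's chat template.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": user_input}],
        add_generation_prompt=True,
        return_tensors="pt",
    )
    outputs = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    print("Bot:", reply)
```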
While AMD doesn't have its own AI chatbot app like Nvidia does, it seems further along than Intel with respect to features, since there's at least ROCm GPU hardware support. The next step for AMD is probably to make its own version of Chat with RTX, or at least to work with the developers of LM Studio to enable more features for AMD hardware. Perhaps we'll even see AI functionality integrated into the Radeon Adrenalin driver suite — AMD does make driver-level AI optimizations, and the driver suite often receives new features like Fluid Motion Frames.
Matthew Connatser is a freelance writer for Tom's Hardware US. He writes articles about CPUs, GPUs, SSDs, and computers in general.
shalako33 Admin said: AMD posted a tutorial detailing how users can get an AI chatbot running on Ryzen CPUs and Radeon GPUs. It's surprisingly straightforward.
AMD fires back at Nvidia with instructions on running a local AI chatbot — recommends using a third-party app
shalako33 I've had this installed and set up for months but haven't used it yet. I've been meaning to eventually mess around with it, trying out different LLMs. I even downloaded the 7-billion-parameter model and maybe a couple of smaller ones a while back for when I do. It's just that I've got a bunch of games keeping me busy at the minute. I think I read somewhere there's a 12-billion-parameter or larger model on the way.
kealii123 I wish someone could be clear on how much RAM can be provisioned to the 780M iGPU (since the NPU doesn't seem to be used). I sold my 6800U device so I can't test myself. On my Intel laptop (i7 10th gen, 2080 Super, 64 GB RAM) I can run very large models, thanks to the 64 GB of RAM, but it's glacially slow since the CPU is doing all the work by itself.
On my M1 MacBook Pro with 32 GB of RAM, I can run the 13B-parameter Llama 2 (via Ollama) if I don't have too many Chromium tabs open, and it's faster than GPT-4 since the M1 GPU is running the model, and it has access to as much of the 32 GB of shared system RAM as it wants.
If the AMD chips with their decent iGPU can be set in the BIOS or somewhere to have access to an arbitrary share of the system RAM, it could be the cheapest/easiest way to get the larger 70B models running, since you can cheaply slap 64 GB of RAM in a device and give 50 GB of it to the iGPU.
Alvar "Miles" Udell I downloaded it and am using the phi 2 3B LLM. On my 16 core 5950X it is using between 30-38% of the CPU, about 2.5GB RAM, and generates decently fast, about as fast as Copilot renders its answers, but when using GPU offload to my RTX 2070 it uses 90% and renders at a much faster rate, maybe 3-4x as fast.Reply
Running it on anything less than an 8-core CPU, and likely even on an 8-core CPU, would be quite slow indeed.
Albert.Thomas "Chat with RTX is arguably the most fleshed-out of the solutions, as it can analyze documents, videos, and other files."Reply
I've been toying with the idea of using a chatbot to assist in some of the work I do outside of cooling reviews, and Chat with RTX is the only one (that I know of) that is even remotely useful, because of its ability to read documents.
But its lack of chat memory limits its usefulness.
Alvar "Miles" Udell Albert.Thomas said:"Chat with RTX is arguably the most fleshed-out of the solutions, as it can analyze documents, videos, and other files."
I've been toying with the idea of using a chatbot to assist in some of the work I do outside of cooling reviews, and Chat with RTX is the only one (that I know of) that is evenly remotely useful because of the ability to read documents.
But its lack of chat memory makes it limited in usefulness
Sadly, another feature us poor Turing users won't get.
ezst036 Alvar "Miles" Udell said: Sadly, another feature us poor Turing users won't get.
That's the downside of choosing Nvidia. No longevity.
Arguably Nvidia has more programmers and pretty much more of everything, and faster hardware, and offers a full/complete package right out of the box for people. For that reason Nvidia is on its way to being the most valuable company on the planet.
The downside is you're completely dependent upon Nvidia. And once they kick you to the curb, you're kicked. You have no recourse.
AMD isn't nearly as far along with so many things, and Nvidia users can (and do) list the ways and numbers in which those facts bear out. However, AMD's solutions work across many more generations of cards and, heck, work on non-AMD or partially AMD systems. So they are generally inclusive that way.
If you're someone who only keeps computers for a few short years - 2 or 3 years or some small number, no doubt you want an Nvidia solution and it surely isn't even a question.
But for anybody else, AMD is probably the better bet. Even if at the outset their new solution only covers three generations, because it's open source you can bet someone will easily adapt it to the fourth generation back. Or the fifth. Or even the sixth, if reasonable performance exists. Generally speaking, AMD does not put a stop to this and, just the opposite, encourages it.
Much of what I said applies to Intel as well; there's great longevity, and most of what they create now is open source too. However, AMD does seem to lead the pack here.
But. For those 2-3 "brand new" years? Nvidia is simply the best of the best.