How To Create Your Own AI Chatbot Server With Raspberry Pi 4

Llama on Pi
(Image credit: pexels.com)

We’ve shown previously that you can run ChatGPT on a Raspberry Pi, but the catch is that the Pi is just providing the client side and then sending all your prompts to someone else’s powerful server in the cloud. However, it’s possible to create a similar AI chatbot experience that runs locally on an 8GB Raspberry Pi and uses the same kind of LLaMA language models that power AI on Facebook and other services. 

The heart of this project is Georgi Gerganov’s llama.cpp. Written in an evening, this C/C++ model is fast enough for general use, and is easy to install. It runs on Mac and Linux machines and, in this how to, I’ll tweak Gerganov’s installation process so that the models can be run on a Raspberry Pi 4. If you want a faster chatbot and have a computer with an RTX 3000 series or faster GPU, check out our article on how to run a ChatGPT-like bot on your PC.

Managing Expectations

Before you head into this project, I need to manage your expectations. LLaMA on the Raspberry Pi 4 is slow. Loading a chat prompt can take minutes, and responses to questions can take just as long. If speed is what you crave, use a Linux desktop / laptop. This is more of a fun project, than a mission critical use case.

For This Project You Will Need

  • Raspberry Pi 4 8GB
  • PC with 16GB of RAM running Linux
  • 16GB or larger USB drive formatted as NTFS

Setting Up LLaMA 7B Models Using A Linux PC

The first section of the process is to set up llama.cpp on a Linux PC, download the LLaMA 7B models, convert them and then copy them to a USB drive. We need the Linux PC’s extra power to convert the model as the 8GB of RAM in a Raspberry Pi is not enough.

1. On your Linux PC open a terminal and ensure that git is installed.

sudo apt update && sudo apt install git

2. Use git to clone the repository.

git clone https://github.com/ggerganov/llama.cpp

3. Install a series of Python modules. These modules will work with the model to create a chat bot.

python3 -m pip install torch numpy sentencepiece

4. Ensure that you have g++ and build essential installed. These are needed to build C applications.

sudo apt install g++ build-essential

5. In the terminal change directory to llama.cpp.

cd llama.cpp

6. Build the project files. Press Enter to run.

make

7. Download the Llama 7B torrent using this link. I used qBittorrent to download the model.

magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA

8. Refine the download so that just 7B and tokenizer files are downloaded. The other folders contain larger models which weigh in at hundreds of gigabytes in size.

(Image credit: Tom's Hardware)

9. Copy 7B and the tokenizer files to /llama.cpp/models/.

10. Open a terminal and go to the llama.cpp folder. This should be in your home directory.

cd llama.cpp

11. Convert the 7B model to ggml FP16 format. Depending on your PC, this can take a while. This step alone is why we need 16GB of RAM. It loads the entire 13GB models/7B/consolidated.00.pth file into RAM as a pytorch model. Trying this step on an 8GB Raspberry Pi 4 will cause an illegal instruction error.

python3 convert-pth-to-ggml.py models/7B/ 1

12. Quantize the model to 4-bits. This will reduce the size of the model.

python3 quantize.py 7B

13. Copy the contents of /models/ to the USB drive.

Running LLaMA on Raspberry Pi 4

(Image credit: Tom's Hardware)

In this final section I repeat the llama.cpp setup on the Raspberry Pi 4, then copy the models across using a USB drive. Then I load an interactive chat session and ask “Bob” a series of questions. Just don’t ask it to write any Python code. Step 9 in this process can be run on the Raspberry Pi 4 or on the Linux PC.

1. Boot your Raspberry Pi 4 to the desktop.

2. Open a terminal and ensure that git is installed.

sudo apt update && sudo apt install git

3. Use git to clone the repository.

git clone https://github.com/ggerganov/llama.cpp

4. Install a series of Python modules. These modules will work with the model to create a chat bot.

python3 -m pip install torch numpy sentencepiece

5. Ensure that you have g++ and build essential installed. These are needed to build C applications.

sudo apt install g++ build-essential

6. In the terminal, change directory to llama.cpp.

cd llama.cpp

7. Build the project files. Press Enter to run.

make

8. Insert the USB drive and copy the files to /models/ This will overwrite any files in the models directory.

9. Start an interactive chat session with “Bob”. Here is where a little patience is required. Even though the 7B model is lighter than other models, it is still a rather weighty model for the Raspberry Pi to digest. Loading the model can take a few minutes.

./chat.sh

10. Ask Bob a question and press Enter. I asked it to tell me about Jean-Luc Picard from Star Trek: The Next Generation. To exit press CTRL + C.

(Image credit: Tom's Hardware)
Les Pounder

Les Pounder is an associate editor at Tom's Hardware. He is a creative technologist and for seven years has created projects to educate and inspire minds both young and old. He has worked with the Raspberry Pi Foundation to write and deliver their teacher training program "Picademy".

  • bit_user
    For This Project You Will NeedRaspberry Pi 4 8GB
    PC with 16GB of RAM running Linux
    16GB or larger USB drive formatted as NTFS
    I'm sure the Pi will also need to be running the 64-bit version of the OS. I don't know if people with the 32-bit OS would've gotten automatically upgraded, but this is probably still worth pointing out.

    As for the USB drive using NTFS, this surprised me. IMO, the only reason to use NTFS is if you need > 4 GB files and require accessibility from a Windows PC. Otherwise, I'd use Linux-native filesystems XFS or BTRFS for > 4 GB file support.
    Reply
  • Alex Fraundorf
    I was able to get this working, but I had to use some different directions. I am brand new to AI and a noob with python, so this could have easily been caused by something I did wrong.

    Step 7: I was not able to download the torrent. I ended up following these instructions from https://github.com/juncongmoo/pyllama:
    pip install pyllama -U
    pip install transformers
    python3 -m llama.download --model_size 7B
    Note: I use Linux Mint and had to turn off my firewall for the previous step. Otherwise it hung at 12.7GB indefinitely.

    Step 12: There was not quantize.py file in my llama.cpp directory. I read through the README file and found this which worked:
    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

    I am running this on my Linux Mint machine, not a Raspberry Pi, so I was able to skip all of those steps.

    To open the chat, I used the command ./examples/chat.sh

    I hope these notes help anyone else who get stuck.
    Thank you for this tutorial. I am looking forward to playing with the chat.
    Reply