Foundation models: Get to know what AI is built on

NVIDIA AI graphic
(Image credit: NVIDIA)

You’ve probably already seen some of the astounding things AI has proven itself capable of — generating text in an instant, giving the Pope some fashion tips, creating video of anything, or providing useful code to programmers and non-programmers alike. But AI didn’t learn to do all of these things out of nowhere. Each AI tool has been trained on different data, creating what is called a foundation model. 

To put it into more relatable terms, a doctor will go to med school and train on medical procedures. For an AI doctor, that background would be their foundation model. If you wanted a Karate AI, the training data for the doctor would be no use. Instead, it would train on martial arts techniques to create its foundation model. The foundation model is the result of large amounts of data and considerable machine learning done on that data.

For real-world AI, foundation models are usually built from data that's a little easier to feed into a computer than medical techniques or roundhouse kicks. Language is a common one, with vast amounts of text going into AI training to produce a foundation model. Instead of text, programming languages can also be used to build foundation models adept at coding. Imagery and sound have also been popular for foundation models, allowing for the creation of AI tools that can create new images, recognize or generate speech, or create new music.

Creating a foundation model is no small feat, though. On top of the vast amounts of data that AI needs to study, incredible amounts of computing power are needed to handle the machine learning that goes on with that data. This is part of what makes foundation models so important, as it would be inefficient to build a new model for every new application of AI. Instead, popular and powerful foundation models can be tailored to different applications.

You’ve probably already heard of a few foundation models and might not have realized this is what they were. Stable Diffusion is the foundation model behind a ton of the trending AI-generated imagery, and it has expanded into a number of different models, including Stability AI’s Stable Diffusion XL (SDXL) and SDXL Turbo.

For text, Gemma, Mistral, and Llama 2 are some of the most popular foundation models. Some models, like Kosmos 2, are multimodal and can process multiple types of data, allowing them to understand both imagery and text.

Foundation models might sound like they’d be pretty far removed from users, but they’ve actually become accessible thanks to the recent introduction of more AI PCs with dedicated hardware capable of running these models. NVIDIA’s RTX GPUs have just that type of hardware, with Tensor Cores specially designed for accelerating AI performance.

Through NVIDIA’s ChatRTX app, you can quite simply run the Mistral and Llama 2 foundation models on Windows PCs and laptops with NVIDIA RTX hardware inside. With retrieval-augmented generation (RAG), you can even feed some of your own data into the large language model, so it can provide relevant answers. For example, you could feed it the notes for a sci-fi novel you’re writing, and ask it for details anytime you forget them. The responses are quick since the model runs locally, with no latency from sending your requests out for cloud processing, and by that same token, your data is more secure since it doesn’t have to leave your machine either.
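If you're curious what the "retrieval" part of RAG looks like, here is a minimal Python sketch of the general idea. It is not how ChatRTX works under the hood: the sample notes, the function names, and the simple word-overlap scoring are all assumptions made for illustration, and real tools typically use learned embeddings rather than word counts. The sketch picks the note that best matches a question and places it in the prompt that would be handed to a language model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, notes, top_k=1):
    """Return the top_k notes whose wording best matches the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        notes,
        key=lambda note: cosine_similarity(query, Counter(tokenize(note))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, notes, top_k=1):
    """Combine the retrieved notes and the question into one prompt string."""
    context = "\n".join(retrieve(question, notes, top_k))
    return f"Use these notes to answer.\n\nNotes:\n{context}\n\nQuestion: {question}"


# Hypothetical notes for a sci-fi novel (made up for this example).
NOTES = [
    "Captain Mara Voss commands the starship Lantern.",
    "The planet Oru has two moons and a methane sea.",
    "The Lantern's engine runs on folded light.",
]

if __name__ == "__main__":
    # The resulting prompt would be passed to a locally running language model.
    print(build_prompt("How many moons does the planet Oru have?", NOTES))
```

The model itself never changes here. Only the prompt does, which is why RAG can draw on your own files without retraining the foundation model.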

With an understanding of foundation models, you’ll be better able to find the right tool for the job when looking for AI applications to help you out. For even more information on the latest developments in AI and easy-to-understand explanations, check out NVIDIA’s AI Decoded blog series with weekly updates and tips on AI tools you can see in action.