Former Tesla AI Director reproduces GPT-2 in 24 hours for only $672 — GPT-4 costs $100 million to train

OpenAI launched GPT-2 in 2019, and training it reportedly cost $256 per hour. Five years later, the field has already moved on to GPT-4o, and advances in hardware, software, and data mean that training the same model now takes far less time and money, as Andrej Karpathy, the developer behind the llm.c project to reproduce GPT-2, has proven.

The primary driver of the cost savings is using a single 8XH100 node for the training, which brought the price down to just $28 an hour, a reduction of almost 90% in only five years. Nvidia launched the H100 in 2023, so OpenAI likely used far less powerful hardware when it originally worked on GPT-2, although the number of hours that original training run took is unknown. For comparison, training GPT-4 cost more than $100 million.
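
The headline numbers are easy to sanity-check. The sketch below is a back-of-envelope calculation in C using only the figures quoted above: Karpathy's reported $28-per-hour node rate and 24-hour run time, plus the $256-per-hour rate reported for 2019 (the comparison is of hourly rates only, since the original run's duration isn't known).

```c
#include <stdio.h>

int main(void) {
    /* Figures quoted in the article; see the lead-in for caveats. */
    const double h100_rate = 28.0;   /* $/hour for one 8XH100 node */
    const double gpt2_rate = 256.0;  /* $/hour reported for the original 2019 run */
    const double run_hours = 24.0;   /* Karpathy's reported wall-clock time */

    double run_cost  = h100_rate * run_hours;                 /* 28 * 24 = 672 */
    double reduction = 100.0 * (1.0 - h100_rate / gpt2_rate); /* ~89% cheaper per hour */

    printf("Cost of the 24-hour llm.c run: $%.0f\n", run_cost);
    printf("Hourly-rate reduction vs. 2019: %.1f%%\n", reduction);
    return 0;
}
```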

Another thing that made llm.c quick to spin up and train is that it implements GPT training directly, with almost no software dependencies. Karpathy said, “Because llm.c is a direct implementation of GPT training in C/CUDA, the requirements are minimal — there is no need for conda environments, Python interpreters, pip installs, etc. You spin up a cloud GPU node, optionally install NVIDIA cuDNN, NCCL/MPI, download the .bin data shards, compile and run, and you’re stepping in minutes.” He added, “You then wait 24 hours and enjoy samples about English-speaking Unicorns in the Andes.”
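
llm.c itself is far too large to reproduce here, but the compile-and-run workflow Karpathy describes can be illustrated with a deliberately toy, hypothetical example. The sketch below fits a tiny linear model with hand-written gradients in plain C, following the same forward, backward, and update loop structure a GPT trainer uses, with no Python interpreter, conda environment, or pip install involved. It stands in for the style of llm.c, not its actual code.

```c
/* Toy illustration only: a dependency-free training loop in plain C.
 * llm.c trains a full GPT-2 this way (in C/CUDA); here we just fit
 * y = w*x + b to a few points with gradient descent, to show the
 * "compile and run, no Python stack required" workflow. */
#include <stdio.h>

int main(void) {
    const float xs[] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float ys[] = {3.0f, 5.0f, 7.0f, 9.0f};   /* target: y = 2x + 1 */
    const int n = 4;
    float w = 0.0f, b = 0.0f, lr = 0.01f;

    for (int step = 0; step < 5000; step++) {
        float dw = 0.0f, db = 0.0f;
        for (int i = 0; i < n; i++) {
            float err = (w * xs[i] + b) - ys[i];   /* forward pass and loss gradient */
            dw += 2.0f * err * xs[i] / n;          /* backward pass, written by hand */
            db += 2.0f * err / n;
        }
        w -= lr * dw;                              /* parameter update (SGD) */
        b -= lr * db;
    }
    printf("learned w=%.3f b=%.3f (expected w=2, b=1)\n", w, b);
    return 0;
}
```

Any C compiler builds and runs it in one step (for example, gcc toy_train.c && ./a.out), which is the same property that lets llm.c go from a fresh cloud GPU node to training in minutes.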

The llm.c project started life as part of an educational video, but it soon turned into something Karpathy built from scratch after he got ‘stuck with some PyTorch things.’ It shows his passion for AI and the lengths he was willing to go to in order to finish the project. Still, he didn’t accomplish this alone; he had the support of several developers from across the globe.

AI training isn’t getting cheaper

Jowi Morales
Contributing Writer

Jowi Morales is a tech enthusiast with years of experience working in the industry. He has been writing for several tech publications since 2021, covering tech hardware and consumer electronics.