AI models that cost $1 billion to train are underway, $100 billion models coming — largest current models take 'only' $100 million to train: Anthropic CEO
AI training costs are growing exponentially year after year.
Anthropic CEO Dario Amodei said on the In Good Company podcast that AI models in development today can cost up to $1 billion to train. Current models like GPT-4o cost only about $100 million, but he expects training costs to climb to $10 billion or even $100 billion in as little as three years.
"Right now, 100 million. There are models in training today that are more like a billion." Amodei also added, "I think if we go to ten or a hundred billion, and I think that will happen in 2025, 2026, maybe 2027, and the algorithmic improvements continue a pace, and the chip improvements continue a pace, then I think there is in my mind a good chance that by that time we'll be able to get models that are better than most humans at most things."
The Anthropic CEO cited these numbers while discussing the progression from generative AI (like ChatGPT) to artificial general intelligence (AGI). He said there won't be a single point at which we suddenly reach AGI; instead, it will be a gradual climb, with each model building on the advances of past models, much like how a human child learns.
So, if AI models grow ten times more powerful each year, we can reasonably expect the hardware required to train them to be at least ten times more powerful, too. That makes hardware potentially the biggest cost driver in AI training. Back in 2023, it was reported that ChatGPT required more than 30,000 GPUs to train, and Sam Altman has confirmed that GPT-4 cost $100 million to train.
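Amodei's numbers imply roughly tenfold annual growth in training cost. Here's a quick back-of-envelope check (a minimal Python sketch using only the dollar figures quoted above, not any new data):

```python
# Implied annual growth in training cost, using the figures quoted above:
# ~$100M for current frontier models, up to ~$100B on Amodei's timeline.
start_cost = 100e6   # ~$100 million (GPT-4-class training run)
end_cost = 100e9     # upper end of Amodei's projection
years = 3            # "2025, 2026, maybe 2027"

growth_per_year = (end_cost / start_cost) ** (1 / years)
print(f"Implied cost growth: ~{growth_per_year:.0f}x per year")  # ~10x
```

A thousandfold cost increase over about three years works out to roughly 10x per year, which is where the "ten times more powerful each year" framing comes from.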
Last year, over 3.8 million GPUs were delivered to data centers. With Nvidia's latest B200 AI chip costing around $30,000 to $40,000 per unit, Amodei's billion-dollar estimate looks on track for 2024. And if advances in model and quantization research keep driving scale up at the current exponential rate, hardware requirements should keep pace, unless more efficient technologies like the Sohu AI chip become far more prevalent.
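The arithmetic behind that surmise is simple. Below is a rough sketch using the figures above; keep in mind that the 30,000-GPU cluster size comes from a 2023 report and the B200 pricing is an estimate, so this is illustrative only:

```python
# Hardware cost alone for a ~30,000-GPU training cluster at B200 prices.
gpus_per_run = 30_000                    # reported ChatGPT-scale GPU count (2023)
price_low, price_high = 30_000, 40_000   # estimated B200 unit price, USD

low = gpus_per_run * price_low / 1e9
high = gpus_per_run * price_high / 1e9
print(f"GPUs alone: ${low:.1f}B to ${high:.1f}B")  # ~$0.9B to $1.2B
```

GPU spend alone lands in the $0.9 billion to $1.2 billion range, before power, networking, and staffing, which is why the billion-dollar figure is plausible.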
We can already see this exponential growth happening: Elon Musk wants to purchase 300,000 B200 AI chips, while OpenAI and Microsoft are reportedly planning a $100 billion AI data center. With all this demand, data center GPU deliveries could balloon to 38 million next year, if Nvidia and other suppliers can keep up with the market.
However, beyond the supply of the chips themselves, these AI firms must also worry about power supply and related infrastructure. The total estimated power consumption of the data center GPUs sold last year alone could power 1.3 million homes. If data center power requirements keep growing exponentially, supplies of economically priced electricity could run short. And while these data centers need power plants, they also need an upgraded grid that can deliver all the electrons the power-hungry AI chips consume. For this reason, many tech companies, including Microsoft, are now considering modular nuclear power for their data centers.
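For a sense of how an estimate like that is put together, here is a hedged sketch. The per-GPU draw and per-home consumption below are illustrative assumptions of ours, not figures from the original report:

```python
# Rough reconstruction of the "1.3 million homes" style of estimate.
gpus_shipped = 3.8e6    # data-center GPUs delivered last year (from above)
avg_gpu_watts = 450     # ASSUMED average draw across mixed GPU models, W
home_avg_watts = 1_200  # ASSUMED average US household draw (~10,500 kWh/yr)

total_gw = gpus_shipped * avg_gpu_watts / 1e9
homes_millions = gpus_shipped * avg_gpu_watts / home_avg_watts / 1e6
print(f"~{total_gw:.1f} GW, roughly {homes_millions:.1f} million homes")
# ~1.7 GW, roughly 1.4 million homes -- the same ballpark as the article
```

The exact answer depends heavily on the assumed average draw per GPU, but any reasonable value puts last year's shipments in the low-gigawatt range, i.e., the output of one or two large power plants.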
Artificial intelligence is quickly gathering steam, and hardware innovations seem to be keeping up. Anthropic's $100 billion estimate therefore looks to be on track, especially if manufacturers like Nvidia, AMD, and Intel can deliver. But as AI performance improves with every generation, one big question remains: how will it affect the future of our society?
vijosef
The diversion of so many resources toward AI will create a lot of scarcity and inflation.
Notton "Amodei also added, "I think if we go to ten or a hundred billion, and I think that will happen in 2025, 2026, maybe 2027"Reply
That's like saying "I predict Tom's will add another article about AI in the next 6hours to next year, and it'll have anywhere between 400 words and 40,000.
"I'm covering all my bases" -Nostradamus, probably -
thisisaname
"So, if AI models grow ten times more powerful each year, we can reasonably expect the hardware required to train them to be at least ten times more powerful, too."
That does not look like progress; they are just deploying more hardware, and more expensive hardware at that.
If there were real progress, ten times more powerful would need less than ten times the hardware.
DougMcC
$1B I believe; $10B I'm doubtful of. I suspect people will rest on $1B models while they refine training techniques and optimize software. A $10B model is a big, big investment whose cost will be much more challenging to recoup. Being first in the race to be the best has some value, but AI models are already very close to "good enough" for 99% of the tasks people have come up with for them. I'd be surprised if $1B models were not good enough for just about everything. CEOs are not going to plunk down $10B unless they are quite sure they are getting enough gain that it will pay off, and that will require quite a lot of research first. So my personal bet is on no $10B model until past 2030.
bit_user
The author neglects to consider the business aspect of this development, which another recent article covered:
https://www.tomshardware.com/tech-industry/artificial-intelligence/ai-industry-needs-to-earn-dollar600-billion-per-year-to-pay-for-massive-hardware-spend-fears-of-an-ai-bubble-intensify-in-wake-of-sequoia-report
As stated in the comments on that article, I think the methodology is way too simplistic and throws the findings into serious question. However, even if they're over by a factor of two, the underlying idea that these developments are driven by commercial incentives is indeed sound. That means it must translate into some combination of efficiencies and value-added services people are willing to pay more for, in amounts totaling such lofty sums. This problem is only compounded, if training costs indeed balloon by such amounts as Anthropic claims.
I think one aspect we shouldn't overlook is the cost of curating the training data. To avoid overfitting, you generally need a lot more training data than you have model parameters. I wonder how much of the training costs go into that.
why_wolf "generative artificial intelligence (like ChatGPT) to artificial general intelligence" I see they're using easily (probably deliberately) confusable terminology to describe wildly different technologies. They really want the general public to think stuff like chatGPT is what the general public thinks of when they hear AI from decades of sci-fi.Reply
I still question how they actually intended on making money from any of this. Just replacing the Google search engine with this and keeping all the ad-revenue to themselves may not be enough. You're radically increasing the cost of every "search" running it through these chatbots. -
bit_user
kristoffe said: "They plan on causing a massive extension to welfare mixed with unemployment, aka UBI basically. To then mitigate jobs only the AI investors will offer (at a minimum wage at best). That's exactly the idea. Remove independent workers and small business, pool resources and money into something nightmarish, similar to Idiocracy."
That's one possible scenario, but the worst part might be that there is no master plan. Some people have ideas like this, but the truth is that there are lots of big questions and unknowns, yet progress continues unabated.
I'm skeptical of UBI (Universal Basic Income), since I'm not aware of any time in recent history when productivity gains have been shared out like that. Perhaps it will become a political necessity, but I just wouldn't assume it'll happen.
kjfatl
The bubble will pop and investors will lose billions.
Reminds me of the Dutch tulip craze, or of early investors in US canals and railroads: Commodore Vanderbilt did well; nearly everyone else lost.
When will CEOs start asking the question: what is the ROI on this project?
In the meantime, for those who know when to bail, there is a lot of money to be made.
Remember, AI is supposed to get so good that AI software developers will no longer be needed.
bit_user
kjfatl said: "Reminds me of the Dutch tulip craze"
No, tulips have no intrinsic value. You can't use them to do anything. All they are is pretty, and then they die. AI has already proven its worth, well beyond that.
kjfatl said: "When will CEOs start asking the question: what is the ROI on this project?"
I'm sure that's always a part of the discussion. For sure, those questions will continue to get sharper.
kjfatl said: "Remember, AI is supposed to get so good that AI software developers will no longer be needed."
There will continue to be a need for developers into the foreseeable future. Their tools and job activities will change, and maybe there won't be as many of them, but it's like saying that factory automation would eliminate all jobs in manufacturing. Didn't happen.
DougMcC
kjfatl said: "The bubble will pop and investors will lose billions. Reminds me of the Dutch tulip craze, or of early investors in US canals and railroads: Commodore Vanderbilt did well; nearly everyone else lost. When will CEOs start asking the question: what is the ROI on this project? In the meantime, for those who know when to bail, there is a lot of money to be made. Remember, AI is supposed to get so good that AI software developers will no longer be needed."
ROI isn't hard for them at the current scale. Plenty of people are willing to pay them the 50¢ per million tokens for GPT-3.5.
I'm super curious to see reports of how many people are willing to pay them the $5 per million on GPT-4o. That will tell us a LOT about what the future looks like. My company is paying for loads of tokens on GPT-3.5 but zero on GPT-4o, because the delta in capability isn't worth the price.
There are probably a tiny number of use cases that will pay $50 per million on GPT-5. Maybe enough to fund it.
But $500 per million on GPT-6? Who will need that?