Yesterday's global internet outage caused by a single file on Cloudflare servers — an unexpectedly large file triggered a catastrophic error, knocking out several major websites

Cloudflare banner
(Image credit: Getty / Bloomberg)

Cloudflare, one of the biggest DDoS-protection and security providers on the internet, suffered a major outage yesterday, knocking out several major websites, including X, OpenAI, and even some McDonald’s branches across the globe. Its chief technology officer has since apologized for the massive error, and co-founder Matthew Prince has published the details of the cause of the outage on the company blog.

Since Cloudflare is a web security outfit that protects a big chunk of the internet from DDoS and similar network intrusions, the company's first suspicion was that it was under attack. In fact, Microsoft reported a record-breaking DDoS attack against its own servers on the same day the Cloudflare issue happened. After further investigation, however, Cloudflare determined that the outage was actually caused by a configuration error.

“The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind. Instead, it was triggered by a change to one of our database systems’ permissions, which caused the database to output multiple entries into a 'feature file' used by our Bot Management system,” Prince wrote in the blog. “That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network.”
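In rough terms, the mechanism Prince describes can be sketched as follows (a hypothetical Rust sketch with made-up names, not Cloudflare's actual code): a query that starts returning each row twice roughly doubles the generated feature file.

```rust
// Hypothetical sketch of the failure mode: a permissions change makes a
// metadata query return every feature row twice, so the generated
// "feature file" doubles in size before being propagated to each machine.

#[derive(Clone)]
struct FeatureRow {
    name: String,
}

// Stand-in for the database query whose changed permissions caused
// duplicate rows (all names here are assumptions for illustration).
fn query_feature_rows(duplicated: bool) -> Vec<FeatureRow> {
    let base: Vec<FeatureRow> = (0..60)
        .map(|i| FeatureRow { name: format!("feature_{i}") })
        .collect();
    if duplicated {
        // The bad case: every row appears twice in the result set.
        base.iter().cloned().chain(base.iter().cloned()).collect()
    } else {
        base
    }
}

// Serialize the rows into the file that gets pushed network-wide.
fn generate_feature_file(rows: &[FeatureRow]) -> String {
    rows.iter()
        .map(|r| r.name.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let good = generate_feature_file(&query_feature_rows(false));
    let bad = generate_feature_file(&query_feature_rows(true));
    // The duplicated query roughly doubles the propagated file.
    println!("good: {} bytes, bad: {} bytes", good.len(), bad.len());
}
```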


Jowi Morales
Contributing Writer

Jowi Morales is a tech enthusiast with years of experience working in the industry. He has been writing for several tech publications since 2021, covering tech hardware and consumer electronics.

  • blitzkrieg316
    One file can take down the global internet... and not the first time...

    We need to seriously rethink our entire internet dependence from systems acting like SKYNET...
    Reply
  • Imaletyoufinish
    So that's what happened. Had a bunch of Cloudflare server popups when I was looking up CUs in some of AMD's APUs (Strix Halo for example) and Tech Power Up, Notebook Check and other sites with that information (all in a row using Duckduckgo search) were not loading up and had the same Cloudflare server error. After a while I switched VPN locations and things seemed back to normal so didn't know how big of an issue this was until reading this.
    Reply
  • valthuer
    blitzkrieg316 said:
    One file can take down the global internet... and not the first time...

    We need to seriously rethink our entire internet dependence from systems acting like SKYNET...

    There's no going back, I'm afraid.

    If it's any consolation, the fact that so many people around the world (including companies like OpenAI) rely on Cloudflare can only mean one thing: any major issues will be addressed as quickly as possible.
    Reply
  • TechieTwo
    It just confirms for hackers what networks to attack. :mad:
    Reply
  • derekullo
    TechieTwo said:
    It just confirms for hackers what networks to attack. :mad:
    That's like refusing to have doors or exterior walls on your house because that's the first spot thieves will try to attack to break in :P
    https://upload.wikimedia.org/wikipedia/en/7/79/Roll_Safe_meme.jpg
    Cloudflare's original goal was to prevent DDOS attacks ... Project Honey Pot
    They have since become a Content Delivery Network that caches data in multiple data centers so that it can be served quickly to users.

    Without Cloudflare the internet would be much slower due to much more successful and prevalent DDOS attacks and having to wait for data to come from a single server versus being cached in multiple places in multiple countries.
    Reply
  • ezst036
    I read somewhere (I think it was a Substack) that this configuration file pairs up with some brand-new code that was deployed using the Rust language, and there wasn't nearly as much testing around it as there should have been.

    That's interesting to me considering the sterling reputation that Rust seems to (otherwise) have.
    Reply
  • Sam Hobbs
    Could the failure have been mitigated with better error checking? In other words, when the unexpected size caused a problem, the program should have exited gracefully with a useful error message. Was that done? If not then their entire system needs to be evaluated and better error checking added where appropriate.
    Reply
  • Sam Hobbs
    ezst036 said:
    there wasn't nearly as much testing around it as there should have been.
    I was going to say something about testing but it is not always possible to know what to test for. In this case if they knew of the possibility of a larger file size then it would not have been unexpected and they would have coded for the possibility. If the program did not exit gracefully upon the error then that is what it should have done as I said previously.
    Reply
  • StevenW1969
    Decentralizing is the only answer to a centralized system. Put all your eggs in one basket, and they may all get cracked.
    Reply
  • snemarch
    ezst036 said:
    I read somewhere (I think it was a Substack) that this configuration file pairs up with some brand-new code that was deployed using the Rust language, and there wasn't nearly as much testing around it as there should have been.

    That's interesting to me considering the sterling reputation that Rust seems to (otherwise) have.
    The core issues were:
    1) for performance reasons, infrastructure code preallocates memory for a fixed maximum number of "features" (which had a healthy amount of buffer compared to current/expected feature size - 200 max entries vs. current ~60 entries).
    2) A bug in data generation caused configuration files to explode size-wise.

    The code assumed the number of features wouldn't exceed the maximum, but it "panicked safely" instead of corrupting memory, which would have been worse.

    I'm not sure if there are good alternatives to the main logic - you *probably* don't want to just disable the infrastructure security features if the file is too big. Just loading "up to the maximum" doesn't seem like a good idea. And not putting bounds on feature size could lead to even worse outages, because other parts of the infrastructure services could get OOM-killed.

    There's definitely something to learn wrt. controlled roll-outs and ability to do fast fallback to last-known-good-version :)
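    Roughly, the trade-off looks like this (a minimal hypothetical sketch, not Cloudflare's actual code): a preallocated table with a hard cap, where an oversized input is rejected up front rather than corrupting memory. Whether that rejection becomes a clean fallback or a panic depends on what the caller does with the error.

```rust
// Minimal sketch (hypothetical, not Cloudflare's actual code) of a
// preallocated feature table with a hard cap on entries.

const MAX_FEATURES: usize = 200; // healthy headroom over the ~60 in use

fn load_features(entries: &[&str]) -> Result<Vec<String>, String> {
    if entries.len() > MAX_FEATURES {
        // Returning an error lets the caller fall back to a
        // last-known-good file; unwrap()-ing this Result at the call
        // site is what turns it into a process-wide panic instead.
        return Err(format!(
            "feature count {} exceeds preallocated max {}",
            entries.len(),
            MAX_FEATURES
        ));
    }
    let mut table = Vec::with_capacity(MAX_FEATURES);
    table.extend(entries.iter().map(|e| e.to_string()));
    Ok(table)
}

fn main() {
    let normal: Vec<&str> = vec!["f"; 60];
    let doubled: Vec<&str> = vec!["f"; 240]; // duplicated-entries case
    assert!(load_features(&normal).is_ok());
    assert!(load_features(&doubled).is_err()); // rejected, not memory-corrupted
}
```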
    Reply