The web's infrastructure has a concentration problem, exposing us all to crushing outages — from AWS and Azure to Cloudflare, the perils of having a centralized internet are being felt by all
Are hyperscaler cloud providers a victim of their own success?
Internet outages happen all the time. Just this week, a Cloudflare outage disrupted millions of users. The infrastructure on which our digital lives are built is precarious and often prone to errors. When those errors happened in the past, they tended to have a small impact: a website’s servers crashing would bring down only that website and anything that relied on it. But as the web’s infrastructure has become more centralised, with a handful of companies dominating the market, any individual issue has the potential to domino into a much more significant one.
We’ve seen two alarming recent examples of that happening in real life. On October 20, thousands of services around the world ground to a halt after the processes meant to keep the Domain Name System (DNS) routing and records that AWS controls in sync fell out of step, triggering a “latent race condition”, a harmful bug that cascaded through almost all of AWS’s systems, including other routing services. That meant what was initially a single error in the US-EAST-1 cluster of data centres in Northern Virginia became a problem affecting people as far away as Australia and the UK.
Then, less than 10 days later, a similar issue struck Microsoft’s Azure cloud system. Xbox gamers, the Scottish parliament, and many other key bits of infrastructure fell offline, thanks to another DNS issue.
Both were quickly resolved, but the speed at which they caused chaos in the online-offline world in which we now live highlighted just how precarious our digital lives can be. And it began to get people thinking: does the web’s key infrastructure have a problem of over-concentration when it comes to power?
A concentration of power
The big three cloud infrastructure providers – AWS, Azure and Google Cloud – together hold more than two-thirds of the market. They’ve attained that level of power because of their remarkable uptime. The fact that things go wrong so rarely is a vindication of their ability and reliability. Yet it also means that more and more services are hosted on fewer and fewer servers controlled by fewer companies – so on the rare occasion that something does go wrong, it goes really wrong.
“When one of the major cloud providers experiences an issue, it doesn't just affect one company; it ripples across sectors, services, and even countries,” said Graeme Stewart, head of public sector at Check Point Software. Stewart pointed out that both the AWS and Azure outages looked more like cock-up than conspiracy – certainly the AWS issue was, while the Azure one is still being investigated. Yet “incidents like this highlight how fragile our online infrastructure really is,” he said. “We have become so dependent on a handful of global platforms that one glitch can disrupt everything from banking to travel.”
And those glitches can have meaningful impacts on us all. The infrastructure providers serve companies that collectively have hundreds of millions of users and span industries, meaning that banks are as likely to go down as video games or government voting systems.
Experts weigh in
“This was yet another big flashing warning light of the potential peril we face,” Stacy Mitchell, co-director at the Institute for Local Self-Reliance, a US advocacy group promoting local, accountable economies, said in an interview with Tom’s Hardware Premium. She points out that this was an innocuous set of errors that happened to have catastrophic consequences. But the same concentration could be turned into a more cynical ploy by those who seek to control the media and their message. “Imagine, for example, how this concentrated control of critical infrastructure might be wielded by a would-be authoritarian and the tech CEOs eager to curry his favour,” said Mitchell.
How to fix it is another issue – and one that doesn’t have an easy answer. “Sadly, it’s inevitable that if you want to have truly scalable global content distribution, the infrastructure required is so large it will concentrate in those that can afford it,” said Alan Woodward, professor of cybersecurity at the University of Surrey, in an interview with Tom’s Hardware Premium. “I’m not sure it will change as few can catch up.”
Even those that are catching up – running in fourth place – are big tech giants of another stripe. Alibaba Cloud remains more than 10 percentage points behind the third-place provider in the cloud market, and even if it could bridge that gap, it would simply be replacing an American big tech firm with a Chinese one.
Woodward is sanguine about what that means. “What we see happening is simply the shaky foundations of the internet and web making its presence felt, regardless of the sophistication of the applications running on top,” he said. Those shaky foundations are also a signal to any cybercriminals who might want to pull off the biggest ransomware heist. The vulnerability lies not necessarily in the way the cloud infrastructure is set up, but in the consequences of it going down at any point, and in the pressure victims would feel to pay up to return to some sense of normality. That makes it a tempting target.
What's the solution?
The solution that many came up with in the immediate aftermath of the AWS outage wasn’t a breakup of control but a suggestion to put your eggs in multiple baskets. That is despite questions being raised after the UK government admitted that 60% of its services rely on the big three cloud providers in order to run.
The problem with that is that redundancy is useful, but costly – and cloud infrastructure doesn’t necessarily come cheap. Because of the impressive uptime of the hyperscaler providers, getting purse-string holders to sign off on the extra spending can be tricky. But if there were ever a time to try to reduce reliance on a concentrated handful of providers, then it’s now, in the immediate aftermath of their embarrassing outages.
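For readers who want to see what "multiple baskets" can look like in practice, here is a minimal sketch of failover between two providers. It is an illustration only, not a description of how any company mentioned here runs its systems: the function names `primary_cloud` and `secondary_cloud` are hypothetical stand-ins for requests to two independent providers, and the simulated DNS error is invented for the example.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured provider fails."""


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[], str]]],
) -> tuple[str, str]:
    """Try each provider in order and return (name, result) from the first success."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch()
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{name}: {exc}")
    # Nothing worked, so report every failure together.
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for requests to two independent cloud providers.
def primary_cloud() -> str:
    # Simulated outage, assumed here purely for illustration.
    raise ConnectionError("DNS resolution failed")


def secondary_cloud() -> str:
    return "200 OK"


used, response = call_with_failover([
    ("primary", primary_cloud),
    ("secondary", secondary_cloud),
])
print(used, response)
```

The cost the article describes shows up even in this toy version: the second provider has to be deployed, paid for and kept up to date the whole time, even though it is only called on the rare occasion the first one fails.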
More literacy about the issue is needed first in order to begin any process of change, reckoned Woodward. “It’s surprising that many don’t understand exactly who has the tier one networks and the backend capability for hosting or storage,” he said. And until that happens, change will be slow.

Chris Stokel-Walker is a Tom's Hardware contributor who focuses on the tech sector and its impact on our daily lives—online and offline. He is the author of How AI Ate the World, published in 2024, as well as TikTok Boom, YouTubers, and The History of the Internet in Byte-Sized Chunks.