Enthusiasts know one bad part can render an entire system inoperable. That apparently holds true for Internet service providers like CenturyLink, too, because a single malfunctioning network card reportedly disrupted much of its infrastructure from December 27-29. This disruption resulted in service problems for the company's residential customers, business users, and parts of the 911 emergency system.
Many noticed problems with CenturyLink's services on December 27. But the real danger was confirmed on December 28 when Federal Communications Commission (FCC) chairman Ajit Pai said that he planned to investigate the outage after it "affected 911 service for numerous consumers across the country."
Catapult Systems senior lead consultant Nathan Ziehnert then revealed the outage's cause: "After a 50 hour outage at 15 data centers across the US — impacting cloud, DSL, and 911 services — CenturyLink says the outage is fixed and was caused by a single network card sending bad packets (they’ve since applied bad packet filtering)." That's right: people couldn't call 911 because of some bad packets.
CenturyLink acknowledged the disruption on Twitter but has yet to share any information about the incident on its consumer-facing site, investor relations page, or MediaRoom. We reached out to the company for more details and received the following response:
"The network event experienced by CenturyLink Thursday has been resolved (as of early Saturday morning). Services for business and residential customers affected by the event have been restored. CenturyLink knows how important connectivity is to our customers, so we view any disruption as a serious matter and sincerely apologize for any inconvenience that resulted. We now return to normal business operations, and customers who have a service issue should contact CenturyLink’s repair department.
"We are still conducting formal post incident investigations and analysis of the Dec. 27 outage, which is why we haven't sent a final root cause communication to our customers. Our goal is to not only identify the source of the outage, but also any contributing factors. We are committed to operational excellence and take any service interruption seriously, which is why we worked around the clock until we restored service to our customers."
The situation would be comical if it weren't so dire. Who hasn't had to troubleshoot a problem caused by one small thing? That happens pretty much every time we change our TV setup. But this isn't like not being able to watch "Die Hard" on Christmas because grandpa can't work a DVD player; people couldn't call 911 when they needed emergency services the most because of a nationwide ISP's failure.
Thanks for the heads-up. We've amended the article to stay closer to the facts of the matter.
To be clear, it's only down to incompetence that the system was vulnerable to such a point of failure - and especially that it took so long to diagnose and correct.
Regardless, I am sure this gave a few black hat hackers some ideas about new exploits to try and infrastructure to target.
So, I don't think it's as simple as that they lacked redundancy on their uplinks.