Explained: Why Facebook, WhatsApp and Instagram suffered hours of breakdown
Downdetector, which monitors internet issues, said this Facebook outage was the largest it had seen, affecting over 10.6 million users worldwide.
Author: A. Gayatri
Facebook, WhatsApp and photo-sharing app Instagram suffered a massive outage on Monday night that knocked millions of users offline around the world. Services were restored after more than six hours. Late on Monday, Facebook stated that it was working to restore access to its services and was “happy to report they are coming back online now.”
In a Facebook post, Zuckerberg apologised to users for the disruption and the patience it required. The company, however, did not confirm the root cause of the outage. It is said that a DNS failure resulted in the downtime, which began around 8:45 pm IST and is said to be one of the longest failures in recent memory.
DNS, often referred to as the internet’s phone book, translates the hostnames typed into a browser’s URL bar, such as facebook.com, into IP addresses. Those addresses are where websites actually live, and they are what takes users to their desired destination online.
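The phone-book lookup described above can be sketched in a few lines of Python. This is only an illustration using the operating system's resolver through the standard `socket` module. The helper name `resolve` is our own, not anything Facebook or a DNS provider uses, and the `.invalid` hostname mentioned below is a reserved test name that never resolves.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses the system resolver reports.

    An empty list means the lookup failed, which is roughly what users'
    devices experienced for facebook.com during the outage.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name could not be translated into any address.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); the address
    # string is the first element of sockaddr for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

On a working connection, `resolve("facebook.com")` would normally return one or more addresses, while `resolve("no-such-host.invalid")` returns an empty list because no DNS server can answer for that name.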
It was reported that some of Facebook's internal applications were also hit, including the company’s own email system. According to a Bloomberg report, the company’s employees at the Menlo Park, California, campus could not access offices and conference rooms that required a security badge.
Acknowledging that “some people are having trouble accessing (the) Facebook app”, Facebook said it was working to restore access but did not elaborate on the reason behind the outage or the number of users impacted.
Adam Mosseri, Instagram head, tweeted that it felt like a “snow day”, while Mike Schroepfer, Facebook’s outgoing chief technology officer, blamed “networking issues”.
Facebook employees said the outage was caused by an internal routing error to an internet domain, a Reuters report said, adding that failures of internal communications tools and other resources that depend on the same domain had compounded the problem.
Several security experts stated that the Facebook, WhatsApp and Instagram service disruption could have resulted from an internal mistake, and added that sabotage by an insider was also theoretically possible.
"Facebook basically locked its keys in its car," tweeted Jonathan Zittrain, director of Harvard's Berkman Klein Center for Internet & Society.
Alex Stamos, a former chief security officer at Facebook, said the cause of the issue was “probably a bad configuration or code push to the network management system.”
“This isn’t supposed to happen,” added Stamos.
“Facebook's outage appears to be caused by DNS; however that's just a symptom of the problem,” Troy Mursch, chief research officer of cyber threat intelligence company Bad Packets, told Wired.
According to Mursch, the underlying problem is that Facebook withdrew the so-called Border Gateway Protocol (BGP) route that contains the IP addresses of its DNS nameservers. Other experts agree with his assessment.
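Why a withdrawn route makes DNS fail even though the DNS records themselves still exist can be shown with a toy model. The sketch below is not how real BGP routers work internally. It only mimics the idea that other networks can reach an address solely while some prefix covering it is being advertised. The `RoutingTable` class, the network names and the addresses (taken from ranges reserved for documentation) are all hypothetical.

```python
import ipaddress


class RoutingTable:
    """Toy stand-in for the routes one network has learned from its neighbours."""

    def __init__(self):
        self.routes = {}  # ip_network -> name of the network that advertised it

    def advertise(self, prefix, origin):
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin of the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: the address is unreachable from here
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


NAMESERVER_IP = "192.0.2.53"  # hypothetical nameserver address

table = RoutingTable()
table.advertise("192.0.2.0/24", "example-social-network")
table.advertise("198.51.100.0/24", "some-other-network")

before = table.lookup(NAMESERVER_IP)   # route exists, nameserver reachable
table.withdraw("192.0.2.0/24")         # the route is withdrawn
after = table.lookup(NAMESERVER_IP)    # no route left, DNS queries go nowhere

print("before withdrawal:", before)
print("after withdrawal:", after)
```

In this model the nameserver is still running and still holds its records after the withdrawal. The rest of the internet simply has no path to it, which is why the failure shows up to users as a DNS error.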
The most likely explanation for the outage was a misconfiguration on Facebook’s part, according to several internet infrastructure experts.
“It appears that Facebook has done something to their routers, the ones that connect the Facebook network to the rest of the internet,” said John Graham-Cumming, CTO of internet infrastructure company Cloudflare.
Stressing that he does not know the details of what happened, Graham-Cumming says the internet is basically a network of networks, each advertising its presence to the others. And for once, Facebook has stopped advertising, he adds.
This also explains why more than just Facebook’s external services were affected. For example, users could not use “Login with Facebook” on third-party sites. And because the company’s own internal networks could not reach the outside internet, its employees were unable to correct things immediately.
As a result, it took longer to get the services back up and running.
Meanwhile, the outage made Facebook’s absence felt across the internet, especially by DNS resolvers like Cloudflare, the services that convert domain names to IP addresses. They saw twice their usual traffic as people kept trying, without success, to load Facebook and their other social media accounts.
The extra traffic did not overwhelm the system, but the rise in requests shows how interdependent, and sometimes fragile, the internet is.
“It’s not so much the dramatic story of the whole internet could fall over, or some nonsense like that,” says Graham-Cumming.
"It’s more that it’s an interconnected system and it stays up partly because of technical things and partly because of people who keep an eye on it day and night,” he added.