The Scale and Impact of System Outages
One of the most disturbing things about Friday’s devastating global outage of IT systems is how routine such ruinous events have become. Similar glitches at companies like Amazon.com Inc. have temporarily shut down systems across the globe. This latest one stems from a botched software update from the cybersecurity firm CrowdStrike Holdings Inc., whose software runs on machines using Microsoft Corp.’s Windows. The result was worldwide problems, including chaos in airports, stock exchanges, and hospitals. Though a fix has now been deployed, the scale of this outage is unprecedented.
The Domino Effect of a Single Glitch
CrowdStrike’s Falcon software update, ironically designed to prevent harm from viruses and cyber threats, caused significant disruption due to a coding error. Falcon, which has privileged access to the core of operating systems like Windows, malfunctioned and rendered many systems useless. This forced airlines to revert to manual operations, such as writing flight times on whiteboards and issuing handwritten paper tickets. Even a British TV news station was forced off the air.
Why Such Access Is a Double-Edged Sword
In theory, giving cybersecurity tools like CrowdStrike’s Falcon privileged access to the operating system kernel is beneficial. It ensures that even if a hacker gains root access, they can’t simply deactivate the anti-virus software. However, this incident highlights the flip side of such access: if the security tool itself fails, the repercussions are widespread and severe. Apple Inc.’s macOS and Linux systems were not affected, because the faulty update was delivered only to Windows machines. That alone does not show that those platforms deny Falcon kernel-level access, though Apple’s recent moves to restrict kernel access for third-party software now seem wise.
The Underlying Issue: IT Complexity and Market Concentration
This wasn’t a cyberattack but a consequence of the Byzantine complexity of cloud IT processes. The cybersecurity industry has done a stellar job of marketing its products as protection against threats, but one downside is that basic IT hygiene gets neglected as infrastructure becomes more intricate. As Palo Alto Networks Inc. Chief Executive Officer Nikesh Arora has noted, most customers spend more on cybersecurity than on IT.
A significant problem is the supply chain for cloud computing and cybersecurity services. With just three companies — Microsoft, Amazon, and Alphabet Inc.’s Google — dominating the market for cloud computing, a minor incident can have global ramifications. This concentration creates a single point of failure that can disrupt numerous companies and organizations.
Practical Solutions and Recommendations
One technical solution is to adopt a “double boot for OS and kernel-modules upgrades” approach. This means restarting a system twice when updating software: the first boot applies the update, and the second confirms the system is stable before the changes are fully activated. Microsoft has not confirmed whether it employs such a process, but implementing one could prevent similar issues in the future.
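The double-boot idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Windows or Falcon actually handle updates: the `BootState` record, the `stage_update` and `boot` functions, and the `healthy` callback are all hypothetical names invented for illustration. The first boot runs the update on trial, the second boot promotes it only if the system still looks healthy, and any failure falls back to the last known-good version.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BootState:
    """Hypothetical record of which module version a boot should load."""
    active: str                    # last version known to boot cleanly
    pending: Optional[str] = None  # staged update, not yet trusted
    trial_done: bool = False       # True once the update has had its trial boot


def stage_update(state: BootState, new_version: str) -> None:
    """Stage an update while keeping the known-good version as a fallback."""
    state.pending = new_version
    state.trial_done = False


def boot(state: BootState, healthy: Callable[[str], bool]) -> str:
    """Simulate one boot and return the version that ends up running."""
    if state.pending is None:
        return state.active

    if not healthy(state.pending):
        # The update misbehaved: discard it and fall back.
        state.pending = None
        state.trial_done = False
        return state.active

    if not state.trial_done:
        # First boot: run the update on trial, but do not commit to it yet.
        state.trial_done = True
        return state.pending

    # Second boot: the update is still healthy, so make it the new baseline.
    state.active = state.pending
    state.pending = None
    state.trial_done = False
    return state.active


if __name__ == "__main__":
    state = BootState(active="1.0")
    stage_update(state, "1.1")
    print(boot(state, lambda version: True))  # trial boot
    print(boot(state, lambda version: True))  # confirming boot
    print(state)
```

The point of the design is that the known-good version is never overwritten until the update has survived a full restart, so a faulty update costs one failed boot rather than leaving the machine unusable.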
Policy and Regulatory Changes
To mitigate such risks, regulatory changes are necessary. European lawmakers have taken steps with the new Data Act, which aims to lower the cost of switching between cloud providers and improve interoperability. US lawmakers should consider similar measures. Requiring companies in critical sectors to use more than one cloud provider for their core infrastructure would reduce reliance on a single provider. A new regulation could mandate that no single provider accounts for more than about two-thirds of a company’s critical IT infrastructure.
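How a company might check itself against such a cap can be shown with a few lines of arithmetic. This is a rough sketch, not a regulatory formula: it assumes infrastructure is measured by counting critical workloads per provider, and the names `MAX_SHARE`, `largest_provider_share`, and `exceeds_cap` are hypothetical.

```python
from typing import Dict, Tuple

# Assumed cap, taken from the "about two-thirds" figure suggested in the text.
MAX_SHARE = 2 / 3


def largest_provider_share(workloads: Dict[str, int]) -> Tuple[str, float]:
    """Return the provider with the most critical workloads and its share."""
    total = sum(workloads.values())
    if total == 0:
        raise ValueError("no critical workloads recorded")
    provider = max(workloads, key=lambda name: workloads[name])
    return provider, workloads[provider] / total


def exceeds_cap(workloads: Dict[str, int], cap: float = MAX_SHARE) -> bool:
    """True if any single provider carries more than the allowed share."""
    _, share = largest_provider_share(workloads)
    return share > cap


if __name__ == "__main__":
    # Hypothetical inventory: number of critical workloads per provider.
    inventory = {"provider_a": 8, "provider_b": 2}
    print(largest_provider_share(inventory), exceeds_cap(inventory))
```

In practice the hard part is choosing the unit of measurement (workloads, spend, or revenue at risk), which any such rule would need to define.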
Using This Incident as a Catalyst for Change
The pain caused by Friday’s outage should be a catalyst for change. We must move beyond piecemeal solutions and address the systemic issues within IT infrastructure and cybersecurity. By learning from these incidents and implementing comprehensive solutions, we can prevent future outages and create a more resilient IT environment.
For those needing managed IT services to safeguard against such disruptions, our Done-For-You (DFY) IT Department offers comprehensive solutions tailored to your business needs. Visit our website to learn more about how we can help you protect your business from similar system outages.