On a warm and sultry summer’s morning, everything seemed fine… and then?

Well, that’ll ruin your day… So, I keep hearing from others and seeing on the news: MAJOR OUTAGE FROM MICROSOFT! Major systems down: hospitals, banks, airlines and more… It’s the digital apocalypse, and if you listened to some of the reports, this sounds just shy of Skynet!

So, here’s the short version of what happened and what’s going on… CrowdStrike is a company that produces security software. You know, something like anti-virus, anti-malware, but it goes much further than that. Their products not only watch for malware, but also for malicious activity, attempts to tamper with things, potential hacks, intrusions… You get it. The software is very popular with big companies. As a matter of fact, I believe I heard today that more than half of the Fortune 500 companies use their software on their systems.

Okay, now we have the ‘Who’, but what happened? Well, you know those pesky updates that Microsoft and other vendors bother you about all the time? Anti-virus, anti-malware, and yes, even CrowdStrike products automate the deployment of those updates. What happened (as far as I’m aware, based on what I’ve read or heard today) is that the latest update being rolled out to CrowdStrike’s customers had an unexpected result. Now, this could be faulty code, a bug, heck, even Santa Claus for all we know right now, but the reality is that once this update was applied to a machine and the device rebooted, it crashed the Kernel every time Windows tried to start.

No! Not that kind of Kernel! The Kernel I’m referring to is the heart of the Windows Operating System. You see, software like this needs to run in the most privileged section of the computer, the most sensitive part of the Operating System, in order to do its job properly. This is why it’s so important that anything installed on your machine that runs in the Kernel space (also called ‘Ring 0’ or zero) is thoroughly tested to make sure it doesn’t cause a problem there, because if it does?

Since this software is installed mainly on corporate computers, like those at Fortune 500 companies, the bulk of home users were generally spared the direct effects of this. But if you were traveling, trying to do some banking, having elective surgery at some hospitals, or dealing with various other businesses today, you’ve been indirectly affected, or at the very least delayed to some extent. That’s not to say this event is over; it’s not over by a long shot! I predict the effects of this crash are going to be felt for weeks, if not months, until this mess can be completely cleaned up!
The good news here? There is a relatively easy way to fix a machine. It may be bricked at the moment (i.e. useless because you can’t boot Windows), but the update hasn’t caused any physical damage.

I say easy, and if you know what you’re doing, yes, it’s simple…
1) Reboot Windows into safe mode
2) Log in with a local Administrator level account and password
3) Delete a specific file on the file system
4) Reboot the computer normally
Now I’m leaving out lots of details there, mainly things like: do you know if there’s a local admin account on your machine? Do you know its password? Is the drive BitLocker-encrypted? Do you have the recovery key? And on and on…
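For the curious, here’s a rough sketch of what step 3 boils down to. Fair warning on the assumptions: the folder and file pattern below come from CrowdStrike’s publicly posted workaround as I understand it, so check their current guidance before trusting them. The function names are made up for illustration, and in real life you’d do this from a command prompt in safe mode, not with Python. The script just makes the logic explicit, and it defaults to a dry run so nothing gets deleted by accident.

```python
from pathlib import Path

# Assumption: location and pattern from CrowdStrike's public workaround guidance.
# Verify against the vendor's current instructions before using on a real machine.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
DEFAULT_PATTERN = "C-00000291*.sys"


def find_channel_files(directory=DEFAULT_DIR, pattern=DEFAULT_PATTERN):
    """Return the files in `directory` whose names match `pattern`, sorted."""
    directory = Path(directory)
    if not directory.is_dir():
        # Nothing to do if the folder isn't there (e.g. software not installed).
        return []
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def remove_channel_files(directory=DEFAULT_DIR, pattern=DEFAULT_PATTERN, dry_run=True):
    """List matching files and delete them unless dry_run is True.

    Returns the list of files that matched, whether or not they were deleted.
    """
    matches = find_channel_files(directory, pattern)
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: only report what would be removed.
    for path in remove_channel_files(dry_run=True):
        print(f"Would delete: {path}")
```

Notice the pattern only matches the one family of files; everything else in that folder stays put, which is the whole point of “delete a specific file” rather than ripping out the software.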
The biggest problem is that you need to be physically in front of the machine to accomplish this! If you’re part of a company that has, say, 10,000 machines spread out across multiple offices around the country, or worse, has a large portion of those machines in people’s homes because most of your workforce is “remote” (thanks to the pandemic), then you’re pretty well screwed right now. You’ll either have to “visit” every one of these devices and repair them in person, have people ship their machines back to you to get fixed and then send them back, or my personal favorite, attempt to walk a person through performing the equivalent of open heart surgery over the phone…
Remember, these are generally individuals who can’t manage to walk straight down an aisle in a grocery store…

Or need a list of ingredients or nutritional facts on a bag of ice…

The long and the short of this event is there are a LOT of broken computers right now. It’s not Microsoft’s fault, as much as many would like to blame them. This isn’t the catalyst to make you run out and start formatting your computers and installing Linux either (there’s nothing wrong with Windows or Linux, and I’m not poking fun at Linux; that switch is a major learning curve all on its own). This’ll get fixed in time, and hopefully this teaches developers at large why safe development practices, proper coding techniques, and thorough quality assurance processes and testing are required.
If, in the end, this turns out to be simply because they pushed a faulty patch or update out to their customers, it’s going to be a serious lesson to others: test your code before you release it. You may push the button that might simply destroy your entire company!