The advice, which is specifically for virtual machines using Azure, shows that sometimes the solution to a catastrophic failure is to turn it off and on again. And again.
If this somehow works, good on Microsoft, but what the fuck are they doing on boot cycles 2-14? Can they be configured to do it in maybe 5? 3? Some computers have very long boot cycles.
There’s nothing magical about the 15th reboot - CrowdStrike runs an update check during the boot process, and depending on your setup and network speeds, it can often take multiple reboots for that update to get picked up and applied. If it fails to apply the update before the boot cycle hits the point that crashes, you just have to try again.
One thing that can help, if anyone reads this and is having this problem, is to hard wire the machine to the network. Wi-Fi is enabled later in the startup sequence, which leaves little (or no) time for the update to get picked up and applied before the boot crashes. The wired network stack starts up much earlier in the cycle and will maximize the odds of the fix getting applied in time.
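To put rough numbers on the "just try again" part, here is a back-of-the-envelope sketch in Python. It assumes each boot is an independent attempt with the same chance of the update winning the race against the crash. That per-boot chance (`p_per_boot`) is a made-up parameter, not something CrowdStrike or Microsoft has published, and real boots are probably not perfectly independent.

```python
def chance_fixed_within(reboots: int, p_per_boot: float) -> float:
    """Chance that at least one of `reboots` boots pulls the update in time.

    Assumes independent boots with the same (hypothetical) success chance.
    """
    return 1.0 - (1.0 - p_per_boot) ** reboots


def reboots_needed(p_per_boot: float, target: float = 0.95) -> int:
    """Smallest number of reboots whose cumulative chance reaches `target`."""
    if not 0.0 < p_per_boot <= 1.0:
        raise ValueError("p_per_boot must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = 1
    while chance_fixed_within(n, p_per_boot) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative values only: a slow link vs. a wired connection.
    for p in (0.2, 0.5):
        print(p, reboots_needed(p))
```

Under these assumptions, a 20% chance per boot needs 14 reboots to reach 95% overall, while a 50% chance per boot needs only 5. That would fit "up to 15 times" being an observation rather than a counter, and is why anything that raises the per-boot odds (like a wired connection) should cut the number of tries a lot.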
That makes sense with how the article said “up to 15 times” which does sort of indicate it’s not a counter or strictly controllable process. Thank you!
I was thinking (from reading the headline) that if one specific component fails 15 times during boot or so, it will just automatically get disabled by the system, so that you don’t run into an unavoidable boot loop.
But this makes sense as well, if they did write “up to” in the article (as others have stated). Even though I find the confidence weird. Imagine you have some weird dial-up or satellite internet solution for your system, which just needs time to connect, and then maybe also just provides a few bytes/kilobytes per second. This must be rare, but I’m 100% confident that there exists a system like this :D
Edit: okay, I should read first. The 15 times thing is said for Azure machines.
macOS has something to this effect where if it detects too many kernel panics in a row on boot it will disable all kernel extensions on the next reboot and it pops up a message explaining this. I’ve had this happen to me when my GPU was slowly dying. It eventually did bite the dust on me, but it did let me get into the system a few times to get what I needed before it was kaput.
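For anyone curious what that kind of safeguard looks like in principle, here is a minimal Python sketch of a consecutive-crash counter. It is not how macOS or Windows actually implements it; the `BootGuard` class, its method names, and the threshold of 3 are all invented for illustration.

```python
class BootGuard:
    """Toy model: skip optional extensions after too many crashes in a row."""

    def __init__(self, threshold: int = 3) -> None:
        # Hypothetical threshold; real systems choose their own value.
        self.threshold = threshold
        self.consecutive_crashes = 0

    def record_boot(self, crashed: bool) -> None:
        # A clean boot resets the streak; a crash extends it.
        if crashed:
            self.consecutive_crashes += 1
        else:
            self.consecutive_crashes = 0

    def should_load_extensions(self) -> bool:
        # Once the streak reaches the threshold, boot without extensions.
        return self.consecutive_crashes < self.threshold


if __name__ == "__main__":
    guard = BootGuard(threshold=3)
    for _ in range(3):
        guard.record_boot(crashed=True)
    print(guard.should_load_extensions())  # False: next boot skips extensions
```

The counter has to live somewhere that survives the crash itself (NVRAM, a boot partition, and so on), which this toy version ignores by keeping it in memory.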
Microsoft’s Azure status page outlines several fixes. The first and easiest is simply to try to reboot affected machines over and over, which gives affected machines multiple chances to try to grab CrowdStrike’s non-broken update before the bad driver can cause the BSOD. Microsoft says that some of its customers have had to reboot their systems as many as 15 times to pull down the update.
Just imagine if it’s a build farm with hundreds of machines. Jesus. That’s a hell I wouldn’t even wish on my worst enemy.
I am so confused. What’s supposed to happen on the 15th reboot?
The IT guy quits and it’s no longer their problem to fix
Probably triggers some auto-rollback mechanism I’d guess, to help escape boot loops? I’m just speculating.
Welp, Ars Technica has another theory:
https://arstechnica.com/information-technology/2024/07/crowdstrike-fixes-start-at-reboot-up-to-15-times-and-get-more-complex-from-there/
Yep. That makes more sense. Thanks!
That’s some high quality speculation