DigitalOcean Droplets not responding

I ran into this one recently: for no apparent reason, some (but not all) of my DigitalOcean servers / Droplets would just stop responding - totally dead to the world. The symptoms were:

  • Cannot ping the droplet
  • SSH connection timed out
  • Network is unreachable
  • Unable to access Droplet console

When I checked the Droplet graphs, every chart showed very low, normal idle CPU, load, RAM and disk usage before the outage. Once the Droplet stopped responding, all of the charts dropped to 0.
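
If you want to confirm the "SSH connection timed out" symptom from another machine without waiting on an interactive session, a small TCP check can help. This is only a minimal sketch: the function name, the default port 22 and the 5 second timeout are my own assumptions, and a failed check only tells you the port did not answer, not why.

```python
import socket


def tcp_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Returns False on timeouts, refused connections and unreachable networks,
    which covers the symptoms listed above. It does not diagnose the cause.
    """
    try:
        # create_connection resolves the host and applies the timeout for us.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout, ConnectionRefusedError and "Network is unreachable"
        # are all subclasses of OSError.
        return False
```

For example, tcp_port_reachable("203.0.113.10") would check SSH on a placeholder address; swap in your Droplet's IP.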


At the time of writing, there's not much on the internet about this problem. It also took a bit of back and forth with DigitalOcean support before I found out the reason.

How to fix DigitalOcean Droplets not responding?

If you've landed on this page, you'll want the answer first and maybe the explanation later. There are 2 ways to fix this issue, but first take a few seconds to find out whether this is actually your problem. It affects Ubuntu 20.04 running kernel 5.4.0-144 through 5.4.0-147, and possibly later builds.

You can find out your kernel version by running uname -r in a terminal.
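
If you have several servers to check, the comparison can be scripted. The sketch below is a rough illustration only: the function names are my own, and it assumes the release string looks like 5.4.0-146-generic. It flags anything on 5.4.0 with a build number of 144 or higher, because neither I nor DigitalOcean support know exactly which build contains the fix, so a flagged kernel is "possibly affected", not "definitely affected".

```python
import platform
import re

# Range taken from this post: 5.4.0-144 and above. The upper bound is unknown.
AFFECTED_BASE = (5, 4, 0)
FIRST_REPORTED_BUILD = 144


def parse_release(release):
    """Split e.g. '5.4.0-146-generic' into ((5, 4, 0), 146).

    Returns None if the string does not match that shape.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        return None
    major, minor, patch, build = (int(group) for group in match.groups())
    return (major, minor, patch), build


def is_possibly_affected(release):
    """Return True if the release falls in the range reported in this post."""
    parsed = parse_release(release)
    if parsed is None:
        return False
    base, build = parsed
    return base == AFFECTED_BASE and build >= FIRST_REPORTED_BUILD
```

On the server itself, is_possibly_affected(platform.release()) checks the running kernel, since platform.release() returns the same string as uname -r on Linux.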

  1. The quickest and most pain-free fix is to power cycle your Droplet. Simply toggle the power switch at the top right of your Droplet's dashboard. After a power cycle I've not experienced the same problem again, but your mileage may vary.
  2. The future-proof fix is to move off Ubuntu 20.04 with kernel 5.4.0-144 - 5.4.0-147+ and onto the latest version of Ubuntu (22.04 at the time of writing).

Upgrading brings its own pain points, and you shouldn't do it in place on a production server. Instead, first use DigitalOcean's Reserved / Floating IPs: update your DNS records to point your site at the reserved IP and wait for the change to fully propagate. Then provision a new server with the latest version of Ubuntu. Once that is done, update the reserved IP to point to your new server. This should minimise the downtime to almost nothing. I've written a guide titled "How to use Digital Ocean Reserved IPs when upgrading Ubuntu" if you need help doing this.
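
The final step, pointing the reserved IP at the new server, can be done from the dashboard, but it can also be scripted. Below is a minimal sketch based on my understanding of the DigitalOcean API v2 "assign" action for reserved IPs. The function name, the IP address, the Droplet ID and the token are all placeholders, so check the current API reference before relying on it. The function only builds the request and sends nothing by itself.

```python
import json
import urllib.request

API_BASE = "https://api.digitalocean.com/v2"


def build_assign_request(reserved_ip, droplet_id, token):
    """Build (but do not send) a request assigning reserved_ip to droplet_id.

    Assumes the v2 endpoint POST /reserved_ips/{ip}/actions with
    {"type": "assign", "droplet_id": ...}.
    """
    body = json.dumps({"type": "assign", "droplet_id": droplet_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/reserved_ips/{reserved_ip}/actions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the result to urllib.request.urlopen(request, timeout=30) with a real API token. Building the request separately makes it easy to inspect the URL and body before anything touches a live server.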

Why are DigitalOcean Droplets not responding?

It turns out this is a problem specific to Ubuntu 20.04 running kernel version 5.4.0-144 and above.

I received this from DigitalOcean level 2 support:

"Our support team had received an increasing number of reports where customer Droplets experienced similar behaviour -- sudden crashing into a kernel panic, abnormally high CPU% usage recorded in the metrics gathered by the hypervisor while empty graphs show for the metrics displayed to you as a customer and everything back to normal after a reboot.

We've dug into this further and were able to find a common connection with all of the Droplets that were reported to us. It appears that there was a bug with some version of the 5.4.0-144 kernel. Exactly what sub-version/upload-number of this kernel was in play isn't known, but we have not received any customer reports of the issue having occurred since suggesting they upgrade to the latest kernel version."

So a few things to digest here.

  1. You don't see the high CPU% usage in your own dashboard. When DigitalOcean support showed me their graphs, the CPU% was maxed out for 1 day before the server eventually gave up!
  2. Powering the Droplet off and on again makes it work.
  3. Support traced it to kernel version 5.4.0-144 on Ubuntu 20.04. More importantly, I've had it happen on 146 and 147, so it's not isolated to version 144.
  4. Somewhat reassuringly, once it has happened and you've done a power cycle, it doesn't seem to happen again. At least not in my experience. It's been 2-3 weeks since this happened and all has been running smoothly!
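
If the dashboard isn't convenient, or you have several dead Droplets, the power cycle can also be requested through the API. This is a hedged sketch based on my understanding of the DigitalOcean API v2 "power_cycle" Droplet action. The function name, Droplet ID and token are placeholders, and it only builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.digitalocean.com/v2"


def build_power_cycle_request(droplet_id, token):
    """Build (but do not send) a power cycle request for one Droplet.

    Assumes the v2 endpoint POST /droplets/{id}/actions with
    {"type": "power_cycle"}.
    """
    body = json.dumps({"type": "power_cycle"}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/droplets/{droplet_id}/actions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is a call to urllib.request.urlopen(request, timeout=30) with a real token. A power cycle is a hard reset, so treat it like pulling the plug.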

This had me scratching my head for a while, and it took a bit of back and forth with DO support to get a level 2 response. However, I (and hopefully you, reading this) now have an answer and a way forward.
