Graceful shutdown of Node.js Express applications for zero-downtime deployments

Blai Pratdesaba · Jul 3, 2021

In this article, we’ll explore how to prepare an Express server to gracefully shutdown.

When you are running an application on a high availability/autoscaling environment, you will have a load balancer to control input connections and distribute them to multiple instances of your project.

The load balancer checks that the application is working by hitting a health check endpoint every few seconds (the exact interval depends on configuration).

If one of the server instances stops serving connections, for example during a rolling update, the load balancer will keep sending it traffic until a health check fails. Every request routed to that instance during that window will receive a `502 Bad Gateway`.

The idea is that we need to tell the load balancer that the server is going down so it stops sending new connections there. To achieve that we will use the health check: we start failing it ahead of time, giving the load balancer a chance to stop routing new requests while the server keeps serving the in-flight ones, and any new ones that still arrive, during the grace period.

Once enough time has passed, we stop the server and continue gracefully shutting down the process.

In a nutshell, we will be doing it in three phases:

  1. Notify the load balancer by failing the health endpoint ahead of the shutdown.
  2. Tell `express` to stop accepting connections.
  3. Tell `node` that we are ready to shut down.

How to implement the Health Check

An example of a simple health check is shown below.
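As a minimal sketch (the `/health` path and port 3000 here are assumptions, not requirements):

```js
const express = require('express');

const app = express();

// The load balancer polls this endpoint every few seconds to verify
// that the process is still responsive.
app.get('/health', (req, res) => {
  res.status(200).send('OK');
});

app.listen(3000, () => console.log('Listening on port 3000'));
```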

Note: ideally you want to serve the health endpoint on a separate port to avoid exposing it to the internet. For simplicity, the example won’t cover that.

Have the load balancer in front of this app check this endpoint every few seconds, and from the response it can tell whether the app is healthy or whether something is wrong, such as the process getting stuck.

When the backend `express` application receives the signal to shut down, by default it stops accepting requests immediately. At this point the load balancer is not yet aware that the backend has stopped accepting requests, and it will keep sending traffic to that instance until the next health check runs. Each request the backend refuses results in a `502 Bad Gateway` being sent to the client, causing momentary downtime.

To avoid this, a good solution is to listen for the shutdown signal, fail the health endpoint first, wait some time, and only then gracefully shut down the rest of the application.

The process will be:

  1. Stop the health endpoint ahead of the shutdown.
  2. Tell `express` to stop accepting connections.
  3. Tell `node` that we are ready to shut down.

Implementation

Let’s start with a default `express` application.
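A minimal starting point, assuming port 3000 and a single root route, might look like this:

```js
const express = require('express');

const app = express();
const port = 3000;

app.get('/', (req, res) => {
  res.send('Hello World!');
});

// Keep a reference to the HTTP server so we can close it later.
const server = app.listen(port, () => {
  console.log(`Listening on port ${port}`);
});
```

Keeping the `server` object returned by `app.listen` matters: it is what we’ll call `close()` on when it’s time to stop accepting connections.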

We’ll be adding a new health check endpoint:
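A sketch of that endpoint, building on the application above and using a `HEALTH_CHECK_ENABLED` flag (explained just below):

```js
// Controls whether this instance reports itself as healthy.
let HEALTH_CHECK_ENABLED = true;

app.get('/health', (req, res) => {
  if (HEALTH_CHECK_ENABLED) {
    res.status(200).send('OK');
  } else {
    // Signals the load balancer to stop routing traffic here.
    res.status(503).send('Service Unavailable');
  }
});
```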

In this part, we are adding a local variable called `HEALTH_CHECK_ENABLED`.

When the application starts, it is set to `true`. Once we receive the shutdown signal, we switch it to `false`, causing the endpoint to respond with `HTTP 503 Service Unavailable`.

This way the load balancer will understand that the backend is no longer healthy and will stop sending traffic to it.

Then, we’ll implement the graceful shutdown logic.
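A sketch of that logic, reusing the `server` and `HEALTH_CHECK_ENABLED` values from the previous snippets (the log messages are illustrative):

```js
// Grace period that gives the load balancer time to notice the failing
// health check before we stop accepting connections.
const gracefulShutdownTime = 15000;

const gracefulShutdown = (signal) => {
  console.log(`Received ${signal}, starting graceful shutdown`);

  // 1. Fail the health check so the load balancer stops sending new traffic.
  HEALTH_CHECK_ENABLED = false;

  // 2. After the grace period, stop accepting new connections and let
  //    in-flight requests finish.
  setTimeout(() => {
    server.close(() => {
      // 3. Tell node we are ready to shut down.
      console.log('Server closed, exiting process');
      process.exit(0);
    });
  }, gracefulShutdownTime);
};

process.on('SIGINT', () => gracefulShutdown('SIGINT'));
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
```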

Here we define a function that handles the two signals we can listen for: `SIGINT` and `SIGTERM`.

`SIGINT` is sent when a user interrupts the process from the terminal (for example with Ctrl+C).
`SIGTERM` is sent when an orchestrator such as Docker or Kubernetes tells the process it has to stop.

Read more about these events on the Wikipedia article Signal (IPC)

By defining a variable like `const gracefulShutdownTime = 15000;`, we give the load balancer 15 seconds of grace time to take this instance out of rotation, while the web server remains active and keeps serving connections.

This is what the whole example looks like:
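Putting the pieces together, a complete sketch (port, routes, and log messages are illustrative) could be:

```js
const express = require('express');

const app = express();
const port = 3000;

// Flag the /health endpoint uses to report whether this instance
// should keep receiving traffic.
let HEALTH_CHECK_ENABLED = true;

// Give the load balancer 15 seconds to take this instance out of rotation.
const gracefulShutdownTime = 15000;

app.get('/', (req, res) => {
  res.send('Hello World!');
});

app.get('/health', (req, res) => {
  if (HEALTH_CHECK_ENABLED) {
    res.status(200).send('OK');
  } else {
    res.status(503).send('Service Unavailable');
  }
});

const server = app.listen(port, () => {
  console.log(`Listening on port ${port}`);
});

const gracefulShutdown = (signal) => {
  console.log(`Received ${signal}, starting graceful shutdown`);

  // 1. Fail the health check so the load balancer stops sending new requests.
  HEALTH_CHECK_ENABLED = false;

  // 2. After the grace period, stop accepting connections and let
  //    in-flight requests finish.
  setTimeout(() => {
    server.close(() => {
      // 3. Tell node we are ready to shut down.
      console.log('Server closed, exiting process');
      process.exit(0);
    });
  }, gracefulShutdownTime);
};

process.on('SIGINT', () => gracefulShutdown('SIGINT'));
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
```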

Notes

What’s a good graceful timeout period?

Kubernetes by default waits 30 seconds after sending a `SIGTERM`. If the pod hasn’t shut down by then, it sends a `SIGKILL`, terminating the process immediately. Although this grace period can be changed, wrapping up within the default 30 seconds seems like a good practice.

What is the recommended health check interval?

I use 5-second intervals and tell the load balancer to consider the application down only after 2 failed health checks. The reason for requiring two failures instead of one is that the CPU can max out during moments of high load. We found it better to let things run slower for a few seconds than to kill and restart the services.
