Kubernetes Graceful Pod Shutdown Using SIGTERM & PreStop Hook

👋 Hi! I’m Bibin Wilson. In each edition, I share practical tips, guides, and the latest trends in DevOps and MLOps to make your day-to-day DevOps tasks more efficient. If someone forwarded this email to you, you can subscribe here to never miss out!

In today’s deep dive edition, we will look at:

  • What are SIGTERM and SIGKILL

  • How Kubernetes handles SIGTERM and SIGKILL

  • How applications can handle SIGTERM for graceful shutdown

  • The role of PreStop hooks and why they matter

  • The link between SIGTERM and container PID 1

Also learn about the Docker MCP catalog, hard lessons from running Kubernetes on AWS EKS in production, and more.

🧰 Remote Job Opportunities

  1. Nvidia - DevOps and Automation Engineer

  2. Prometteur Solutions - Devops Engineer

  3. Deutnet - Senior DevOps Engineer

Kubernetes Pod Graceful Shutdown

One of the fundamental concepts in Kubernetes is Pod Graceful Shutdown. It plays a key role in application reliability.

While modern application architectures often use patterns like circuit breakers, retries, and timeouts to handle failures gracefully, graceful shutdown acts as the last line of defense.

It ensures your application can clean up safely before it is forcefully terminated.

Let’s get started.

Understanding SIGTERM & SIGKILL

💡 A signal is a way for the Linux kernel to notify a running process of an event (a kind of software interrupt). Each signal has a unique number.

First, let’s understand what SIGTERM and SIGKILL are.

These are signals in Unix-like operating systems.

  • SIGTERM (Signal 15) is used to request the termination of a process.

  • SIGKILL (Signal 9) forces an immediate termination of a process. Unlike SIGTERM, it cannot be caught or ignored by the process.
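
As a quick check of the numbers above, Python's standard signal module exposes both signals. The sketch below assumes a Linux host (signal numbers can differ on other platforms) and also tries to install a handler for each signal, which the OS allows for SIGTERM but not for SIGKILL. The handler and helper names are illustrative.

```python
import signal

def on_term(signum, frame):
    # Illustrative handler; a real app would start its cleanup here.
    pass

def can_install_handler(sig):
    """Return True if the OS lets us register a handler for sig."""
    try:
        previous = signal.signal(sig, on_term)
    except (OSError, ValueError):
        return False
    if previous is not None:
        signal.signal(sig, previous)  # restore whatever was there before
    return True

print("SIGTERM =", int(signal.SIGTERM), "SIGKILL =", int(signal.SIGKILL))
print("handler allowed for SIGTERM:", can_install_handler(signal.SIGTERM))
print("handler allowed for SIGKILL:", can_install_handler(signal.SIGKILL))
```

This is why graceful shutdown has to be built around SIGTERM: it is the only one of the two that application code gets a chance to react to.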

Kubernetes & SIGTERM

When a pod shuts down in Kubernetes (due to scaling, an update, or any other reason), the kubelet sends a SIGTERM signal to the app inside.

If your app does not handle SIGTERM properly, it might be stopped in the middle of serving a user request or processing a file (either right away or by SIGKILL once the grace period expires), leading to data loss or a bad user experience.

Note: SIGTERM is delivered to a process, not to the pod or container as a whole.

For example, let's say an app is handling file uploads or user payments.

If the pod shuts down without waiting, files may be incomplete, or payments may fail. This can lead to:

  1. ECONNRESET errors for clients with in-flight requests

  2. HTTP 5xx errors (typically 502 Bad Gateway or 503 Service Unavailable)

  3. Database connections that remain open until they time out

In-flight requests are the requests your app is currently processing when shutdown begins.

By catching SIGTERM, you can:

  • Finish the current request

  • Save the important state

  • Notify other services

What Happens During Pod Shutdown in Kubernetes

Let’s look at how Kubernetes handles SIGTERM.

The following diagram shows the graceful pod termination process in Kubernetes.

Here is how it works.

  1. A pod deletion request is sent to the Kubernetes API server, for example with kubectl delete pod, or a controller triggers the deletion (during a scale-down or rolling update, for instance).

  2. Kubernetes executes the preStop hook, if one is defined (explained later). The grace period countdown has already started at this point, so time spent in the hook counts against it.

  3. Then the kubelet sends a SIGTERM to each container's main process (PID 1). When there are multiple containers in a pod, every container is signaled, but the order is not guaranteed.

  4. The application has whatever remains of the grace period, 30 seconds in total by default, to handle the signal and shut down gracefully. (The grace period is configurable using terminationGracePeriodSeconds.)

  5. Now, the application SIGTERM handler stops accepting new requests, finishes processing in-flight requests, closes database connections, and so on.

  6. If the SIGTERM handler completes all of its cleanup within the grace period, the process exits on its own and the application shuts down gracefully.

  7. If the application process does not exit within the grace period, the kubelet sends a SIGKILL signal to force-terminate whatever is still running in the pod's containers.
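
The sequence above can be sketched as a small timeline calculation. This is a simplified model, not kubelet code: it assumes the grace period countdown starts when deletion begins and that time spent in the preStop hook counts against it. The function name and parameters are illustrative.

```python
def shutdown_outcome(grace_period_s, prestop_s, app_cleanup_s):
    """Model what happens to one container during pod termination.

    Simplified: ignores the short extension the kubelet may grant when a
    preStop hook overruns, and assumes the app handles SIGTERM.
    """
    if prestop_s >= grace_period_s:
        # The hook used up the whole budget; no time is left for cleanup.
        return "SIGKILL (preStop hook consumed the grace period)"
    remaining = grace_period_s - prestop_s  # time left once SIGTERM arrives
    if app_cleanup_s <= remaining:
        return f"graceful exit at t={prestop_s + app_cleanup_s}s"
    return f"SIGKILL at t={grace_period_s}s"

# Default 30s grace period with a 10s preStop sleep:
print(shutdown_outcome(30, 10, 15))  # cleanup fits in the remaining 20s
print(shutdown_outcome(30, 10, 25))  # cleanup needs more than 20s
```

The practical takeaway is that the grace period is one shared budget: any preStop delay has to be added to the app's own cleanup time when choosing terminationGracePeriodSeconds.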

Is 30 seconds terminationGracePeriodSeconds ideal?

There is no definitive answer. The 30 seconds (the Kubernetes default) works for most cases.

However, databases or heavy processing apps may need 60-120 seconds, and batch jobs or data processing workloads might need 300+ seconds to finish their work properly. The right value depends on the application.

Handling SIGTERM in an Application

You can write custom logic in your application code to handle the SIGTERM signal.

The example below shows how a Flask app can gracefully shut down when it receives a SIGTERM. It finishes any remaining work, saves in-progress orders to the database, and notifies other services before exiting.

When the pod shuts down, the logs will look like the following.

If you want to test it practically, you can get the full Flask app code here.
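
A minimal sketch of this pattern, using only Python's standard library rather than the full Flask app: the in-memory order list and the save_order and notify_services helpers are hypothetical stand-ins for real database and service calls.

```python
import signal
import threading

shutdown_requested = threading.Event()   # request-handling code checks this
pending_orders = ["order-1", "order-2"]  # hypothetical in-progress work
saved_orders = []                        # stand-in for a database table

def save_order(order):
    # Hypothetical helper: a real app would write to its database here.
    saved_orders.append(order)

def notify_services():
    # Hypothetical helper: e.g. deregister from service discovery.
    print("notified dependent services")

def handle_sigterm(signum, frame):
    """Stop taking new work, persist in-progress orders, then notify."""
    print("SIGTERM received, starting graceful shutdown")
    shutdown_requested.set()  # new requests should be refused from now on
    while pending_orders:
        save_order(pending_orders.pop(0))
    notify_services()
    print("cleanup complete")

if __name__ == "__main__":
    # Handlers can only be registered from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)
    # ... start the web server here ...
```

A real handler would typically finish by exiting the process (for example with sys.exit(0)) so the container stops before the grace period runs out. That step is left out here to keep the sketch easy to call directly.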

Using the PreStop Hook for Buffer Time

When you delete a Pod, Kubernetes sends a SIGTERM signal to your app.

The problem is that while your app is starting to shut down, components like kube-proxy, Ingress controllers, or external load balancers (like AWS ALB) might still be routing traffic to that Pod.

This delay happens because it takes time to:

  1. Remove the Pod from the Service endpoints (the update takes time to propagate)

  2. Update the kube-proxy iptables rules

  3. Remove the Pod from the target list of external load balancers (like AWS ALB)

So if your app shuts down too quickly, clients might still get routed to it and end up with “connection refused” errors.

At the same time, there is no mechanism to control how fast other components (like kube-proxy, Ingress controllers, etc.) stop sending traffic to the Pod.

But you can delay your app's shutdown a bit using a preStop hook. The preStop hook runs before Kubernetes sends SIGTERM.

For example, adding a simple sleep 10 in the preStop hook gives your app a 10-second buffer to let traffic stop flowing before it shuts down. Time spent in the hook counts toward terminationGracePeriodSeconds, so leave enough of the grace period for the app's own cleanup.

This gives time for kube-proxy to update iptables rules or for an external load balancer (e.g., one managed by the AWS ALB ingress controller) to stop forwarding requests to that Pod.

Here is an example of a Deployment YAML with the preStop lifecycle hook highlighted.
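
Sketched in Python to stay in the same language as the Flask example: the snippet below builds a Deployment manifest as a plain dict and prints it as JSON, which kubectl apply -f accepts alongside YAML. The names (web, example/web:1.0), the replica count, and the 45-second grace period are illustrative assumptions. The preStop hook sits under the container's lifecycle field.

```python
import json

PRESTOP_SLEEP_S = 10  # buffer for endpoints and load balancers to update

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                # Must cover the preStop sleep plus the app's own cleanup.
                "terminationGracePeriodSeconds": 45,
                "containers": [{
                    "name": "web",
                    "image": "example/web:1.0",  # hypothetical image
                    "lifecycle": {
                        "preStop": {
                            # Assumes the image ships a shell and sleep.
                            "exec": {"command": ["sh", "-c", f"sleep {PRESTOP_SLEEP_S}"]}
                        }
                    },
                }],
            },
        },
    },
}

print(json.dumps(deployment, indent=2))
```

With these values the app still has roughly 35 seconds for its own SIGTERM handling after the 10-second sleep finishes.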

SIGTERM & PID

Kubernetes sends SIGTERM to the process with PID 1 inside the container. If your app is not running as PID 1 and PID 1 does not forward the signal, your app will not receive it, and the code to handle graceful termination will not execute.

For example, if the Dockerfile runs a shell script that starts your app as a child process, the script becomes PID 1, not your app.

Look at the following example.

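However, if you need a shell script, use exec to replace the shell process (for example, exec python3 app.py as the last line of the script) so that your app takes over PID 1.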
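
The effect of exec can be observed from Python as well, since os.execv performs the same kind of process replacement as the shell's exec. In this sketch (Unix-like systems only; the inline entrypoint and app strings are illustrative stand-ins for a wrapper script and a real application), the entrypoint prints its PID, replaces itself with the app, and the app then reports its own PID.

```python
import subprocess
import sys

# Stand-in for the real application: it just reports its own PID.
APP = "import os; print(os.getpid())"

# Stand-in for an entrypoint script: print our PID, then replace this
# process with the app instead of starting the app as a child process.
ENTRYPOINT = (
    "import os, sys\n"
    "print(os.getpid(), flush=True)\n"
    f"os.execv(sys.executable, [sys.executable, '-c', {APP!r}])\n"
)

def pids_before_and_after_exec():
    """Run the entrypoint and return (entrypoint PID, app PID)."""
    result = subprocess.run(
        [sys.executable, "-c", ENTRYPOINT],
        capture_output=True, text=True, check=True,
    )
    before, after = result.stdout.split()
    return int(before), int(after)

before, after = pids_before_and_after_exec()
print("same PID after exec:", before == after)
```

Because the PID is unchanged, a signal sent to the entrypoint's PID (PID 1 in a container) reaches the app directly.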

This applies to CMD as well. In a Dockerfile, how you write the CMD instruction affects how SIGTERM is handled.

For example,

  • CMD ["python3", "app.py"] - This is the exec form. python3 becomes PID 1 (the first process) inside the container.

  • CMD python3 app.py - This is the shell form. Docker runs it as /bin/sh -c "python3 app.py". In this case, the shell (sh) becomes PID 1, and python3 is just a child process.

Modern frameworks handle SIGTERM for you

Popular frameworks include built-in support for graceful shutdown.

For example,

In Spring Boot (Java), just set server.shutdown=graceful and it stops accepting new traffic and waits for ongoing requests before exiting.

Go's http.Server includes a Shutdown() method that stops accepting new connections and waits for active requests to finish. You call it from your own SIGTERM handler.

These usually handle the common logic, such as:

  • Stop accepting new requests

  • Wait for active HTTP requests to finish

However, you may still need to perform special cleanup actions, e.g., deregister from a service, send metrics, flush caches, or commit state to the database.

🧱 DevOpsCube Bytes

Since AI is becoming part of every business, DevOps engineers are now expected to deploy and manage AI systems just like they do with websites and apps. That’s why understanding the following AI fundamentals is more important than ever.

📦 Keep Yourself Updated

  1. EKS Production-Grade Pain: shares real-world lessons from the engineering team at Probo as they scaled their applications using Amazon EKS

  2. Docker MCP Catalog: Docker launched a solution that simplifies finding and using MCP‑based AI tools

  3. agntcy.org: AGNTCY is an open-source project (now managed by the Linux Foundation) created by Cisco. It’s designed to help AI agents from different companies and tools work together easily and safely.

  4. Agentic DevOps: The blog looks at how DevOps is changing. Instead of just automating fixed tasks, it is now moving toward AI-powered systems that can learn on the fly, adapt to new situations, and take action in real time

🛠️ DevOps Tools

  • warp.dev: An AI‑powered terminal that generates commands, explains errors, executes tasks with your permission, and understands natural language commands

  • Temporal: It is a tool that helps you build reliable and long-running apps. Apps often break due to server crashes, network issues, or bugs. Temporal remembers what your app was doing, and lets it continue from where it stopped, without starting over.

What did you think of today's email?

Your feedback helps me create better guides for you!
