Body Based Routing

👋 Hi! I’m Bibin Wilson. In each edition, I share practical tips, guides, and the latest trends in DevOps and MLOps to make your day-to-day DevOps tasks more efficient.

Common Routing Methods

When we look at routing in Kubernetes using Ingress or the Gateway API, most use cases involve one of the following methods.

  1. Path-Based Routing: Routing based on the URL path of the request (e.g., /users, /products).

  2. Host/Domain-Based Routing: Routing based on the hostname or subdomain in the request. For example,

    • api.example.com routes to API service

    • admin.example.com routes to Admin panel

  3. Header-Based Routing: Routing based on the value of HTTP headers. For example,

    1. Authorization header for protected routes

    2. Accept-Language header for internationalization

  4. Query-Based Routing: Routing based on parameters in the query string, which are commonly used for filtering and pagination. For example,

    1. ?version=v2

    2. ?region=us-east

    3. ?page=1&limit=10
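Each of these methods reads a different part of the same request. As a rough Python sketch (the helper name routing_keys is made up for illustration and is not part of any gateway), the standard library can pull out the four values a router would look at:

```python
from urllib.parse import urlsplit, parse_qs

def routing_keys(url, headers):
    """Return the parts of a request that each routing method inspects."""
    parts = urlsplit(url)
    return {
        # Path-based routing looks at the URL path.
        "path": parts.path,
        # Host-based routing looks at the hostname.
        "host": parts.hostname,
        # Header-based routing looks at header values (names are case-insensitive).
        "headers": {name.lower(): value for name, value in headers.items()},
        # Query-based routing looks at query string parameters.
        "query": {name: values[0] for name, values in parse_qs(parts.query).items()},
    }

keys = routing_keys(
    "https://api.example.com/users?version=v2&region=us-east",
    {"Accept-Language": "fr"},
)
```

None of these four values requires reading the request body, which is what sets the next method apart.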

There is also another method called body-based routing.

What is body-based routing?

Body-based routing is a traffic management technique where the API gateway (or proxy) looks inside the request body, meaning the actual data you send, such as JSON or form values, and then decides where to send the request.

Here, the routing decision is made from the payload itself.

Use Case

Body-based routing is particularly useful in LLM inference, multi-model serving, or custom workflows where one URL might serve many different kinds of requests based on what is inside the request body.

For example, in LLM inference (AI model serving), you might send every request to /inference:

  • If the body says "model": "deepseek", the request goes to the deepseek model.

  • If the body says "model": "llama3", the request goes to the llama3 model.

So the body content decides the routing, not just the URL.
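The idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how any particular gateway implements it; the backend names and the pick_backend helper are assumptions made up for the example.

```python
import json

# Hypothetical mapping from the "model" field to a backend name.
BACKENDS = {
    "deepseek": "deepseek-backend",
    "llama3": "llama3-backend",
}

def pick_backend(raw_body, default="default-backend"):
    """Choose a backend from the JSON body instead of the URL."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Body is not valid JSON, so fall back to the default backend.
        return default
    if not isinstance(payload, dict):
        return default
    model = payload.get("model")
    if not isinstance(model, str):
        return default
    return BACKENDS.get(model, default)

# Both requests go to /inference; only the body differs.
first = pick_backend(b'{"model": "deepseek", "prompt": "hi"}')
second = pick_backend(b'{"model": "llama3", "prompt": "hi"}')
```

Note that the router has to buffer and parse the body before it can pick a backend, which is extra work compared with matching on a path or header.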

Body Based Routing in Kubernetes

Body-based routing is now supported by the Kubernetes Gateway API Inference Extension. (Refer to this newsletter edition to learn more.)

For example, let's say the gateway receives a request with the following JSON payload.

{
  "model": "deepseek",
  "task": "summarize",
  "priority": "high"
}

The Gateway API Inference Extension extracts a field such as "model": "deepseek" or "priority": "high" from the request JSON and injects its value into a request header.

The HTTPRoute configuration then inspects that header and routes the request accordingly, for example to the right model pool, the lowest-latency GPU nodes, or a specialized handler for summaries.
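The extract-and-inject step might look roughly like the Python sketch below. It is a simplified model of the behavior described above, not the extension's actual code; the function name body_fields_to_headers is hypothetical, and the header name X-Gateway-Model-Name is the one used in the practical example in this edition.

```python
import json

def body_fields_to_headers(raw_body, field_to_header):
    """Copy selected top-level JSON fields from the body into headers."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Not JSON: nothing to extract, so no headers are added.
        return {}
    if not isinstance(payload, dict):
        return {}
    headers = {}
    for field, header_name in field_to_header.items():
        value = payload.get(field)
        # Only plain string values are copied in this sketch.
        if isinstance(value, str):
            headers[header_name] = value
    return headers

body = b'{"model": "deepseek", "task": "summarize", "priority": "high"}'
extra_headers = body_fields_to_headers(body, {"model": "X-Gateway-Model-Name"})
```

Once the value is in a header, the rest of the routing is ordinary header-based routing, so the HTTPRoute itself never needs to read the body.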

Practical Example

In this example, an HTTPRoute object has a header match for the value chatbot and routes matching requests to the gemma3 model backend.

Here is an example curl request to the /inference endpoint with a JSON payload containing "model": "chatbot":

curl -X POST https://dcubelabs.com/inference \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chatbot",
    "prompt": "What is the color of the sky",
    "max_tokens": 100,
    "temperature": 0
  }'

The gateway extension extracts the model field from the body and converts it into an HTTP header. For example,

X-Gateway-Model-Name: chatbot

The HTTPRoute rule matches requests that contain this header. Because the value is chatbot, the request is routed to the backend InferencePool/gemma3.
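To make the matching step concrete, here is a small Python model of it. In a real cluster the rule lives in the HTTPRoute object and the gateway does the matching; the RULES list and match_backend function below are illustrative stand-ins, assuming an exact match on the header value.

```python
# Illustrative stand-in for the HTTPRoute rule described above.
RULES = [
    {
        "header": "X-Gateway-Model-Name",
        "value": "chatbot",
        "backend": "InferencePool/gemma3",
    },
]

def match_backend(headers, rules=RULES):
    """Return the backend of the first rule whose header matches exactly."""
    # HTTP header names are case-insensitive, so compare them in lowercase.
    normalized = {name.lower(): value for name, value in headers.items()}
    for rule in rules:
        if normalized.get(rule["header"].lower()) == rule["value"]:
            return rule["backend"]
    # No rule matched.
    return None

backend = match_backend({
    "Content-Type": "application/json",
    "X-Gateway-Model-Name": "chatbot",
})
```

A request whose header carries any other model name falls through every rule, so a real route would usually also define what happens when nothing matches.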
