This Week in DevOpsCube

  1. MLOps: Feature store explained for DevOps Engineers

  2. Set up an ML feature store on Kubernetes

  3. How to reduce a Kubeflow Docker image from 3.17 GB to 354 MB

  4. Kubernetes CNI troubleshooting scenario

  5. How Uber runs 60,000 AI agent tasks, and more

  6. Free Claude courses

☕ Grab a coffee and catch up on this week’s DevOps, MLOps and AI insights and resources.

🤖 MLOps: Feature Store Explained

If you want to level up in MLOps, the feature store is a must-know concept. It is one of the core building blocks of MLOps.

In our latest MLOps newsletter edition, we covered the following.

- What a feature store actually is
- Offline vs online features (very important)
- What a feature registry is
- Feast feature store architecture
- Role of DevOps engineers in feature management
- Hands-on Feast Feature Store setup on Kubernetes
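To make the offline vs online distinction concrete, here is a minimal pure-Python sketch. It does not use Feast; the rows, the `trips_last_7d` feature, and both helper functions are hypothetical and only illustrate the idea: the offline side keeps full history for point-in-time lookups during training, while the online side keeps only the latest value per entity for fast serving.

```python
from datetime import datetime

# Hypothetical feature rows: (entity_id, event_time, trips_last_7d)
ROWS = [
    ("driver_1", datetime(2024, 1, 1), 10),
    ("driver_1", datetime(2024, 1, 8), 14),
    ("driver_2", datetime(2024, 1, 5), 3),
]


def offline_lookup(rows, entity_id, as_of):
    """Offline view: full history, queried as of a point in time (training)."""
    candidates = [r for r in rows if r[0] == entity_id and r[1] <= as_of]
    if not candidates:
        return None
    # Pick the most recent row that existed at `as_of`.
    return max(candidates, key=lambda r: r[1])[2]


def materialize_online(rows):
    """Online view: keep only the latest value per entity (serving)."""
    latest = {}
    for entity_id, event_time, value in rows:
        if entity_id not in latest or event_time > latest[entity_id][0]:
            latest[entity_id] = (event_time, value)
    return {entity_id: value for entity_id, (_, value) in latest.items()}


online = materialize_online(ROWS)
print(offline_lookup(ROWS, "driver_1", datetime(2024, 1, 3)))  # value as of Jan 3
print(online["driver_1"])  # latest known value
```

The offline lookup avoids leaking future values into training data, which is why the two views can return different answers for the same entity.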

☸️ Set Up Feast on Kubernetes

The best way to understand an ML feature store is to set one up and manage features yourself. We have a detailed hands-on guide that covers Feast, an open-source feature store.

In this guide, you will learn:

- What Feast is
- Key Feast components
- Feast Operator setup on Kubernetes
- How to configure offline and online stores
- How to verify feature serving with a simple Python script
- How to measure feature serving latency metrics like p50, p95, and p99
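As a rough idea of what the latency measurement can look like, here is a minimal sketch using only the Python standard library. It is not the script from the guide: `fetch_features` is a hypothetical stand-in that just sleeps, and you would swap in your real online feature request.

```python
import statistics
import time


def fetch_features(entity_id):
    """Hypothetical stand-in for a real online feature request."""
    time.sleep(0.001)  # simulate a ~1 ms round trip
    return {"entity_id": entity_id}


def measure_latency_ms(call, n=50):
    """Time `n` calls and return the per-call latency in milliseconds."""
    samples = []
    for i in range(n):
        start = time.perf_counter()
        call(i)
        samples.append((time.perf_counter() - start) * 1000)
    return samples


def percentiles(samples):
    """Return p50, p95, and p99 from a list of latency samples."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    stats = percentiles(measure_latency_ms(fetch_features))
    for name, value in stats.items():
        print(f"{name}: {value:.2f} ms")
```

With a small sample count, p99 is dominated by one or two slow calls, so use a few thousand requests before trusting the tail numbers.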

👉 Read It Here: Feature Store Setup on Kubernetes

💸 [65% OFF] Linux Foundation Limited Discount

You won't see this 65% discount again until Cyber Monday👇

Get a flat 50% off individual certifications like CKA, CKAD, and CKS using the following coupon.

Coupon: Use code MM26CCCT at kube.promo/devops

Use code MM26BUNCT to save up to 60% on the following Kubernetes certification bundles.

⏳ Once this offer expires, it’s gone. Grab it while you can.

🚨 Calico CNI Troubleshooting on AWS

When deploying a kubeadm-based Kubernetes cluster on AWS with Calico CNI, you may encounter a connection timed out issue between Pods and CoreDNS.

We ran into this issue ourselves and wrote a detailed blog post that explains:

  • Why the issue happens

  • How to troubleshoot it step by step

  • The actual root cause

  • How AWS networking interacts with Calico

  • How to fix it properly

🎓 Complete Kubernetes & CKA Course

10,000+ engineers have learned through DevOpsCube courses.

From container fundamentals to CKA preparation, every course is self-paced and built around real scenarios.

This is not a long video lecture series.

The CKA course is text-based, illustration-rich, and designed for faster learning and quick revision whenever you need it.

👉 Use code FLASH40 to get 40% OFF today.

Note: This offer expires soon

📦 Docker Image Optimization: 3.17 GB to 354 MB (Our Learnings)

Image optimization is not just an infrastructure problem. It is a collaborative effort between data scientists, developers, and DevOps engineers, because each role owns a different layer of the bloat.

For example, a DevOps engineer cannot safely remove a library without confirming with the data scientist whether the model actually needs it. Real optimization happens when all teams sit together and ask: What does this image actually need at runtime?

In our case, an image used in a Kubeflow pipeline project was 3.17 GB.

In this blog, you will learn how the image was optimized to 354 MB (an 89% reduction) and the reasoning behind every change.
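If you want to check the reduction figure for your own images, the arithmetic is simple. This is a small sketch, assuming 1 GB = 1000 MB; the function name is just for illustration.

```python
def reduction_percent(before_mb, after_mb):
    """Percentage by which the image size shrank."""
    return (before_mb - after_mb) / before_mb * 100


before_mb = 3.17 * 1000  # 3.17 GB, assuming 1 GB = 1000 MB
after_mb = 354

print(f"{reduction_percent(before_mb, after_mb):.1f}% smaller")
```

For the sizes above, the result rounds to the 89% mentioned in the blog.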

🛠️ Create Reusable Helm Templates Using _helpers.tpl

If you are working with Helm charts, you might have seen a _helpers.tpl file inside the templates folder. Most people ignore it without understanding what it does when a chart is deployed.

In our guide, we cover:

  • What the _helpers.tpl file is and how it works

  • What named templates (partials) are

  • Hands-on example demonstrating its usage.

  • When not to use it

🛠️ How Uber Runs 60,000 AI Agent Tasks Per Week With MCP

This video explores how Uber uses the Model Context Protocol (MCP) to scale AI agent workflows. For a DevOps engineer, the key takeaways involve:

  • Infrastructure for AI: Learning how to build and maintain the "plumbing" required for thousands of autonomous agent tasks.

  • Standardization: Understanding MCP as an emerging standard for connecting AI models to data and tools.

  • Operational Excellence: Seeing how Uber handles the reliability and monitoring of agentic systems at a massive scale.

👉 Watch the Video Here: How Uber Runs 60,000 AI Agent Tasks

📚 Free Anthropic Claude courses

If you are a DevOps engineer trying to understand how AI fits into infrastructure, automation, and developer workflows, these Claude courses are worth checking out.

Anthropic has structured learning paths around:

  • AI Fluency and prompt engineering

  • Claude API development

  • Claude Code for engineering workflows

  • MCP (Model Context Protocol)

  • AI agent workflows and integrations

👉 Start Here: Free Claude Courses

🛠️ DevOps Tool of the Week (Kafbat UI)

Running Kafka in production is great until you need to actually look inside it.

Kafbat UI is a free, open-source web UI to monitor and manage Apache Kafka clusters.

  • It gives you a single pane of glass for your Kafka clusters.

  • Brokers, topics, partitions, consumer groups, schema registry, Kafka Connect, all in one dashboard.

  • You can browse messages in JSON, Avro, or Protobuf, filter live streams with CEL expressions, check consumer lag per partition, and create or reconfigure topics without touching a CLI.
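If consumer lag is a new term, it is simply the gap between the latest offset in a partition and the offset the consumer group has committed. The sketch below shows that calculation in plain Python. It is not how Kafbat UI is implemented, and the offsets are made-up sample values.

```python
def lag_per_partition(end_offsets, committed_offsets):
    """Lag = latest offset in the partition minus the group's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # A partition with no commit yet is treated as committed at offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag


# Hypothetical offsets for a topic with three partitions
end_offsets = {0: 1500, 1: 1480, 2: 900}
committed_offsets = {0: 1500, 1: 1200}

for partition, lag in lag_per_partition(end_offsets, committed_offsets).items():
    print(f"partition {partition}: lag {lag}")
```

A lag that keeps growing on one partition usually means a slow or stuck consumer, which is why a per-partition view is more useful than a single total.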

👉 Start Here: Kafbat UI
