✉️ In Today’s MLOps Edition
Today we will look at a key component in the MLOps pipeline: the feature store.
Why Feature Order is important during inference
What a feature store actually is
Offline vs Online features (very important)
What a feature registry is
Feast feature store architecture
Role of DevOps engineers in feature management
Hands-on Feast Feature Store setup on Kubernetes
By the end, you will understand how to move from CSV-based features to production-grade feature serving using Feast.
Before You Continue: This is part of an ongoing MLOps series. You can check this repo to go through all the previous editions in order.
Feature stores started in traditional ML, but the same pattern of a fast, reliable lookup layer for model data is now used in RAG pipelines, agent memory, LLM tool calls, and more.
So we need to go a bit deeper into how features and feature stores work. As a DevOps engineer, you don't need to know every detail, but just enough to understand what is happening under the hood.
Let’s get started.
Feature Engineering Recap
A feature is basically an input variable to a model. In our employee attrition example, the features are Age, Gender, Years at Company, and so on. It is the data the model uses to make predictions.
In the data preparation edition, we had a stage called feature engineering. In that stage, we converted raw data (employee_attrition.csv) into a format (featured.csv) that a machine learning model can understand and use (i.e., numbers).
If you open the featured.csv, every value should be a number as shown below.

Key Insight:
Feature engineering is considered very important for model performance, and a lot of ML development effort goes into it.
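As a quick sanity check, you can verify that every value in an engineered record is numeric. Here is a minimal sketch (column names and values are made up for illustration, not taken from the actual featured.csv):

```python
# A tiny stand-in for two rows of featured.csv
# (column names and values are assumptions for illustration)
rows = [
    {"Age": 29, "Salary": 52000, "YearsAtCompany": 2, "Attrition": 1},
    {"Age": 41, "Salary": 88000, "YearsAtCompany": 10, "Attrition": 0},
]

# After feature engineering, every value should be numeric
all_numeric = all(
    isinstance(value, (int, float)) for row in rows for value in row.values()
)
print(all_numeric)  # True
```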
Why Feature Order Is Important
The model does not understand column names or their meaning. It only sees positions, as shown below.

Based on this, the model learns patterns like "if the value at index 1 is high, the probability of leaving is low". Now notice something carefully. The model does not know that index 1 is Salary. It only knows index 1 is some number. That's it.
This is why feature order is important during inference.
Feature Order During Inference
In the training the model edition, Step 3 (model training), we wrote predict.py. In that script, we passed employee data and got a prediction (stay or leave) in the CLI.
The important thing in that script is this. The employee data we pass during prediction must match the exact same column order used during training.
The following image shows the relevant input_record part of that script. The dictionary keys are in a specific order. That order was chosen to match the column order in featured.csv

Now let’s say during prediction you pass Salary as the first column. But during training, the first column was Age. The model will still assume that the first column is Age. So now Salary is being treated as Age.
If the order does not match, the model will not throw any error. It will still run and give you a prediction. But that prediction can be completely wrong.
The following image illustrates it.

So why no error?
From the model’s point of view, nothing is wrong. It is still getting the same number of columns, and the values are of the expected type. So it continues execution.
This is what we call a silent failure.
So, anyone who wants to use this model for prediction must pass data in the same column order. Every time. Without exception.
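One simple way to enforce this is a small helper that always emits features in the training order, no matter how the caller's dictionary is arranged. A sketch, with column names assumed for illustration:

```python
# Column order locked in at training time
# (names assumed to match featured.csv for illustration)
FEATURE_ORDER = ["Age", "Salary", "YearsAtCompany"]

def to_model_input(record: dict) -> list:
    """Return the feature values as a list in the exact training order.

    Raises KeyError if a feature is missing, instead of failing silently.
    """
    return [record[name] for name in FEATURE_ORDER]

# The caller's dict key order no longer matters
record = {"Salary": 55000, "YearsAtCompany": 3, "Age": 31}
print(to_model_input(record))  # [31, 55000, 3]
```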
Now that we have understood the importance of features and the role they play during training and inference, let’s understand what a feature store really solves.
What Is a Feature Store?
As we learned, features are required during training and prediction. And they must be consistent. Same order. Same structure. Every time.
Managing this manually does not scale.
For example, one team updates a column. Another pipeline still uses the old feature version. Or worse, the feature order changes and the model gets wrong inputs. The model does not fail; it gives wrong predictions silently.
A Feature Store solves this problem.
It acts as a central system that stores features and serves them consistently to both training and prediction pipelines.
Note: In our project, we will use Feast as the Feature Store. It is an open source, Kubernetes-native tool used by companies like Nvidia, Shopify, and Expedia.
Offline vs Online Feature Store
Training and inference use different features, which are retrieved from separate data stores, as shown below.
1. Offline Store (For Training)
We need large volumes of historical feature data to train the model. For example, in our case, features from hundreds of thousands of past and current employees. You can think of it as a data warehouse for features.
The offline store is used for training and is usually backed by storage systems such as AWS S3, Redshift, or BigQuery, as shown below.

2. Online Store (For Inference)
During inference, we need low-latency, real-time data, and the inference API must return features in milliseconds. This is why it is called the online store.
It contains only active entities (e.g., current employees) for which real-time predictions are needed.
The online store is implemented using low-latency data stores like Redis, DynamoDB, or Bigtable. The following image illustrates how the online store is used during inference.

💡 Why Redis and not S3?
S3 is too slow for real-time inference (100–200 ms), while Redis provides low-latency access (1–5 ms). This makes Redis suitable for meeting strict inference SLAs (for example, under 50 ms).
Materialization (How Data Gets Into Redis)
You now have two stores. The offline store holds years of historical data in S3. The online store in Redis holds the latest values for active employees.
But how does data move from S3 into Redis? That is what materialization does.
Materialization moves feature data from the offline store to the online store. It reads the latest relevant feature values and loads them into Redis. This usually runs as a scheduled job using Kubernetes CronJobs or Airflow. As a DevOps engineer, you own this.
Important Note: In the employee attrition project, we only need features for active employees. This filtering is handled in the pipeline. Materialization then loads only those relevant feature values into Redis for real-time inference.
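In code, materialization is a single Feast SDK call. The sketch below assumes a Feast feature repo is already configured; the `materialize_recent` wrapper name is ours, not Feast's:

```python
from datetime import datetime, timezone

def materialize_recent(repo_path: str = ".") -> None:
    """Load the latest offline feature values into the online store (Redis).

    Assumes a Feast feature repo is already configured at repo_path.
    Typically triggered on a schedule (e.g. a Kubernetes CronJob),
    not run by hand.
    """
    # Imported inside the function so the sketch can be read
    # even where Feast is not installed
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    # Incrementally load everything up to "now" into the online store
    store.materialize_incremental(end_date=datetime.now(timezone.utc))
```

The equivalent CLI command (`feast materialize-incremental`) is what you would usually place inside the CronJob container.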
Feature Registry
In phase 1 predict.py, we manually passed the inputs to the model via CLI. That was only for learning and testing.
In production inference, an HR user would simply enter the employee ID. The required employee features would then be automatically fetched from the online feature store (as shown in the image in the previous section).
For production implementation, an ML developer needs a single source of truth to retrieve the right features in the right order, in real time.
So how does the developer know which features to use and in what order to pass them to the model?
This is where the Feature Registry helps. (In Feast, this is backed by PostgreSQL.)
The Feature Registry acts as a central catalog of feature definitions, including schema, entities, and metadata.
Instead of manually constructing inputs, developers can fetch the required features from the online feature store using a defined feature group or feature view.
For example,
```python
features = store.get_online_features(
    features=feature_service,
    entity_rows=[{"employee_id": 101}],
).to_dict()
```

This fetches the exact feature set defined for the model, using the same definitions used during training. It returns features in the correct order expected by the model, so you don't have to arrange anything manually.
The Feature Registry is read by the Feast SDK and Feature Server to understand feature definitions and locations as shown in the image below.

Note: The training job uses the Feast SDK to read the Feature Registry, understand where the offline feature data is stored, and then query the offline store directly, such as S3. The Feast Server pod is not involved in this offline training path.
Feature Store Architecture (Feast)
For our final MLOps project we will be using Feast feature store. The following image illustrates the architecture of the Feast feature store on Kubernetes.

In Feast:
Offline Store - Local File or S3
Online Store - Redis
Feature Registry - PostgreSQL
Feast on Kubernetes: A Must-Try Hands-On Guide
Reading concepts is one thing. But when you deploy and operate it yourself, the concepts and workflows become much clearer. It also gives you a practical view of how it fits into the MLOps workflow.
So, to understand feature stores better from a DevOps engineer’s perspective, you should try running it on Kubernetes.
I have created an end-to-end guide that covers the following:
Feast Architecture on Kubernetes
Deploying PostgreSQL for the metadata registry and Redis for the low-latency online store
Creating employee attrition features
Creating offline and online feature stores
Performing a smoke test to retrieve features and monitoring critical metrics like p99 latency
👉 Detailed Hands-on Guide: Setup Feast on Kubernetes
Important Note: Sync your forked mlops repo to get the latest feast changes. Check out this guide to learn how to keep your fork updated.
That's a Wrap!
You now know why the Feature Store exists and what problem it actually solves.
As a DevOps engineer, your work would include:
Deploy and manage the Feast Feature Server and Redis on Kubernetes
Ensure high availability, scaling, and reliability of the feature serving layer
Integrate Feast into CI/CD pipelines for schema validation and safe deployments
Monitor Redis memory usage, latency, and eviction behavior for production workloads
Set up logging, alerting, and observability for feature serving pipelines
What's Coming Next?
We have now covered the full data side of MLOps. The ETL pipeline produces versioned data, DVC tracks it in S3, and the Feature Store ensures training and inference always consume features in the same order with the same schema.
In the next edition, we will look at Kubeflow.
It is a key component in a production MLOps setup used to manage and orchestrate ML training pipelines at scale.

