MLOps – Machine Learning Operations

Kiran Kumar Nallagonda


Operationalizing machine learning models to deliver business value is a continuous process that requires observability, monitoring, and a feedback mechanism to retrain the models whenever necessary.

Gartner predicted in 2020 that 80 percent of AI projects would remain alchemy, i.e., run by wizards whose talents will not scale in the organization, and that only 20 percent of analytical insights would deliver business outcomes by 2022. Rackspace corroborated that claim in a survey completed in January 2021, reporting that 80 percent of companies were still exploring or struggling to deploy ML models.

The general challenges are that most models are difficult to use, hard to understand, poorly explainable, and computationally intensive. These challenges make it very hard to extract business value. The goal of MLOps is to extract business value from data by efficiently operationalizing ML models at scale. A data scientist may find a model that meets the business requirements, but deploying that model into production with observability, monitoring, and a feedback loop, complete with automated pipelines, at low cost, with high reliability, and at scale, requires an entirely different set of skills. This can be achieved by working in close collaboration with DevOps teams.

An ML engineer builds ML pipelines that can reproduce the results of the models discovered by the data scientist automatically, inexpensively, reliably and at scale.

MLOps Principles

Here are a few principles to keep in mind for better MLOps:

         a) Tracking or Software Configuration

         ML models are software artifacts that need to be deployed. Tracking provenance is critical for deploying any good software and is typically handled through version control systems. Building ML models, however, depends on complex details such as data, model architectures, hyperparameters, and external software. Keeping track of these details is vital, but it can be simplified greatly with the right tools, patterns, and practices. For example, packaging all components as Docker containers and/or running them on Kubernetes, and then overlaying the usual DevOps version controls, can reduce this complexity.

         b) Automation and DevOps

         Automation is key to modern DevOps, but it’s more difficult for ML models. In a traditional software application, a continuous integration and continuous delivery (CI/CD) pipeline would pick up some versioned source code for deployment. For an ML application, the pipeline should automate not only model training but also model retraining, along with the archival of training data and other artifacts.

         c) Monitoring/Observability

         Monitoring software requires good logging and alerting, but there are special considerations to be made for ML applications. All predictions generated by ML models should be logged in such a way that enables traceability back to the model training job. ML applications should also be monitored for invalid predictions or data drift, which may require models to be retrained.
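
         As a minimal sketch of the two ideas above, the following Python logs each prediction as a JSON line that carries the model version and training job ID, and computes a crude drift signal. The function names, the record fields, and the mean-shift measure are illustrative assumptions rather than a prescribed method; production systems typically rely on a dedicated monitoring tool and a more robust drift statistic.

```python
import json
import time
import uuid
from io import StringIO


def log_prediction(stream, model_name, model_version, training_job_id,
                   features, prediction):
    """Append one prediction as a JSON line that carries the model's provenance."""
    record = {
        "prediction_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_name": model_name,
        "model_version": model_version,
        "training_job_id": training_job_id,  # hypothetical field linking to the training run
        "features": features,
        "prediction": prediction,
    }
    stream.write(json.dumps(record) + "\n")
    return record


def mean_shift(training_values, live_values):
    """Crude drift signal: shift of the live mean, in training standard deviations."""
    n = len(training_values)
    mean = sum(training_values) / n
    std = (sum((v - mean) ** 2 for v in training_values) / n) ** 0.5
    live_mean = sum(live_values) / len(live_values)
    return abs(live_mean - mean) / std if std > 0 else 0.0


# Usage with made-up values: an in-memory stream stands in for a log file.
log = StringIO()
log_prediction(log, "churn", 3, "train-job-001", {"age": 42}, 0.87)
drift_score = mean_shift([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

         A score above some agreed threshold (the threshold itself is a business decision) could raise an alert or trigger retraining.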

         d) Reliability

         ML models can be harder to test and computationally more expensive than traditional software. It is important to make sure your ML applications function as expected and are resilient to failures. Getting reliability right for ML requires some special considerations around security and testing.

         e) Cost Optimization

         MLOps involves cost-intensive infrastructure resources and personnel. It is extremely important to monitor costs continuously and to make adjustments from time to time, both to optimize cost and to drive more business value. For some models, training can be the most cost-intensive part of the entire life cycle of the model and its operations. But this cost equation can change entirely once the model is deployed and scaled to numerous instances. For example, training Alexa’s speech-to-text, NLP, and NLG models was initially cost-intensive in terms of collecting and processing the data and training the models on expensive computational resources. After the models were deployed on the cloud and scaled to planet level, most of the cost shifted to the inference layer of MLOps.

         These kinds of cost dynamics can be tackled by estimating and monitoring costs and by adopting the right technologies, architectures, and processes.

         In the above example, part of the inference-layer cost is off-loaded to the device itself, instead of utilizing cloud resources in every instance.
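
         The shift in the cost equation can be illustrated with a small back-of-the-envelope calculation. The function below and every figure passed to it are hypothetical, chosen only to show how the balance between training and inference cost moves with scale and with on-device off-loading; they are not actual numbers for any real product.

```python
def lifetime_cost(training_cost, cost_per_1k_inferences, inferences_per_month,
                  months, on_device_fraction=0.0):
    """Split a model's lifetime cost into training and cloud inference."""
    if not 0.0 <= on_device_fraction <= 1.0:
        raise ValueError("on_device_fraction must be between 0 and 1")
    # Only the inferences that are not served on the device reach the cloud.
    cloud_inferences = inferences_per_month * months * (1.0 - on_device_fraction)
    inference_cost = cloud_inferences / 1000.0 * cost_per_1k_inferences
    return {
        "training": training_cost,
        "inference": inference_cost,
        "total": training_cost + inference_cost,
    }


# Hypothetical figures: same training cost, very different traffic.
pilot = lifetime_cost(100_000, 0.10, 1_000_000, 12)
at_scale = lifetime_cost(100_000, 0.10, 10_000_000_000, 12)
offloaded = lifetime_cost(100_000, 0.10, 10_000_000_000, 12,
                          on_device_fraction=0.5)
```

         With these assumed inputs, training dominates the pilot, inference dominates at scale, and serving half of the requests on the device halves the cloud inference cost.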

         Even the training cost follows a different equation when architectures such as federated learning are adopted. Apart from these dynamics, standardizing on the right tools for tracking (and training) models will noticeably reduce the time and effort necessary to transfer models between the data science and data engineering teams.

Model Registry

A model registry acts as a location for data scientists to store models as they are trained, simplifying the bookkeeping process during research and development. Models retrained as part of the production deployment should also be stored in the same registry to enable comparison to the original versions. 

A good model registry should allow tracking of models by name/project and assigning a version number. When a model is registered, it should also include metadata from the training job. At the very least, the metadata should include:

  • Location of the model artifact(s) for deployment.  
  • Revision numbers for custom code used to train the model, such as the git version hash for the relevant project repository.
  • Information on how to reproduce the training environment, such as a Dockerfile, Conda environment YAML file, or pip requirements file.
  • References to the training data, such as a file path, database table name, or query used to select the data.

Without the original training data, it will be impossible to reproduce the model itself or explore variations down the road. Try to reference a static version of the data, such as a snapshot or immutable file. In the case of very large datasets, it can be impractical to make a copy of the data. Advanced storage technologies (e.g. Amazon S3 versioning or a metadata system like Apache Atlas) are helpful for tracking large volumes of data.

Having a model registry puts structure around the handoff between data scientists and engineering teams. When a model in production produces erroneous output, registries make it easy to determine which model is causing the issue and roll back to a previous version of the model if necessary. Without a model registry, you might run the risk of deleting or losing track of the previous model, making rollback tedious or impossible. Model registries also enable auditing of model predictions.
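
To make the idea concrete, here is a minimal in-memory sketch of a registry that stores the metadata listed above, assigns version numbers, and supports rollback. The class names, field names, and example URIs are hypothetical, and rollback is simplified to "step back one version number"; a real registry (cloud-provider, open-source, or platform-based) would persist entries and offer a richer API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    artifact_uri: str        # location of the model artifact(s)
    code_revision: str       # e.g. git commit hash of the training code
    environment_file: str    # e.g. Dockerfile or requirements file
    training_data_ref: str   # e.g. snapshot path, table name, or query


class InMemoryModelRegistry:
    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}
        self._production: Dict[str, int] = {}

    def register(self, name, artifact_uri, code_revision,
                 environment_file, training_data_ref) -> ModelVersion:
        """Store a trained model's metadata under the next version number."""
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, artifact_uri,
                             code_revision, environment_file, training_data_ref)
        versions.append(entry)
        return entry

    def get(self, name: str, version: int) -> ModelVersion:
        versions = self._versions.get(name, [])
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]

    def promote(self, name: str, version: int) -> None:
        """Mark a registered version as the one serving production."""
        self.get(name, version)  # raises KeyError for unknown versions
        self._production[name] = version

    def production(self, name: str) -> ModelVersion:
        return self.get(name, self._production[name])

    def rollback(self, name: str) -> ModelVersion:
        """Point production back at the previous version number."""
        current = self._production.get(name)
        if current is None or current <= 1:
            raise ValueError(f"no earlier version of {name} to roll back to")
        self._production[name] = current - 1
        return self.production(name)


# Usage with made-up values.
registry = InMemoryModelRegistry()
registry.register("churn", "s3://example-bucket/churn/v1", "a1b2c3d",
                  "requirements.txt", "snapshots/customers_2021_01.parquet")
registry.register("churn", "s3://example-bucket/churn/v2", "d4e5f6a",
                  "requirements.txt", "snapshots/customers_2021_02.parquet")
registry.promote("churn", 2)
restored = registry.rollback("churn")  # production now points at version 1
```

Because every entry records the code revision and the training data reference, the same lookup that drives rollback can also answer audit questions about which model produced a given prediction.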

Some data scientists may resist incorporating model registries into their workflows, citing the inconvenience of having to register models during their training jobs. Bypassing the model-registration step should be discouraged as a discipline and disallowed by policy. It is easy to justify a registry requirement on the grounds of streamlined handoff and auditing, and data scientists usually come to find that registering models can simplify their bookkeeping as they experiment.

Good model-registry tools make tracking of models virtually effortless for data scientists and engineering teams; in many cases, it can be automated in the background or handled with a single API call from model training code.

Model registries come in many shapes and sizes to fit different organizations based on their unique needs.  Common options fall into a few categories:

  • Cloud-provider registries such as SageMaker Model Registry or Azure Model Registry. These tools are great for organizations that are committed to a single cloud provider.
  • Open-source registries like MLflow, which enable customization across many environments and technology stacks. Some of these tools might also integrate with external registries; for instance, MLflow can integrate with SageMaker Model Registry.
  • Registries incorporated into high-end data-science platforms such as Dataiku DSS or DataRobot. These tools work great if your data scientists want to use them and your organization is willing to pay extra for simple and streamlined ML pipelines.

Feature Stores

Feature stores not only make it easier to track what data is being used for ML predictions, but also help data scientists and ML engineers reuse features across multiple models. A feature store provides a repository for data scientists to keep track of features they have extracted or developed for models. In other words, if a data scientist retrieves data for a model (or engineers a new feature based on some existing features), they can commit that to the feature store. Once a feature is in the feature store, it can be reused to train new models, not just by the data scientist who created it, but by anyone within your organization who trains models.

The intent of a feature store is not only to allow data scientists to iterate quickly by reusing past work, but also to accelerate the work of productionizing models. If features are committed to a feature store, your engineering teams can more easily incorporate the associated logic into the production pipeline. When it’s time to deploy a new model that uses the same feature, there won’t be any additional work to code up new calculations.

Feature stores work best for organizations that have commonly used data entities that are applicable to many different models or applications. Take, for example, a retailer with many e-commerce customers: most of that company’s ML models will be used to predict customer behavior and trends. In that case, it makes a lot of sense to build a feature store around the customer entity. Every time a data scientist creates a new feature to better represent customers, it can be committed to the feature store for any ML model making predictions about customers.

Another good reason to use feature stores is for batch-scoring scenarios. If you are scoring multiple models on large batches of data (rather than one-off/real-time) then it makes sense to pre-compute the features. The pre-computed features can be stored for reuse rather than being recalculated for every model.
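
The following sketch shows the pre-compute-and-reuse pattern with a toy in-memory feature store built around a customer entity. The class, the feature names, and the customer records are all hypothetical; a real feature store would add persistence, versioning, and point-in-time correctness.

```python
from typing import Callable, Dict, Hashable, List


class InMemoryFeatureStore:
    """Toy feature store keyed by entity id (for example, a customer id)."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Callable[[dict], float]] = {}
        self._values: Dict[str, Dict[Hashable, float]] = {}

    def register_feature(self, name: str, compute: Callable[[dict], float]) -> None:
        """Commit the logic for a feature so that other models can reuse it."""
        self._definitions[name] = compute

    def materialize(self, name: str, entities: Dict[Hashable, dict]) -> None:
        """Pre-compute one feature for a whole batch of entities."""
        compute = self._definitions[name]
        self._values[name] = {eid: compute(raw) for eid, raw in entities.items()}

    def get_vector(self, entity_id: Hashable, feature_names: List[str]) -> List[float]:
        """Read stored values; nothing is recalculated at scoring time."""
        return [self._values[name][entity_id] for name in feature_names]


# Made-up customer records for illustration.
customers = {
    "c1": {"orders": 4, "total_spend": 200.0},
    "c2": {"orders": 1, "total_spend": 35.0},
}

store = InMemoryFeatureStore()
store.register_feature("avg_order_value", lambda c: c["total_spend"] / c["orders"])
store.register_feature("order_count", lambda c: float(c["orders"]))
for feature in ("avg_order_value", "order_count"):
    store.materialize(feature, customers)

# Two hypothetical models read the same pre-computed values.
churn_inputs = store.get_vector("c1", ["avg_order_value", "order_count"])
upsell_inputs = store.get_vector("c1", ["avg_order_value"])
```

Each feature is computed once per batch in `materialize`, and both models then read the stored values instead of repeating the calculation.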

MLOps Pipeline

More efficient pipelines are built in combination with DevOps. The steps are outlined below:

  1. Establish version control
  2. Implement a CI/CD pipeline
  3. Implement proper logging, with centralized log storage, retrieval, and querying
  4. Monitor
  5. Iterate for continuous improvement


Developing an ML production pipeline that delivers business value is extremely challenging, but the difficulty can be mitigated with the right deployment of resources, tools, personnel, expertise, and best practices. Remember to keep it simple and to iterate continuously until it delivers the necessary business value.

