Why 88% of Machine Learning Models Are Never Taken into Production

And how we can use MLOps to make the process significantly more successful.

There is no denying that machine learning (ML) is making its way into the corporate world, with the number of ML pilots and implementations predicted to double from 2017 to 2018, and again by 2020 (Deloitte, 2018). Nevertheless, despite spending on machine learning being predicted to reach 57.6 billion USD by 2021 (IDC, 2017), a staggering 88% of machine learning initiatives never make it out of the testing phase (Dotscience, 2019), and only 22% of companies have successfully deployed ML models to production (Algorithmia, 2020). What causes this, and can we do better?

With machine learning best practices still in their infancy, there are many barriers that prevent companies from turning their experiments into products. These difficulties include the deployment and automation, diagnostics and scalability of machine learning pipelines. Many parallels can be drawn with traditional software development, which suffered from the same problems in the 1990s: collaboration was hard, software delivery was slow and the various disciplines were typically split across multiple silos. By applying what we have learned about software development best practices, we can be successful in productionizing machine learning models.

Why was DevOps introduced?

Over the last decades, software engineers have invested a lot of time developing best practices around the development, deployment and management cycle of their software. Deploying a piece of code used to be as simple as dragging a handful of files to a server or deploying them right from the code editor, and testing was often done by hand. This caused many reliability issues, because it involved a lot of manual and error-prone work. The field of DevOps has introduced many workflow adjustments that aim to automate most of this work, make it more reliable and shorten the development lifecycle. Some examples of practices that DevOps introduces to enhance the software development lifecycle are:

  • Automated deployments – to ensure that deployments are repeatable, faster and no longer involve human actions. It also gives better insights into what was deployed, when it was deployed and by whom.
  • Automated testing – continuous testing continuously runs an automated test suite that can catch many errors introduced by developers during the development process.
  • Infrastructure as code – by scripting the infrastructure as code or by scripting the build and release pipelines, you make them part of the code. The code and the infrastructure it runs on are two heavily interdependent pieces that both have their own version history. Having these artifacts as code not only means that you can keep track of their history, but also guarantees that you always use the same environment.
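To make the automated testing practice above concrete, here is a minimal sketch of the kind of unit test a continuous integration pipeline would run on every commit. The `normalize` function and its checks are hypothetical stand-ins, not code from any particular project:

```python
# A hypothetical utility function and the automated tests that guard it.
# In a CI pipeline, a test runner (e.g. pytest) would execute these checks
# automatically on every commit, catching regressions before deployment.

def normalize(values):
    """Scale a list of numbers to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize_range():
    result = normalize([2, 4, 6, 8])
    assert min(result) == 0.0
    assert max(result) == 1.0

def test_normalize_constant_input():
    # Edge case a manual tester could easily forget.
    assert normalize([5, 5, 5]) == [0.0, 0.0, 0.0]
```

Once such tests exist, a developer no longer needs to remember to run them: the pipeline does, every time.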

The practices introduced by DevOps have made it easier to collaborate with other developers and to manage the lifecycle of code. They have also produced easy-to-use deployment pipelines, resulting in short iteration cycles and a high success rate when putting new code to use.

The reasons why Machine Learning projects fail

The field of data science, despite its fast growth, still lacks best practices and tools. In particular, tools for collaboration (most work is still done in a local Python notebook), versioning (of models and datasets), deployment and monitoring (of prediction quality) are very new and constantly evolving.

Aside from conceptual reasons why machine learning projects fail, such as trying to solve the wrong problem or not having enough (or the right) data, there are many reasons that have more to do with the data science lifecycle. According to Dotscience, only 33% of data scientists use git for code collaboration, 44.4% of data scientists manually track model performance metrics and roughly 65% perform model optimization manually or not at all. In addition, there is often a large divide between the data scientists who create models and the machine learning engineers who deploy them to production, leading to significantly longer development cycles. Typically, a data scientist creates a model in a Python notebook, after which an engineer must manually take this model, build a scoring API around it, deploy it to production and monitor it over time. Whenever something changes in the environment, all of these steps must be repeated by hand, making the process slow and error-prone.
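The manual handoff described above often boils down to steps like these: the data scientist serializes a model, and an engineer writes a scoring function around it by hand. The `ThresholdModel` class below is a hypothetical stand-in for a real trained model, used only so the sketch stays self-contained:

```python
import pickle

# A trivial stand-in for a model trained in a notebook: it predicts 1
# when the sum of a row's features exceeds a threshold. Real projects
# would serialize a scikit-learn or similar model instead.
class ThresholdModel:
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, features):
        return [1 if sum(row) > self.threshold else 0 for row in features]

def save_model(model, path):
    """What the data scientist's notebook typically ends with."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

def score(path, features):
    """What a hand-written scoring endpoint does: load the artifact,
    run predictions, return the results."""
    with open(path, "rb") as f:
        model = pickle.load(f)
    return model.predict(features)
```

Every piece of this glue code is rewritten or re-run by hand whenever the model changes, which is exactly the friction MLOps aims to automate away.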

Machine Learning lifecycle

How MLOps can make your projects more successful

Where the software engineering field has relatively mature processes with DevOps, the data science field is very young and still looking for best practices. Developing machine learning models is similar to developing software, but not similar enough that DevOps practices can be copied wholesale: where software development revolves around code, machine learning revolves around data. However, many lessons can be learned from DevOps and adapted to machine learning; this is called MLOps. MLOps is a term that was only coined in 2018, when Kaz Sato presented Google's best practices for "DevOps for ML" (Google, 2018).

So what is MLOps? In short, it is a set of practices that address the shortcomings of a naïve machine learning development process. It provides best practices around deployment and automation, reproducibility, scalability, collaboration, monitoring and management of machine learning development cycles. There are several technical implementations of this, such as the practices described by Microsoft (which Antheon mostly uses) (Microsoft, n.d.), but the practices themselves are technology-agnostic.

What would a pipeline look like when it uses MLOps? To start with, all data being generated would automatically be included in the machine learning process in some way. Depending on the needs, new incoming data would be automatically ingested, processed, enriched and used for model training. The data would also be versioned in datasets and linked to the models that are trained on it. Secondly, the code would no longer live in a Python notebook (which should only be used for experiments) but in separate modules responsible for data cleaning, model training and making predictions. All code would be version controlled and automatically deployed using continuous integration and continuous deployment. The model would be versioned when deployed to production, keeping track of its performance over time. After deployment, the model would be monitored by analysing signals such as data drift, and the scoring API would be monitored for latency and uptime.

MLOps is still a new field that keeps developing at a fast pace. However, with the current set of MLOps practices, data science projects can reach production faster and more reliably, achieve better collaboration between team members and yield better results for the business.


  • Algorithmia. (2020). 2020 state of enterprise machine learning.
  • Deloitte. (2018). Deloitte Technology, Media and Telecommunications Predictions 2018.
  • Dotscience. (2019). 2019 State of Development and Operations of AI.
  • Google. (2018, July 26). What is ML Ops? Best Practices for DevOps for ML (Cloud Next ’18). From YouTube: https://www.youtube.com/watch?v=_jnhXzY1HCw
  • IDC. (2017). Worldwide Artificial Intelligence Spending Guide.
  • Microsoft. (n.d.). MLOps on Azure. From https://github.com/microsoft/MLOps

Let’s connect

Are you ready to work more easily and cost-effectively? Please contact us for more information or a no-obligation consultation. We will get back to you as soon as possible!
