What we need to know about Auto ML? — The good, the bad and the ugly

Automating the data science workflow is a good thing to do, but there are many factors to consider. Only stable and well-specified parts of the machine learning process can be automated. Even then, there will always be technical, human and problem-solving aspects that we need to take into account.


Automated machine learning (or Auto ML) is often touted as the next step in productivity for data science teams. But it is not new—I personally used excellent Auto ML tools twenty years ago¹.

Given that the concept of automated machine learning is not new, I ask myself the following questions.

  1. Why has Auto ML not been more widely adopted?
  2. What new capabilities does it have that are the product of new technologies—that is, new things you could not do before?
  3. Why should it be used, and why should it not?

But first, let’s define Auto ML to establish some context.

Auto ML definition and background

Auto ML is the process of automating the tasks of applying machine learning to real-world problems (Wikipedia).


DataRobot’s view on model building and Auto ML

Almost all currently available Auto ML methods target the model-building phase of a machine learning project. That’s the phase I shall concentrate on in this article, because all efforts to extend outside this well-understood scope have been misguided or ineffective. There is a good reason why this particular stage of the process is targeted for automation: it is something that (a) is mechanical and can be done without human judgement², making it possible to automate, and (b) some data scientists can spend a disproportionate amount of time on.

There is a lot of discussion online about finding the best parameters using search and optimisation techniques, known as hyperparameter optimisation or tuning. Hyperparameter tuning is often touted as necessary, but it is not always a good thing. Be wary of it.

Hyperparameter tuning can be a real time waster: it takes a lot of computer processing, and it often distracts your data scientists from building a solution to the real problem, especially if they are process junkies³. It can distract them from selecting a “good enough” model from any number of similarly performing, perfectly good models in the quixotic quest for “the best one”. The difference between the so-called “best model” and a perfectly adequate model is small, if not nothing, when applied in the world. And most Auto ML routines do not even consider the difference in application; rather, they focus on optimising a metric such as log loss.

I will not go further into hyperparameter tuning in this blog post other than to point out that the approach is fraught with the danger of statistical bias and multiple comparisons: the “best” solution you find is all but guaranteed not to translate into comparable real-world performance. You are deliberately selecting the best model from a group of very similar models, so it is largely random chance that leads you to pick this particular model, not genuinely better performance. You will see a gap between your predicted performance and the real-world performance you experience; this is called the implementation gap.
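The multiple comparisons effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: every candidate model is assumed to have exactly the same true accuracy (80%), and the function name, sample sizes and number of candidates are all illustrative choices, not figures from any real tool.

```python
import random
from statistics import mean

def simulate_selection_bias(n_models=50, n_val=500, n_test=500,
                            true_accuracy=0.80, n_trials=100, seed=0):
    """Average validation vs. fresh-data accuracy of the model picked as 'best'.

    Every candidate has the SAME true accuracy (an assumption), so any
    difference between candidates on the validation set is sampling noise.
    """
    rng = random.Random(seed)

    def observed_accuracy(n_rows):
        # Fraction of n_rows predictions that happen to be correct.
        hits = sum(rng.random() < true_accuracy for _ in range(n_rows))
        return hits / n_rows

    val_of_winner, test_of_winner = [], []
    for _ in range(n_trials):
        # Score every candidate on the validation set and keep the top score.
        val_scores = [observed_accuracy(n_val) for _ in range(n_models)]
        val_of_winner.append(max(val_scores))
        # The winner's score on fresh data is an independent draw, because
        # its true accuracy is no better than that of any other candidate.
        test_of_winner.append(observed_accuracy(n_test))

    return mean(val_of_winner), mean(test_of_winner)

val_acc, test_acc = simulate_selection_bias()
print(f"winner on validation: {val_acc:.3f}, same model on fresh data: {test_acc:.3f}")
```

The winner’s validation score comes out higher than its score on fresh data even though no candidate is truly better than another. The more candidates you compare, the wider that gap tends to be.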

So what Auto ML does in this context is to find the best algorithm—and possibly feature transformations and feature sets—that maps a set of inputs to a specified label.


Most Auto ML algorithms concentrate on the middle box in the model training process.

Barriers to Auto ML adoption

If Auto ML is so transformative and a number of vendors are promoting it (more on this later in the article), then why has it not been more widely adopted? After all, automating workflows and improving data scientists’ productivity are admirable aims.

1. Data scientists do not like it

We data scientists are a curious bunch. We tend not to want to cede control of model building to an algorithm. We want the tool to help us do our job.

To data scientists, Auto ML can feel like it is making their jobs redundant. Thinking back through the years, every time a new and easier-to-use tool comes along, this fear surfaces. However, I have never seen data scientists getting replaced by a tool; I have never seen significant adoption of a tool by non-data scientists.

A better way for data scientists to think about these tools is to consider how much more productive these tools could make them.

There are other reasons data scientists might not like particular Auto ML tools. They might be difficult to use, requiring additional engineering (that is, more work rather than less), they might not fit into data scientists’ workflows and they may not be open and interoperable with the other tools they are learning.

Data scientists tend to want to do things their way, and tools need to recognise this if they want to engage their potential users’ enthusiasm.

The tools may not give data scientists enough control to develop the models the way they want to make them. They may spend more time configuring and tweaking the tool to get what they want.

2. It does not fit into an organisation’s machine learning ecosystem

An Auto ML tool needs to make an organisation’s machine learning processes more streamlined. Unfortunately, many potential solutions involve adding components. They may require data scientists to leave their preferred tools (notebooks) or not be easily put into operation.

It might not produce the kinds of models that an organisation is looking for: the models may not have the right latency, they could be too complex, or they might be too “black box” and difficult to understand.

3. There are limited options for tools available

Another barrier for adoption can be that the commercial tools require additional expenditure and licensing, and we all know this can be difficult especially when real productivity benefits cannot be shown. And open-source tools in this space tend to be less capable, less robust and poorly supported.

What new capabilities are there in Auto ML?

In terms of transformative new capabilities in Auto ML compared with (say) two decades ago, my observations are:

  1. Predictive power: We now have many more types of algorithms, and of course we have ensembles, but for non-deep-learning problems, these will not radically shift Auto ML performance.
  2. Computing power: Modern cloud platforms, Spark clusters and GPUs have drastically increased our ability to use brute force with Auto ML. But for most cases, there is no need to invest in this additional power.
  3. Feature development: This is one area where Auto ML has definitely advanced. Modern Auto ML platforms are all doing interesting things with features, from automatically determining whether a field is a URL to powerful natural language processing tools to handle text features.

If you have a lot of text fields as your potential inputs to your predictive models, you may find that the advances in Auto ML capability transform your data science workflows dramatically.

Why embrace Auto ML?

Why and when should Auto ML be used and not used?

Notwithstanding these barriers to adoption, there is a plethora of good reasons to encourage the use of Auto ML.

I wonder if you have had a similar experience to this: your organisation has good automated modelling tools available, but your data scientists prefer to do things manually—and slowly.

In my opinion, this is a red flag against the offending data scientists. Intervention may be needed. Your data scientists are developing the wrong mindset. They are focusing too much on the algorithm, and not enough on the solution as a whole.

There can be a misconception, maybe even a fear, amongst data scientists about the point of Auto ML tools: that Auto ML is trying to do their job. But these tools should be viewed as augmenting data scientists, not replacing them; they make data scientists more productive and automate the drudgery in their jobs. Auto ML tools will take your data scientists’ well-specified problem⁴ (finding a mapping from inputs → outputs) and generate a “pretty good” model. They will also generate a host of other useful stuff, like feature importances, model performance measures, some model documentation and the like. These can give clues as to how to improve the model, should it be necessary to do so.

What is the workflow to augment, rather than automate?

If some of your data scientists are process junkies⁵, or they are the type of data scientist who knows all the levers to pull to get the most out of a particular algorithm (that is, they are experts in a particular technique who are looking for nails they can hit with their algorithmic hammers), please encourage them to augment with Auto ML. By doing this, they will see a huge productivity gain. Auto ML will get you a pretty good model built pretty quickly. Data scientists can focus on designing the solution, not on reinventing the model training process wheel.

So all this sounds pretty good. How do I get some Auto ML goodness for myself?

Whoa, Nelly! Not so fast!

Auto ML is a great addition to your arsenal, but it has its drawbacks. Essentially, Auto ML performs a brute-force search through a constellation of feature transformations and machine learning algorithms to find the “best” combination. It is computationally expensive: each of the many permutations takes time and computing power to train.

Reduce the time and processing expense of Auto ML

But you do have some control over the computational overhead and training time.

Are there some algorithms you are never going to use? Turn them off. Most Auto ML applications let you specify which algorithms to include (whitelist) and which to exclude (blacklist).
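To get a feel for how much an exclusion list can save, here is a minimal sketch. The search space, the algorithm names and the number of settings per hyperparameter are all hypothetical; a real Auto ML tool will expose its own options for this and will usually search more cleverly than an exhaustive grid.

```python
from math import prod

# Hypothetical search space: algorithm name -> number of settings tried
# for each hyperparameter. None of these figures come from a real tool.
SEARCH_SPACE = {
    "logistic_regression": {"C": 5},
    "decision_tree": {"max_depth": 6, "min_samples_leaf": 4},
    "random_forest": {"n_estimators": 4, "max_depth": 6},
    "gradient_boosting": {"n_estimators": 4, "learning_rate": 5, "max_depth": 6},
    "neural_network": {"layers": 4, "units": 5, "dropout": 4},
}

def restrict(space, include=None, exclude=()):
    """Keep only allowed algorithms; include=None means allow everything."""
    unknown = (set(include or ()) | set(exclude)) - set(space)
    if unknown:
        raise ValueError(f"unknown algorithms: {sorted(unknown)}")
    return {
        name: grid
        for name, grid in space.items()
        if (include is None or name in include) and name not in exclude
    }

def n_candidates(space):
    """Number of models an exhaustive grid search would have to train."""
    return sum(prod(grid.values()) for grid in space.values())

full = n_candidates(SEARCH_SPACE)
small = n_candidates(
    restrict(SEARCH_SPACE, exclude=("neural_network", "gradient_boosting"))
)
print(f"{full} candidate models before, {small} after")
```

In this toy space, the two excluded algorithms account for most of the candidates, because they have the most hyperparameters to search.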

Do you have a lot of observations? You probably do not need them all. Use sampling methods to reduce the size of the training data and hence the training time. You can always increase the size of the training sample once you have selected the winning algorithm.
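A minimal sketch of one such sampling method follows. It assumes the rows are plain dictionaries, and it assumes stratified sampling by label is what you want so that a rare class keeps its share; the function name and the toy data are illustrative only.

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_key, fraction, seed=0):
    """Sample `fraction` of the rows from each class, so that class
    proportions in the sample roughly match the full data set."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)

    # Group rows by their label value.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    sample = []
    for members in by_class.values():
        # Keep at least one row per class so rare classes do not vanish.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    rng.shuffle(sample)
    return sample

# Toy data: 10,000 rows with a 10% positive rate.
rows = [{"id": i, "label": int(i % 10 == 0)} for i in range(10_000)]
small = stratified_sample(rows, "label", fraction=0.1)
print(len(small), sum(r["label"] for r in small))
```

Raising `fraction` later, once the winning algorithm has been chosen, gives the larger training sample for the final fit.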

Do you need the models to be explainable or interpretable? Make sure your selections of algorithms and feature engineering transformations will support the level of interpretability you need. Consider tools like InterpretML or even old-fashioned regression-based techniques to give you a “pretty good” model while maintaining interpretability.

Do you need ensembles? Many Auto ML tools will combine the best models they have built into an ensemble model. You probably do not need such a model. Ensembles are great for modelling competitions like Kaggle. But there is a cost: they are slow, they use a ship-ton of input variables and they are difficult to interpret. I only use them for demonstration purposes, not for client-facing problems. You can safely turn them off, saving time, storage cost and compute cost.

Examine your input data. Unneeded variables (or features) can cost real money and will incur additional storage and processing. This is especially relevant if you use a modern cloud-based platform for feature storage and computation. Feature stores can both help and harm here. They make using features for model training and serving straightforward, which can encourage you to include more features of only marginal value in your models than you otherwise might choose.

Do you need real-time predictions from your model? Auto ML is unlikely to produce a model that produces predictions quickly enough for your purposes. You will be best served by building a model with low latency using extensive database lookups of precalculated predictions, a quick-to-calculate algorithm and as few features as makes sense⁶.
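The low-latency pattern described above can be sketched as a lookup with a cheap fallback. Everything named here is hypothetical: the in-memory dictionary stands in for a database of precalculated predictions, and the coefficients belong to an imaginary logistic regression with only two features.

```python
import math

# Hypothetical batch-scored predictions, keyed by entity ID and refreshed
# offline. In practice this would be a fast key-value store or database.
PRECOMPUTED = {"pump-001": 0.91, "pump-002": 0.12}

# Hypothetical coefficients of a small logistic regression.
INTERCEPT = -2.0
WEIGHTS = {"age_years": 0.08, "is_unmaintained": 1.5}

def quick_score(features):
    """Logistic regression over a handful of features: cheap to evaluate."""
    z = INTERCEPT + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def predict(entity_id, features):
    """Serve the precalculated score if there is one, else compute a quick one."""
    cached = PRECOMPUTED.get(entity_id)
    return cached if cached is not None else quick_score(features)

print(predict("pump-001", {}))
print(predict("pump-999", {"age_years": 25, "is_unmaintained": 1}))
```

The first call is a pure lookup. The second falls back to a calculation that needs only a few multiplications, which keeps the worst-case latency small.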

Where does Auto ML fit in my workflow and pipeline?

So, okay, it might sound like I am against Auto ML and do not recommend using it. I am not. There is a valuable place for Auto ML in data science workflows.

Before using Auto ML or encouraging your team to use Auto ML, you need to answer two questions.

  1. How should Auto ML fit into data scientists’ workflows?
  2. How do you incorporate Auto ML into your machine learning or MLOps pipeline?

Data scientist workflow

If you think about your data scientists’ workflows, consider whether you will remove a real bottleneck, or simply get to the next bottleneck quicker⁷. Remember, Auto ML just helps to connect inputs to outputs automatically with a “pretty good” algorithm. Your problem needs to be well defined. Your data scientists need to specify the training data sample, the target variable and the input variables.

When used well, Auto ML can stop data scientists getting bogged down at the model-building step. This step is not the most valuable thing that a good data scientist does—it is possibly the least valuable, as it can be automated.

How should Auto ML fit in with data or feature engineering? I recommend using feature store concepts as part of the workflow, which is also part of Google’s MLOps architecture.


Feature stores play an important part in machine learning operations (MLOps), according to Google.

If your organisation does not have well-defined processes for

  • serving features for building training samples for machine learning models,
  • serving features for making predictions from trained machine learning models,
  • defining and making features available for model training and serving,
  • defining the training population and sampling intelligently, or
  • integrating model predictions to operational systems and applications,

then Auto ML will only bring you to these bottlenecks more quickly; your data scientists will have automated only a small part of their workflow.

With these in place, Auto ML can play a valuable role in automating and augmenting the modelling lifecycle.

Automated learning

Business stakeholders often expect machine learning systems to continue to learn and adapt to the world once implemented. Auto ML has a definite role in automating the model-building part of the modelling lifecycle in automated learning—to a point.

Automated learning typically takes an updated training data set, gleaned from newer observations obtained since the model was last trained, and uses it to create a new, improved model.

However, you would usually not want—nor expect—the updated version of the model to be radically different to the old model. You just want it to predict better against new data. So—again—the set of algorithms that Auto ML has to choose from should be similar (if not identical) to the previous model version. A sensible automated modelling system should have human oversight for when the automated pipeline produces ‘large’ deviations from the parameters and predictions from previous models.
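One way to picture that human oversight is a simple gate that compares the old and new models’ predictions on the same reference rows and asks for a review when they differ too much. This is a sketch only: the function name, the two measures and the thresholds are assumptions, and what counts as a ‘large’ deviation depends on your problem.

```python
def needs_review(old_preds, new_preds, max_mean_shift=0.05,
                 max_flip_rate=0.10, cutoff=0.5):
    """Flag a retrained model whose predicted probabilities moved a lot
    compared with the previous model on the same reference rows."""
    if len(old_preds) != len(new_preds) or not old_preds:
        raise ValueError("need two non-empty prediction lists of equal length")

    pairs = list(zip(old_preds, new_preds))
    # Average absolute change in predicted probability.
    mean_shift = sum(abs(new - old) for old, new in pairs) / len(pairs)
    # Share of rows whose decision changes at the given cutoff.
    flip_rate = sum((old >= cutoff) != (new >= cutoff) for old, new in pairs) / len(pairs)

    report = {"mean_shift": mean_shift, "flip_rate": flip_rate}
    flagged = mean_shift > max_mean_shift or flip_rate > max_flip_rate
    return flagged, report

old = [0.10, 0.40, 0.80, 0.90]
print(needs_review(old, [0.12, 0.38, 0.79, 0.91]))  # small changes
print(needs_review(old, [0.60, 0.40, 0.30, 0.90]))  # two decisions flip
```

A flagged model would be held back for a person to inspect instead of being promoted automatically.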


To me, the best use of Auto ML is to use it to create a “pretty good”, first-cut, draft baseline model. Once produced, this baseline model can be included in the development of the end-to-end pipeline to integrate machine learning predictions into operational systems and applications. I estimate that nine times out of ten, this baseline model will be “good enough” so that there are no measurable improvements to be had.

This baseline model should be

  • S: Simple (easy to build and easy to validate)
  • I: Interpretable (you can see why it does what it does)
  • E: Explainable (you can see why an individual prediction is made)
  • I: Implementable (easy to implement).

To achieve this, of course, you will need to restrict the algorithm and parameter set as much as feasible when using Auto ML to make sure the resulting baseline model achieves these aims.

At this point, the full modelling pipeline can be developed and implemented using the baseline model. You can even start gathering feedback from users and performance measures from using the model’s outputs in operations. This model will also help to quantify the expected level of performance of the whole pipeline and quickly identify where there are data issues and feature leakage. In a lot of cases: job done.

However, if you think that the model can be improved, take an incremental improvement approach. As data scientists build better versions of the model, swap those models into the pipeline as they become available (after performing your due diligence, of course). This approach will help you balance complexity of models with adding real value. It also encourages deploying models early, getting real-world feedback from users and observing real customer behaviour, and it arrests what is often a seemingly endless cycle of analysis.

How do I go about Auto ML?

As with a lot of machine learning software, you have the choice between free, open-source software and vendor platforms for Auto ML. I’ll start by looking at vendor platforms.

Warning: any omissions or inaccuracies here are my fault. Please do not view any of this as a recommendation for or against any of these vendors or products. Assess them independently against your particular requirements. This is general machine learning advice only.

Vendor platforms

DataRobot is the king of the Auto ML platforms, and includes recipes and artefacts to augment data scientists’ workflows. Recognising that the scope of machine learning work extends beyond the building and serving of models, DataRobot has been extending its reach across the machine learning lifecycle in recent years.

H2O.ai has an Auto ML product named Driverless AI, as well as an excellent open-source Auto ML solution⁸. H2O.ai differentiates its solutions with extensive automated feature engineering and model explainability and interpretability; it has been a pioneer in this area.

Dataiku has a strong end-to-end data science workbench that includes an Auto ML component in which you can choose between quick prototypes, interpretable models and high-performance models.

Ailys’ DaVinCI Labs is a newcomer to the space aiming at a simple interface and no-code solution that includes some unique explainability concepts.

A particularly interesting offering is SparkBeyond. It focuses on feature discovery. This may arguably augment data scientists’ workflows more effectively than tools that focus solely on the machine learning algorithm.

No doubt there are others; I would love to hear about your favourite⁹.

Major cloud providers

The major cloud providers offer an alternative. AWS, Microsoft Azure and Google Cloud all offer their own Auto ML products. I would also include Databricks in this set (Databricks AutoML).

These Auto ML products are usually free to use, provided you are using them on the infrastructure of the cloud provider and paying for your compute¹⁰.

My view is they are pretty good, particularly when compared with the open-source alternatives I’ll outline below; they should be in your consideration set.

Open source Python libraries

On the open-source side, development of useful Auto ML packages has lagged well behind the vendor offerings. But there are always more being developed, it would seem. I touch on the main ones I am aware of below.

Hyperopt_sklearn uses a range of search algorithms to find the best pre-processing, classifier and its hyperparameters. Pre-processing is simply a selection of transformers from scikit-learn.

Auto-sklearn is an automated machine learning library built on scikit-learn that automates classifier selection and hyperparameter tuning.

TPOT stands for Tree-based Pipeline Optimization Tool and is positioned by its authors as a ‘data science assistant’. It covers feature selection, feature pre-processing, feature construction, model selection and hyperparameter optimisation. It is built on top of scikit-learn and outputs Python code for a machine learning training pipeline.

PyCaret aims to speed up the whole machine learning experiment flow. It is a wrapper around several Python machine learning libraries including scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray and others. It is aimed at citizen data scientists¹¹.

MLBox uses distributed data pre-processing, leak detection and hyperparameter optimisation. It produces models that explain the predictions they produce. However, it is more a tool to help a data scientist configure the search space than an Auto ML tool.


We will use some Auto ML techniques to build a model on the water pump data set from DrivenData. Our aim is to build a “pretty good” baseline model that is simple, interpretable, explainable and implementable; we are trying to predict whether a pump is “non functional” based on knowledge other than observing the pump directly. This type of information could be used to prioritise pump inspection and repairs.

As a comparison, although by no means scientific, I will build an interpretable model using a set of additional features I derived.

My results are below. I have also judged them on the criteria of simple, interpretable, explainable and implementable according to the key shown. In all cases I used the defaults of the algorithm (I wanted them to be simple). Although there may be other tools that could add interpretability and explainability, I have judged each tool on its own set of features only.



In general, Auto ML was able to produce a model that was better against an unseen test set of data than InterpretML. However, if you do not need the small improvement in predictive power that the Auto ML tools can give, InterpretML was able to build a “pretty good” model with the minimum of fuss; it built a model that met our criteria of being simple, interpretable, explainable and implementable.


Is Auto ML how to get the most from data scientists?

No, but used properly it will help to get more from them.

To get the most, embed data scientists into the organisation so they can learn, understand and help solve real problems. This will get you further than investments in tools.

I can see the future of Auto ML continuing to evolve into feature discovery, and concentrating on augmenting data scientists to help them be more productive.

Select an Auto ML tool that lets your data scientists quickly develop “good enough” models that you can include in your operational pipelines easily.

  1. Even further back than March 2013
  2. Although some judgement is needed to specify which algorithms will meet the specific needs of the problem, this can be set up before setting the Auto ML process in motion. 
  3. Process junkie: a data scientist who believes they are being productive by following a standard process in lieu of thinking about the problem they should be solving. 
  4. Like the kind of problem you see on Kaggle
  5. I must write a guide to spotting data scientists in the wild. 
  6. Recalling one of my rules of thumb: the lower latency needed for a prediction, the more bespoke the algorithm and architecture needed. 
  7. I recall facilitating a trial of a popular and capable Auto ML tool with three teams of data scientists. The result: they loved the tool but could see they had significant problems and delays in accessing data and putting model predictions into everyday use. The Auto ML tool was not going to dramatically increase the number of models produced by their teams until these additional bottlenecks could be eased.
  8. In the past I have given talks in Melbourne about this capability in H2O.ai. 
  9. Let me know on LinkedIn
  10. Your organisation’s licensing terms may differ. 
  11. A mythical set of data science users imagined by Gartner