[Technical Post] Fast Start Your ML Project Using Auto-ML Libraries in Python

13/5/2021 ● 2 minutes to read

Automated Machine Learning (Auto-ML) is a promising field of study that aims to run the technical ML pipeline automatically. Specifically, Auto-ML aims to automate critical components such as data engineering, feature engineering, model training, hyperparameter tuning, and model monitoring.

In an average ML project, most of the time is spent on data preparation and modeling. Auto-ML frameworks automate these stages of the pipeline using classical search methods such as brute-force or greedy search.
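To make the difference between the two search strategies concrete, here is a minimal sketch in plain Python. The search space, the parameter names, and the `toy_score` function are all hypothetical stand-ins; in a real pipeline the score would come from training and validating a model.

```python
import itertools

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.05, 0.1, 0.5],
    "max_depth": [2, 4, 6, 8],
    "n_estimators": [50, 100, 200],
}


def toy_score(config):
    """Stand-in for a validation score (higher is better)."""
    return -(
        (config["learning_rate"] - 0.1) ** 2
        + ((config["max_depth"] - 6) / 10) ** 2
        + ((config["n_estimators"] - 100) / 1000) ** 2
    )


def brute_force_search(space, score):
    """Evaluate every combination in the space and keep the best one."""
    names = list(space)
    best_config, best_score, evaluations = None, float("-inf"), 0
    for values in itertools.product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        evaluations += 1
        current = score(config)
        if current > best_score:
            best_config, best_score = config, current
    return best_config, evaluations


def greedy_search(space, score):
    """Tune one parameter at a time, freezing each at its best value."""
    config = {name: values[0] for name, values in space.items()}
    evaluations = 0
    for name, values in space.items():
        best_value, best_score = config[name], float("-inf")
        for value in values:
            evaluations += 1
            current = score({**config, name: value})
            if current > best_score:
                best_value, best_score = value, current
        config[name] = best_value
    return config, evaluations


best_bf, n_bf = brute_force_search(SEARCH_SPACE, toy_score)
best_greedy, n_greedy = greedy_search(SEARCH_SPACE, toy_score)
print(f"brute force: {best_bf} after {n_bf} evaluations")
print(f"greedy:      {best_greedy} after {n_greedy} evaluations")
```

On this toy problem brute force evaluates all 48 combinations while greedy search needs only 11. Both reach the same configuration here only because the toy score treats each parameter independently; when parameters interact, greedy search can settle on a worse configuration.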

A summary of the leading open-source Auto-ML libraries in Python, as of 13.5.2021, is shown in the table below:

| Library Name | Library Description | Underlying Models | Performs hyperparameter tuning? | Links |
|---|---|---|---|---|
| Auto-Sklearn | Auto-Sklearn includes some feature engineering techniques such as one-hot encoding, feature normalization, and dimensionality reduction. This library uses Sklearn estimators to handle classification and regression problems. | All Sklearn ML models | Yes | Paper, Code |
| TPOT | TPOT is an open-source Python AutoML tool that optimizes machine learning pipelines using genetic programming. TPOT expects a cleaned dataset; it then performs feature processing, model selection, and hyperparameter optimization to return the best-performing model. | All Sklearn ML models | Only numerical | Paper, Code |
| Auto-Keras | Auto-ML for deep learning (based on the Keras library). The architecture of the NN is obtained by solving a Bayesian optimization problem on some evaluation metric function (such as F1 or accuracy score). | Theoretically, all variants of NN | Yes (implicitly) | Paper, Code |
| AutoGluon | Auto-ML for deep learning. Unlike other Auto-ML libraries, which support only tabular data, it also supports image classification, object detection, NLP, and other real-world applications involving images. Its architecture search is based on methods such as ASHA, Hyperband, Bayesian Optimization, and BOHB, making it the current (2021) state of the art. | Theoretically, all variants of NN | Yes | Paper, Code |
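The evolutionary idea behind TPOT can be illustrated with a small sketch. This is a simplified genetic algorithm over fixed three-step pipelines, not TPOT's tree-based genetic programming and not its API. The step names in `CHOICES` and the scores in `TOY_GAINS` are hypothetical; a real system would use a cross-validated score as the fitness.

```python
import random

# Hypothetical pipeline building blocks (not TPOT's actual operators).
CHOICES = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["none", "variance", "k_best"],
    "model": ["logistic", "tree", "forest", "knn"],
}

# Made-up contribution of each choice to the score (assumption for the demo).
TOY_GAINS = {
    "none": 0.0, "standard": 0.05, "minmax": 0.03,
    "variance": 0.01, "k_best": 0.04,
    "logistic": 0.70, "tree": 0.65, "forest": 0.80, "knn": 0.60,
}


def fitness(pipeline):
    """Stand-in for a cross-validated score of the pipeline."""
    return sum(TOY_GAINS[pipeline[step]] for step in CHOICES)


def random_pipeline(rng):
    return {step: rng.choice(options) for step, options in CHOICES.items()}


def crossover(a, b, rng):
    """Build a child by taking each step from one of the two parents."""
    return {step: rng.choice([a[step], b[step]]) for step in CHOICES}


def mutate(pipeline, rng, rate=0.2):
    """Randomly replace each step with probability `rate`."""
    return {
        step: (rng.choice(CHOICES[step]) if rng.random() < rate else value)
        for step, value in pipeline.items()
    }


def evolve(generations=15, population_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        # Elitism: the best half survives unchanged into the next generation.
        parents = population[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best_pipeline = evolve()
print(best_pipeline, round(fitness(best_pipeline), 2))
```

Because the best half of each generation is carried over unchanged, the best score never decreases from one generation to the next.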
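Hyperband and ASHA, mentioned in the AutoGluon row, both build on successive halving: train many candidates on a small budget, keep the best fraction, and give the survivors a larger budget. The sketch below shows only that synchronous core, with a hypothetical `toy_validation_score` in place of real training; Hyperband additionally runs several such brackets, and ASHA promotes candidates asynchronously.

```python
import math
import random


def toy_validation_score(config, budget):
    """Stand-in for training `config` for `budget` epochs.

    The score approaches config["quality"] as the budget grows, at a pace
    set by config["speed"]. Both fields are invented for this demo.
    """
    return config["quality"] * (1 - math.exp(-budget / config["speed"]))


def successive_halving(configs, min_budget=1, eta=3):
    """Keep the top 1/eta of the candidates each round, multiplying the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (number of candidates evaluated, budget per candidate)
    while len(survivors) > 1:
        history.append((len(survivors), budget))
        ranked = sorted(
            survivors,
            key=lambda c: toy_validation_score(c, budget),
            reverse=True,
        )
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], history


rng = random.Random(0)
candidates = [
    {"id": i, "quality": rng.uniform(0.5, 0.95), "speed": rng.uniform(1, 5)}
    for i in range(27)
]
winner, history = successive_halving(candidates)
print("winner:", winner["id"])
for n_configs, budget in history:
    print(f"evaluated {n_configs} candidates at budget {budget}")
```

Most of the budget goes to the few candidates that looked promising early. The trade-off is that a slow starter with a high final quality can be eliminated in the first round, which is the weakness Hyperband's multiple brackets are meant to reduce.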

While Auto-ML is a powerful tool, the current state of the art remains far behind the solutions a human AI developer can obtain, because Auto-ML lacks creativity and domain knowledge, among other reasons. As a result, Auto-ML is best used as a tool to quickly get a good overview of how different models handle a given dataset, substituting machine time for expensive developer time.
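The "quick overview" use case boils down to scoring several candidate models on the same data with the same validation scheme. A minimal sketch of that loop in plain Python follows; the synthetic dataset and the three toy models are assumptions chosen to keep the example self-contained, and an Auto-ML library would run the same comparison over real estimators.

```python
import random
from collections import Counter


def make_toy_data(n_per_class=40, seed=0):
    """Two Gaussian blobs in 2D; a placeholder for a real tabular dataset."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def majority_class(train):
    """Baseline: always predict the most frequent training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common


def nearest_centroid(train):
    """Predict the label whose training mean is closest to the point."""
    groups = {}
    for point, label in train:
        groups.setdefault(label, []).append(point)
    centroids = {
        label: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for label, pts in groups.items()
    }
    return lambda x: min(centroids, key=lambda lbl: squared_distance(x, centroids[lbl]))


def one_nearest_neighbor(train):
    """Predict the label of the single closest training point."""
    return lambda x: min(train, key=lambda row: squared_distance(x, row[0]))[1]


def cross_val_accuracy(fit, data, k=5):
    """Mean accuracy over k folds; fold i holds out every k-th row starting at i."""
    fold_scores = []
    for i in range(k):
        test = data[i::k]
        train = [row for j, row in enumerate(data) if j % k != i]
        predict = fit(train)
        fold_scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(fold_scores) / k


MODELS = {
    "majority_class": majority_class,
    "nearest_centroid": nearest_centroid,
    "one_nearest_neighbor": one_nearest_neighbor,
}

data = make_toy_data()
scores = {name: cross_val_accuracy(fit, data) for name, fit in MODELS.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

The ranking this produces is a starting point for a developer, not a final answer: it shows which model families are worth a closer look on the given data.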
