The Ultimate Guide to Machine Learning Frameworks

Want to get started in machine learning? Here are 8 frameworks to consider: Scikit-learn, TensorFlow, PyTorch, Apache MXNet, Deeplearning4j, XGBoost, PaddlePaddle, and ONNX.

Feb 24th, 2021 10:28am by Janakiram MSV

We have seen an explosion in developer tools and platforms related to machine learning and artificial intelligence during the last few years. From cloud-based cognitive APIs to libraries to frameworks to pre-trained models, developers have many options for infusing AI into their applications.

AI engineers and researchers choose a framework to train machine learning models. These frameworks abstract the underlying hardware and software stack to expose a simple API in languages such as Python and R. For example, an ML developer can leverage the parallelism offered by GPUs to accelerate a training job without changing much of the code written for the CPU. These frameworks expose simpler APIs that translate to complex mathematical computations and numerical analysis often needed for training the machine learning models.

Apart from training, the machine learning frameworks simplify inference — the process of utilizing a trained model for performing prediction or classification of live data.

Machine learning frameworks are used in the domains related to computer vision, natural language processing, and time-series predictions. They are also used with structured data typically represented in a tabular form to perform linear regression and logistic regression to predict or classify the data.

This guide aims to introduce mainstream machine learning and deep learning frameworks to developers with an emphasis on their unique characteristics.

Scikit-learn

Scikit-learn is one of the oldest machine learning frameworks. David Cournapeau started it as a Google Summer of Code project in 2007. Available as a Python library, it supports both supervised and unsupervised learning algorithms.

Scikit-learn is built on top of the SciPy stack, an open source scientific toolkit for Python developers. The stack includes NumPy for mathematical calculations, Matplotlib for visualization, Pandas for data manipulation, and SymPy for symbolic algebra. Scikit-learn extends the SciPy stack through modeling and learning capabilities.

If you are a beginner in machine learning, look no further. Scikit-learn is the best framework for Python developers to learn the foundations of machine learning. This toolkit makes it easy to implement popular algorithms such as linear regression, logistic regression, K nearest neighbor, support vector machine, random forest, and decision trees. Developers can easily switch between the algorithms by using a different classifier.
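To illustrate how little changes when switching algorithms, here is a minimal sketch that trains five of the classifiers mentioned above on the Iris dataset bundled with Scikit-learn. The dataset, the train/test split, and the hyperparameters are assumptions chosen for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small built-in tabular dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every estimator exposes the same fit/score interface, so switching
# algorithms only means switching the object.
classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "support vector machine": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

The training loop never refers to a specific algorithm, which is what makes the swap a one-line change.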

Apart from supervised learning, which deals with prediction or classification based on historical data with identified features and labels, Scikit-learn can be used for unsupervised learning. It supports algorithms including Gaussian mixture models, manifold learning, clustering, biclustering, principal component analysis (PCA), and outlier detection.
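As a rough sketch of the unsupervised side, the example below combines two of the techniques listed above: PCA to reduce the data to two dimensions, followed by k-means clustering. The Iris dataset and the choice of three clusters are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load only the features; the labels are deliberately ignored.
X, _ = load_iris(return_X_y=True)

# Project the 4-dimensional data onto its 2 main directions of variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Group the projected points into 3 clusters without using any labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_2d)

print("variance kept by 2 components:", pca.explained_variance_ratio_.sum())
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```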

Scikit-learn is best suited to train models based on structured data, typically represented in a tabular form. Since it deals only with classical machine learning techniques that don’t use neural networks and deep learning for training, Scikit-learn doesn’t need GPUs. Python developers can quickly get started with Scikit-learn by installing the package.

Even those developers who use TensorFlow and PyTorch for training prefer Scikit-learn for helper functions such as data preprocessing, encoding, cross-validation, and hyperparameter tuning.
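The helper functions mentioned above can be sketched as follows: a preprocessing step and a model are chained in a pipeline, and a grid search with 5-fold cross-validation picks the hyperparameters. The dataset, the SVC model, and the parameter grid are assumptions for the example only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Chain preprocessing and the model so scaling is re-fit inside each
# cross-validation fold, avoiding leakage from the validation split.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", SVC()),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid = {
    "model__C": [0.1, 1.0, 10.0],
    "model__kernel": ["linear", "rbf"],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean cross-validated accuracy:", round(search.best_score_, 3))
```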

TensorFlow

TensorFlow is one of the most popular machine learning and deep learning frameworks used by developers and researchers. Initially launched in 2015 by the Google Brain team, TensorFlow has matured to become an end-to-end machine learning platform. It goes beyond training to support data preparation, feature engineering, and model serving.

TensorFlow can run on standard CPU and specialized AI accelerators, including GPU and TPU. It is available on 64-bit Linux, macOS, Windows, and mobile computing platforms, including Android and iOS. Models trained in TensorFlow can be deployed on desktops, browsers, edge computing devices, and even microcontrollers. This broad support makes TensorFlow unique and production-ready.

The core TensorFlow library can be installed as a Python module running on x86-64 (AMD64) and ARM platforms. TensorFlow.js is a JavaScript library for training and deploying models in the browser and on Node.js. For mobile, IoT, and edge devices, TensorFlow Lite can be used for model inferencing.

TensorFlow Extended (TFX) is a production-grade platform for implementing ML pipelines spanning the entire workflow required for data acquisition, data ingestion, data validation, model training, model analysis, deployment, and inference. Enterprises use TFX for implementing end-to-end ML projects.

In TensorFlow 1.x, developers first wrote code to define a computational graph and then executed it in a separate step. This was cumbersome and time-consuming for developers implementing neural networks. The launch of TensorFlow 2.0 brought two significant changes: eager execution and integration with Keras. Eager execution made it possible to run code without first defining the computational graph, which makes development and debugging simpler. Keras, the high-level machine learning API, is natively integrated with TensorFlow 2.0, bringing the familiar workflow of defining the neural network and training it.
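A minimal sketch of both changes, assuming TensorFlow 2.x is installed: the first part shows an operation returning a concrete value immediately, and the second part shows the Keras define, compile, and fit workflow. The synthetic data and the tiny network are assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf

# Eager execution: the operation runs immediately and returns a value,
# with no separate graph-building and session step.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)
print(b.numpy())  # values: [[7, 10], [15, 22]]

# Synthetic binary classification data (assumed for the example):
# the label is 1 when the four features sum to a positive number.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4)).astype("float32")
y = (x.sum(axis=1) > 0).astype("float32")

# Keras workflow: define the network, compile it, then train it.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(x, y, epochs=5, batch_size=32, verbose=0)

print("loss per epoch:", history.history["loss"])
```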

TensorFlow Hub is a collection of pre-trained models that developers can use for inference across different environments, including cloud, desktop, browser, and edge.

TensorFlow Serving, a component of the TensorFlow platform, is a flexible, high-performance serving system for machine learning models, designed for production environments. It can serve multiple models in a standard format to deliver a highly scalable inference service.

Whether you are developing computer vision, natural language processing, or time-series models, TensorFlow is a mature and robust machine learning platform with end-to-end capabilities.

PyTorch

PyTorch is an open source deep learning framework built to be flexible and modular for research, with the stability and support needed for production deployment. It is based on Torch, a framework for performing fast computations originally written in C.

Compared to other deep learning frameworks, PyTorch has a gentler learning curve for Python developers. Developed at Facebook’s AI Research lab (FAIR), it quickly won mindshare among developers and researchers. Although Python is the primary and most polished interface, PyTorch also has a C++ interface. Facebook also invested in another framework, Caffe2, the successor to Caffe (Convolutional Architecture for Fast Feature Embedding), which was merged into PyTorch in 2018.

Since its first version, PyTorch implemented eager execution, which inspired TensorFlow 2.0. One of the advantages of PyTorch is its compatibility with NumPy. Converting NumPy objects to tensors is natively integrated with PyTorch’s core data structures. Developers can easily switch back and forth between PyTorch tensor objects and NumPy arrays.
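The round trip between NumPy arrays and tensors can be sketched as follows. The array contents are arbitrary. Note that `torch.from_numpy` and `Tensor.numpy()` share memory with the source on the CPU, while `torch.tensor` makes a copy.

```python
import numpy as np
import torch

array = np.arange(6, dtype=np.float32).reshape(2, 3)

# from_numpy wraps the existing buffer instead of copying it.
tensor = torch.from_numpy(array)

# An in-place tensor operation is therefore visible through the array too.
tensor.mul_(2)
print(array)

# Going back to NumPy is also zero-copy for CPU tensors.
back = tensor.numpy()

# torch.tensor copies the data, so later changes do not propagate.
copy = torch.tensor(array)
copy.add_(100)
print(array[0, 0], copy[0, 0].item())
```

Sharing memory avoids copies on large arrays, but it also means an in-place change on one side silently changes the other.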

Implementing a neural network in PyTorch is simpler and more intuitive than in other frameworks. With its support for CPUs and GPUs, complex deep neural networks can be trained with large datasets.
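For a sense of what this looks like in practice, here is a minimal sketch of a small network and its training loop. The synthetic data (a noisy line), the layer sizes, and the optimizer settings are assumptions for the example. The same code runs on a GPU when one is available.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic regression data (assumed for the example): y = 3x + 1 plus noise.
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 3 * x + 1 + 0.05 * torch.randn_like(x)

# Use a GPU if one is present; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x, y = x.to(device), y.to(device)

# A two-layer fully connected network.
model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1)).to(device)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for step in range(200):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = loss_fn(model(x), y)    # forward pass runs eagerly
    loss.backward()                # backpropagate
    optimizer.step()               # update the weights
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```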

Facebook and Amazon Web Services (AWS) have collaborated to develop TorchServe, an open source inferencing engine for PyTorch models. TorchServe delivers lightweight serving with low latency to provide high-performance inference service. It includes default handlers for the most common applications, such as object detection and text classification, so developers don’t have to write custom code to deploy the models.

TorchElastic is an open source tool for training deep neural networks at scale using Kubernetes. It enables distributed PyTorch training jobs to be executed in a fault-tolerant and elastic manner. The TorchElastic Controller for Kubernetes (TECK) is a native Kubernetes implementation of the PyTorch Elastic interface that automatically manages the lifecycle of the Kubernetes pods and services required for TorchElastic training.

The PyTorch Hub acts as a model zoo to explore pre-trained models for experimentation and transfer learning.

Apache MXNet

Launched in 2017, Apache MXNet is one of the recent entrants into the deep learning ecosystem. Its uniqueness lies in the support for various languages, including C++, Python, Java, Julia, Matlab, JavaScript, Go, R, Scala, Perl, and Wolfram Language.

Apache MXNet was co-developed by Carlos Guestrin at the University of Washington and researchers from Carnegie Mellon University. It was selected by Amazon as the preferred deep learning framework for building commercial products and managed ML platform offerings of AWS. Today, most of the prebuilt models and algorithms available on Amazon SageMaker are implemented with Apache MXNet.

For Python developers, MXNet provides a comprehensive and flexible API that suits different levels of experience and wide-ranging requirements. Similar to how Keras provides a developer-friendly, high-level API for TensorFlow, Apache MXNet exposes the Gluon API, which provides a clean, simple API for deep learning. Gluon has specialized toolkits, GluonCV, GluonNLP, and GluonTS, meant for computer vision, natural language processing, and time-series analysis.

Apache MXNet can target both CPUs and GPUs for training and inference. When used in the cloud environment, it takes advantage of the scalable GPU infrastructure offered by Amazon EC2. It is tightly integrated with Horovod, the distributed deep learning toolkit by Uber, to support highly distributed GPU training.

The open source multi-model server can be used for serving the models trained with Apache MXNet.

The community support for Apache MXNet is growing. The D2L.ai project offers comprehensive learning resources and interactive Jupyter Notebooks for samples.

Eclipse Deeplearning4j

Deeplearning4j is one of the few machine learning frameworks natively written in Java targeting the Java Virtual Machine (JVM). It is developed by a group of ML developers based in San Francisco and supported commercially by the startup Skymind. Deeplearning4j was donated to the Eclipse Foundation in October 2017. The library is compatible with Clojure and Scala.

For clustering and distributed training, Deeplearning4j is integrated with Apache Spark and Apache Hadoop. It is also integrated with NVIDIA CUDA runtime to perform GPU operations and distributed training across multiple GPUs.

Deeplearning4j includes an n-dimensional array class using ND4J that allows scientific computing in Java and Scala, comparable to the functions that NumPy provides to Python. It can be effectively used as a library for performing linear algebra and matrix manipulation for training and inference.

Deeplearning4j can be used for training models that can perform image classification, object detection, image segmentation, natural language processing, and time-series predictions.

The free developer edition of SKIL, the Skymind Intelligence Layer, can serve Deeplearning4j models. The SKIL model server can also import models from Python frameworks such as TensorFlow, Keras, Theano, and CNTK.

Though Deeplearning4j is less popular than TensorFlow and PyTorch, it is gaining traction among Java developers.

XGBoost

XGBoost is a widely used library for training regression and classification models. XGBoost stands for eXtreme Gradient Boosting, a technique often used in supervised learning when training models based on highly structured datasets. Tianqi Chen created it at the University of Washington in May 2014. Interestingly, he is also the co-creator of Apache MXNet.

XGBoost is written in C++ with interfaces for Python, R, Julia, and Java. It integrates with Scikit-learn through a compatible estimator API. XGBoost builds its models additively: each new model predicts the residuals, or errors, of the prior models, and the outputs of all models are added together to make the final prediction. It uses the gradient descent algorithm to minimize the loss when adding new models.
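The idea of fitting each new model to the residuals of the previous ones can be sketched in a few lines of NumPy. This is a deliberately simplified illustration that uses one-split decision stumps and a squared-error loss, not XGBoost's actual implementation. The helper names `fit_stump` and `predict_stump`, the synthetic data, and the learning rate are all assumptions made for the example.

```python
import numpy as np

def fit_stump(x, residuals):
    """Find the single threshold on 1-D x whose two-sided means best fit the residuals."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so the right side is never empty
        left = residuals[x <= threshold]
        right = residuals[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

# Synthetic data (assumed): a noisy sine wave.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, size=200))
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

learning_rate = 0.3
prediction = np.full_like(y, y.mean())  # start from a constant model
stumps, errors = [], []
for _ in range(50):
    # For the loss 0.5 * (y - prediction)^2, the residual is the negative gradient.
    residuals = y - prediction
    stump = fit_stump(x, residuals)
    stumps.append(stump)
    # Add the new model's (shrunken) output to the running sum of models.
    prediction += learning_rate * predict_stump(stump, x)
    errors.append(np.mean((y - prediction) ** 2))

print(f"MSE after 1 round: {errors[0]:.4f}, after 50 rounds: {errors[-1]:.4f}")
```

Each round takes a small step in the direction that reduces the loss, which is where the "gradient" in gradient boosting comes from.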

While deep learning frameworks deliver better results when training computer vision, natural language processing, and time-series models, they are overkill for linearly separable datasets. XGBoost shines when training models based on smaller datasets with missing data. It handles missing values natively: at each split it learns a default direction for rows whose value is absent, which removes the need for a separate data imputation step.

XGBoost can be trained on less expensive multicore CPUs instead of a distributed GPU cluster. When there is a limited amount of data, XGBoost comes across as the most efficient and affordable ML library to train accurate models.

When the data is highly structured, with each record represented as a one-dimensional array of features, the combination of Pandas and XGBoost is used to train the models. Images and videos, which translate to multi-dimensional arrays or tensors, work best with a deep learning framework such as TensorFlow or PyTorch.

When you have a structured, tabular, one-dimensional dataset with missing values, and you need unmatched execution speed on CPUs with high prediction accuracy, go for XGBoost. It won’t disappoint you.

PaddlePaddle

PaddlePaddle (PArallel Distributed Deep LEarning) is an independent, open source deep learning platform launched by Baidu in 2016. It’s an easy-to-use, efficient, flexible, and scalable deep learning platform. Scientists and engineers originally developed it to apply deep learning to many products at Baidu, such as NLP, translation, and computer vision.

PaddlePaddle supports a wide range of neural network architectures and optimization algorithms. With PaddlePaddle, it is possible to leverage many CPUs/GPUs and machines to speed up training, achieving high throughput and performance via optimized communication.

PaddlePaddle offers 146 algorithms and more than 200 pre-trained models, some with open source code, to facilitate the rapid development of industrial applications. The platform also hosts toolkits for cutting-edge research, such as Paddle Quantum for quantum-computing models and Paddle Graph Learning for graph-learning models.

Python developers can easily get started with the framework by installing the module.

Open Neural Network Exchange (ONNX)

The Open Neural Network Exchange (ONNX) project was initiated by AWS, Facebook, and Microsoft in 2017. In 2019, it was accepted as a graduate project in Linux Foundation AI (LFAI).

Though ONNX is not an end-to-end framework such as TensorFlow or PyTorch, it deserves the attention of ML engineers and operators. ONNX brings interoperability to models trained in various deep learning frameworks. For example, a model trained in PyTorch can be exported to ONNX, which can be imported into TensorFlow for inference.

ONNX has three components: a backend layer optimized for AI accelerator software such as Intel OpenVINO Toolkit and NVIDIA TensorRT, a runtime that can perform inference of ONNX models, and a set of tools to export and import models from one format to another.

For a detailed overview of ONNX, refer to my analysis and tutorial series published at The New Stack.

Microsoft has open-sourced ONNX Runtime and added support for training. This capability turned ONNX into a complete platform that can train models and perform inference.

ONNX is available as a Python library for developers. Once installed, it’s easy to export models from one format to another. PyTorch and Apache MXNet have native tools to export models into the ONNX format. The ONNX Model Zoo has pre-trained models from the vision, language, and speech domains.

ONNX attempts to reduce the fragmentation of deep learning frameworks through interoperability. It is undoubtedly a step in the right direction that will positively impact the AI ecosystem.
