Deep Learning in MATLAB
by Daniel Lämmerhofer
The term Deep Learning (DL) is now used widely in the context of machine learning and artificial intelligence. Its meaning is not clearly defined, but the buzzword usually refers to (very) large and deep (neural) networks. DL is therefore not entirely new; faster and better computer hardware has simply made it possible to train large models on huge amounts of training data (keyword: Big Data) in a reasonable time. In addition, the algorithms and methods in this field are continually being improved and extended.
Some key aspects that experts associate with DL are:
- Usage of special architectures like multilayer neural networks and convolutional neural networks
- Backpropagation algorithm
- Usage of various layer types (e.g. SoftMax, Dropout, Pooling)
- Implicit feature extraction
- Usage of special activation functions between the layers (e.g. poslin)
- Usage of hardware resources like GPUs
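Two of the building blocks named in this list can be illustrated in a few lines. The following is a language-neutral sketch in plain Python, not MATLAB toolbox code: `poslin` borrows the name used above for the positive linear (ReLU) activation, while the `dropout` function, its `rate` parameter and the `seed` argument are assumptions made purely for illustration.

```python
import random

def poslin(x):
    """Positive linear (ReLU) activation: negative inputs become 0."""
    return [max(0.0, v) for v in x]

def dropout(activations, rate=0.5, seed=None):
    """Zero each activation independently with probability `rate`.

    Assumption: plain dropout as used during training, without
    rescaling of the surviving values.
    """
    rng = random.Random(seed)
    return [0.0 if rng.random() < rate else a for a in activations]

hidden = poslin([-2.0, -0.5, 0.0, 1.5, 3.0])
print(hidden)                    # [0.0, 0.0, 0.0, 1.5, 3.0]
print(dropout(hidden, seed=1))   # some entries zeroed at random
```

Dropout is applied only while training; at evaluation time all neurons stay active.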
DL networks consist of several (different) non-linear processing layers. Such architectures perform (very) well in applications like speech recognition, object detection in videos and image classification, as well as in other classification or regression problems.
An important dimension for distinguishing shallow from deep learning is the depth of a network. Put simply, this is the number of connections between input and output. For traditional neural networks the depth is roughly equal to the number of hidden layers plus the output layer. According to experts, a depth of at least 3 is necessary to call it DL. From a depth of 10 onwards it is called "very deep" learning. In practice 10-20 layers are usually used (and recommended), among other reasons because models of this size can still be trained efficiently.
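The rule of thumb for depth can be written down as a small calculation. This is a sketch in plain Python under the thresholds quoted above (3 and 10); the function names `network_depth` and `depth_category` are hypothetical and the boundaries are conventions, not a formal definition.

```python
def network_depth(num_hidden_layers):
    """Depth of a traditional feed-forward net: hidden layers + output layer."""
    return num_hidden_layers + 1

def depth_category(depth):
    """Label a network by depth, using the thresholds quoted in the text."""
    if depth >= 10:
        return "very deep"
    if depth >= 3:
        return "deep"
    return "shallow"

for hidden in (1, 2, 12):
    d = network_depth(hidden)
    print(hidden, "hidden layer(s) -> depth", d, "->", depth_category(d))
```

A classic network with a single hidden layer has depth 2 and therefore counts as shallow under this convention.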
What does MATLAB offer in the context of Deep Learning?
MathWorks® was one of the first providers of a professional software package for developing and applying artificial neural networks and has offered the Neural Network Toolbox™ for decades. Alongside well-known software libraries and frameworks such as TensorFlow™, Caffe or Torch, MathWorks® also provides a powerful solution for DL, with many methods in the latest MATLAB releases. Particularly notable are two popular DL architectures that are extensively integrated in the Neural Network Toolbox™: Autoencoders and Stacked Autoencoders (SAE) since version R2015b, and Convolutional Neural Networks (CNNs) since R2016a. For both DL approaches, classes, methods and functions to configure, train and evaluate deep networks are available.
Among other use cases, SAEs allow features to be trained, extracted and re-used directly from raw data. This is a major difference from the conventional machine learning process, where feature engineering normally has to be done manually by experts. CNNs are a well-known architecture that is now largely state of the art in image recognition. Here raw RGB data is sufficient as input to a configured network to achieve appropriate detection or classification performance. For more information on topics such as object detection in images and videos, we refer at this point to the Computer Vision System Toolbox™. CNNs can also be used in application areas such as speech recognition or natural language processing.
When using CNNs, two approaches are distinguished according to processing time and available training data:
|Approach||#Training data||Processing time||Training time||Performance|
|Train own CNN||~ 1,000 - 1,000,000||Very high (GPU required)||Days (up to weeks)||Very good (risk of overfitting with too little training data)|
|Use pre-trained CNN||~ 100 - 1,000||Medium (GPU optional)||Seconds up to minutes||Good (depending on the pre-trained CNN)|
Because training CNNs can take a very long time, it often makes sense to use a pre-trained network for the given problem. Such networks can be applied in two different ways:
- Feature Extraction: use the pre-trained CNN to extract features from data and then use those features to train a different classifier (e.g. support vector machine)
- Transfer Learning: take a network trained on a large dataset and retrain the last few layers on a smaller data set
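The feature-extraction workflow from the list above can be sketched as a two-stage pipeline. This is a toy illustration in plain Python, not MATLAB code: `pretrained_features` is a hypothetical stand-in for a frozen, pre-trained CNN, and a simple nearest-centroid classifier is swapped in for the support vector machine mentioned in the text so that the example needs no external libraries.

```python
def pretrained_features(signal):
    """Hypothetical stand-in for a frozen, pre-trained network.

    It is never retrained; it only maps raw input to a feature vector.
    """
    mean = sum(signal) / len(signal)
    spread = max(signal) - min(signal)
    return (mean, spread)

def train_centroids(samples, labels):
    """Train a separate classifier (nearest centroid) on extracted features."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        f = pretrained_features(x)
        s = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def classify(centroids, sample):
    """Assign the label whose feature centroid is closest to the sample."""
    f = pretrained_features(sample)
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(f, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

# Small labelled data set: only the second-stage classifier is trained on it.
train_x = [[0, 0, 1], [1, 0, 0], [0, 9, 1], [8, 0, 1]]
train_y = ["flat", "flat", "peaked", "peaked"]
model = train_centroids(train_x, train_y)
print(classify(model, [0, 1, 0]))   # flat
print(classify(model, [1, 7, 0]))   # peaked
```

Transfer learning differs in that part of the pre-trained network itself (typically the last few layers) is retrained, rather than a separate classifier being attached.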
For image recognition several pre-trained CNNs can be found on the internet. One example is AlexNet, which can be downloaded here; since version R2016b of the Neural Network Toolbox™ a separate package with this network is also available (more information here). This model was trained with images from the well-known ImageNet database. It comprises approximately 1000 categories with around 1000 example images each.
The following table summarizes the available MATLAB classes and functions for the described architectures, as well as other methods and aspects in the context of deep learning:
|Issue||Class/Function name||Description||since version||Toolbox|
Configure, train, evaluate
Class that contains, among other things, the trained network, the training parameters and the transfer functions for the encoder and decoder
(Unsupervised) training of an autoencoder network; can be extensively parametrized (e.g. number of neurons in the hidden layer, transfer functions for the encoder and decoder, coefficient for L2 regularization of the weights, training algorithm or loss function)
Training of a so-called softmax layer network for classification. For example, the output of an autoencoder can be used as input features
This function stacks several autoencoders and, optionally, a classification network at the end (e.g. one created by trainSoftmaxLayer). The stacked, deep network can then be fine-tuned with the common train function
Mapping of the input data to the representation of the hidden layer (size or dimension depends on the training parameters)
Maps data from the hidden layer back to the original input space of the network
Create resp. configure
Various kinds of layers are available for configuring and constructing arbitrary deep CNNs. Each layer type can be configured individually. Important classes include: ImageInputLayer, Convolution2DLayer, ReLULayer (equivalent to the well-known poslin activation function), AveragePooling2DLayer, MaxPooling2DLayer, FullyConnectedLayer, DropoutLayer (for further details see Dropout), SoftmaxLayer (among other things for classification problems, see also trainSoftmaxLayer), ClassificationOutputLayer
Configure and train a
Function to define the training parameters for a neural network, such as the algorithm (e.g. Stochastic Gradient Descent with Momentum), learning rate, number of training epochs or L2 regularization factor; related class: TrainingOptionsSGDM
Evaluation of a network (e.g. a CNN, i.e. a SeriesNetwork object) with (new) input data. Returns the class names and, additionally, a confidence score for each prediction
This method returns (only) the predicted probability of each class
|Transfer learning resp.
With this method the output of every single layer of a trained CNN (SeriesNetwork) can be computed. This output can be used as input features for training any machine learning model (SVM, decision tree, etc.) on specific training data.
Regularization can be configured in some training functions (e.g. for trainAutoencoder the parameter is called
Ensembles have existed in MATLAB for some time; worth mentioning here are the two new functions fitcensemble and fitrensemble, available since version R2016b. They provide a better interface for training classification or regression ensembles.
|Statistics and Machine Learning|
Method to avoid overfitting: during the training process, neurons in the hidden layers are randomly set to 0. The proportion is configurable (default: 50%). A CNN normally contains at least one dropout layer (see class DropoutLayer)
Alternative performance or cost function for neural networks; since R2015 also for 2-class classification problems
* ... improves the performance (see further details in the next section)
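The difference between an evaluation that returns class names with a confidence score and one that returns only per-class probabilities, as well as the cross-entropy cost function, can be made concrete with a short sketch. It is plain Python, not toolbox code: the names `predict`, `classify` and `crossentropy` merely echo the MATLAB names mentioned in this article, the functions are simplified stand-ins, and the raw class scores are invented by hand instead of coming from a trained network.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores):
    """Return only the predicted probability of each class."""
    return softmax(scores)

def classify(scores, class_names):
    """Return the most probable class name and its probability as confidence."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

def crossentropy(probs, target_index):
    """Cross-entropy loss for one sample with a single true class."""
    return -math.log(probs[target_index])

names = ["cat", "dog", "bird"]
scores = [2.0, 1.0, 0.1]                     # hypothetical network outputs
print(predict(scores))
print(classify(scores, names))
print(crossentropy(predict(scores), 0))
```

The loss is small when the probability assigned to the true class is close to 1 and grows without bound as that probability approaches 0.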
These software capabilities in MATLAB reflect the current state (version R2016b). Given the current prominence and pace of deep learning, the range of functions will certainly keep growing in future releases.
How does MATLAB deal with the increased performance requirements for Deep Learning?
Depending on the depth or complexity of a network and the amount of data, training and evaluation can take a long time. To ensure adequate performance, some functions in the Neural Network Toolbox™ have been improved in recent releases.
Here is a short list of the affected functions:
- Since R2015b
- Usage of built-in GPU to accelerate training (trainNetwork) and evaluation (classify, predict, activations) of CNNs
- Since R2016a
- Condition: Parallel Computing Toolbox™ and CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher
- Usage of a CPU as executing hardware environment for evaluations of trained CNNs
- Faster training of CNNs for image recognition if ImageDatastore objects are used
- Since R2016b
Can Deep Learning also be applied in the ANDATA tools?
In the ANDATA tool environment, BRAINER is a graphical user interface for executing machine learning tasks, based mainly on the Neural Network Toolbox™ and the Statistics and Machine Learning Toolbox™. We always try to integrate the new features and improvements of MATLAB's toolboxes into our tool. Hence some of the deep learning aspects mentioned above are already available in the current BRAINER release. Here is a short list:
- Stacked Autoencoders are fully integrated as a separate model type: the training parameters GUI allows all training parameters for n autoencoders and a softmax layer to be configured
- The loss function crossentropy is available for the model types Backpropagation and Stacked Autoencoders
- For the model types Backpropagation and Stacked Autoencoders, regularization can be defined through a dedicated parameter field
- For Backpropagation models the poslin or ReLU (Rectified Linear Unit) activation (transfer) function is available for the last or hidden layers
- Various ensemble methods are available as separate model types:
- Boosting: different algorithms (e.g. AdaBoost, LogitBoost, RUSBoost) can be selected for the aggregation of single models
- Cluster and Select
- Fusion: several methods for the fusion of single models are implemented - i.e. majority vote, weighting (called Bagging if equal weights), stacking
- Random Forests: training of multiple decision trees (see TreeBagger for details)
- The usage of parallel and/or GPU execution for training and/or evaluations can be activated for the following model types:
- Backpropagation, Generalized Regression and Probabilistic Neural Network, Radial Basis Network
- Condition: Parallel Computing Toolbox™ and CUDA®-enabled NVIDIA® GPU
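Two of the fusion methods listed above for ensemble models, majority vote and weighting, can be sketched in a few lines. This is a generic illustration in plain Python and not the BRAINER implementation; the function names and the example outputs of the single models are assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most single models.

    On a tie, the label that appears first in `predictions` wins.
    """
    return Counter(predictions).most_common(1)[0][0]

def weighted_fusion(outputs, weights=None):
    """Combine numeric model outputs as a weighted average.

    With no weights given, all models count equally (plain average).
    """
    if weights is None:
        weights = [1.0] * len(outputs)
    total = sum(weights)
    return sum(w * o for w, o in zip(weights, outputs)) / total

print(majority_vote(["A", "B", "A"]))                # A
print(weighted_fusion([0.2, 0.4, 0.9]))              # equal weights
print(weighted_fusion([0.2, 0.4, 0.9], [1, 1, 2]))   # third model counts double
```

Stacking, the third method in the list, would instead train a further model on the outputs of the single models.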
Our next step is to integrate and support CNNs (available since R2016a) directly in BRAINER as a dedicated model type. Until now they have had to be integrated manually as custom models.
Finally, it should be said that deep learning has existed in MATLAB and our tools for some time: arbitrarily deep neural networks could always be configured via the parameters for the number of hidden layers and neurons. The possibility of combining single models into a committee has also existed for several versions. Currently the term deep learning is being hyped, and its methods and architectures are achieving more and more notable successes. These achievements are finally helping neural networks to a widespread breakthrough. At the moment, however, progress in this field is so rapid that you have to keep your eyes open (nearly) daily! Accordingly, this blog post is only a snapshot from the end of 2016.