New functionalities in BRAINER with version 4.2
by Daniel Lämmerhofer
In addition to several Deep Learning enhancements (see the blog post Deep Learning in Matlab), BRAINER now provides further features that make solving machine learning problems more flexible and efficient. These new functionalities enable a number of new use cases and studies in BRAINER:
- Training and comparative evaluation of different input parameter combinations (feature sets) in a single project file
- Configuration of different input parameter transformations for the same input feature set
- Direct performance comparison between different model types for a given problem (e.g. traditional/analytical models vs. machine learning models)
- Adaption of already trained models to, for instance, new requirements or new training data
- Configuration, training and evaluation of any hybrid models consisting of:
  - several models of the same type
  - several models of different types
- Comparison of single models with hybrid models and ensembles using the same evaluation function
- More flexible and extensive parametrisation of model training methods
In addition, the menu structure has been improved and all evaluation plots in Postprocessing have been unified. Among other things, all plots now have a standardised footer with information on the performance and the kind of the evaluated models. The following sections briefly describe the new features, including some tool screenshots.
Configuration of several input feature sets
The first important change in the new BRAINER version affects the administration of input parameters (Menu: Data > Parameters > Inputs...). It is now possible to configure several different input parameter combinations and to select any of them as input for model training. Previously, only one single input parameter combination could be defined per project file. This new feature makes it quick and easy to compare various input parameter combinations (e.g. a default set, features from an optimisation, feature suggestions from experts).
This is especially helpful if, for example, different model types place different demands on the selection of input parameters. On the one hand, you can initially use all available input parameters to train decision trees. On the other hand, for neural nets you should start with a reduced number of features. In further steps these features can be extended or replaced systematically with appropriate feature selection methods. Both approaches can now be applied in one BRAINER project.
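The idea of comparing several feature sets with one and the same evaluation can be sketched outside of BRAINER in a few lines of Python. This is only an illustration under stated assumptions: the nearest-centroid classifier, the toy data and all names (`feature_sets`, `train_nearest_centroid`, `accuracy`) are hypothetical and are not part of BRAINER or its project file format.

```python
# Hypothetical illustration only; none of these names come from BRAINER.
from statistics import mean


def train_nearest_centroid(rows, labels, features):
    """Return one centroid per class, computed over the chosen features only."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(r[f] for r in members) for f in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    point = [row[f] for f in features]

    def dist(centroid):
        return sum((p - q) ** 2 for p, q in zip(point, centroid))

    return min(centroids, key=lambda cls: dist(centroids[cls]))


def accuracy(centroids, rows, labels, features):
    """Shared evaluation function: fraction of correctly classified rows."""
    hits = sum(predict(centroids, r, features) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# Toy data: "speed" separates the classes, "noise" is a misleading parameter.
train_rows = [
    {"speed": 1.0, "noise": 5.0}, {"speed": 1.2, "noise": -4.0},
    {"speed": 3.0, "noise": 4.5}, {"speed": 3.1, "noise": -5.5},
]
train_labels = ["slow", "slow", "fast", "fast"]
test_rows = [{"speed": 0.9, "noise": -6.0}, {"speed": 3.3, "noise": 6.0}]
test_labels = ["slow", "fast"]

# Several input parameter combinations kept side by side.
feature_sets = {
    "default": ["speed", "noise"],  # all available parameters
    "expert": ["speed"],            # reduced set suggested by an expert
}

results = {}
for name, features in feature_sets.items():
    model = train_nearest_centroid(train_rows, train_labels, features)
    results[name] = accuracy(model, test_rows, test_labels, features)
```

Afterwards `results` holds one accuracy per feature set, so the combinations can be ranked directly. On this contrived data the reduced set does better, because the extra parameter only adds noise.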
Configuration of different input transformations
Besides the option to configure multiple input parameter combinations, it is also possible to define the transformation per parameter individually for each feature set (Data > Transformations...).
Thus you can analyse, in one BRAINER file, how different transformations of the same input set affect the model performance. In particular for model types where the "right" input transformation is important for the training method and the model performance (e.g. artificial neural networks), this enhancement simplifies and accelerates finding a suitable configuration of transformation parameters.
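A per-parameter transformation configuration for one and the same feature set might look roughly like the following Python sketch. The transformation names (`minmax`, `zscore`, `log`, `none`), the parameter names and the helper functions are assumptions made for this illustration; which transformations BRAINER actually offers is not stated here.

```python
# Hypothetical illustration only; transformation names are assumptions.
import math
from statistics import mean, pstdev


def fit_zscore(values):
    """Standardise to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return lambda x: (x - mu) / sigma if sigma else 0.0


def fit_minmax(values):
    """Scale linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return lambda x: (x - lo) / (hi - lo) if hi > lo else 0.0


def fit_log(values):
    """Compress large ranges; assumes every value is greater than -1."""
    return lambda x: math.log1p(x)


def fit_identity(values):
    """Leave the parameter untouched."""
    return lambda x: x


FITTERS = {"zscore": fit_zscore, "minmax": fit_minmax,
           "log": fit_log, "none": fit_identity}


def fit_transformations(columns, config):
    """Fit one transformation per parameter on the training data."""
    return {name: FITTERS[config[name]](values) for name, values in columns.items()}


def apply_transformations(columns, fitted):
    """Apply the fitted transformations column by column."""
    return {name: [fitted[name](v) for v in values] for name, values in columns.items()}


# One input feature set, two alternative transformation configurations.
columns = {"pressure": [0.0, 5.0, 10.0], "count": [0.0, 9.0, 99.0]}
configs = {
    "variant_a": {"pressure": "minmax", "count": "none"},
    "variant_b": {"pressure": "zscore", "count": "log"},
}

transformed = {
    name: apply_transformations(columns, fit_transformations(columns, cfg))
    for name, cfg in configs.items()
}
```

Each entry of `transformed` can then be passed to the same training and evaluation routine, so that only the transformation differs between the runs.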
Diverse model types in one project file
A second major innovation is that various model types can be parametrised (Models > Training settings...), trained (Models > Training) as well as compared and evaluated (Postprocessing) in a single project file. As a result you can simply and quickly make a comparison, e.g. between conventional analytical or rule-based models and machine learning models for classification or regression problems.
In this context, the training parameters of the different model types from the Matlab toolboxes (among others, diverse kinds of machine learning models such as neural networks with backpropagation, stacked autoencoders, decision trees or support vector machines) have also been extended and updated. In addition, the interface for integrating other (e.g. WEKA) or customised kinds of models into BRAINER has been improved and is now object-oriented. Hence it is much easier to compare and/or combine existing algorithms with machine learning models.
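What an object-oriented model interface buys you can be shown with a small Python sketch: once an analytical rule and a learned model share the same methods, one evaluation function serves both. The class and method names below (`Model`, `train`, `predict`, `FixedRule`, `LearnedThreshold`) are purely hypothetical and do not reflect BRAINER's actual (Matlab) interface.

```python
# Hypothetical illustration only; this is not BRAINER's interface.
from abc import ABC, abstractmethod
from statistics import mean


class Model(ABC):
    """Common interface that every model type implements."""

    @abstractmethod
    def train(self, xs, ys):
        ...

    @abstractmethod
    def predict(self, x):
        ...


class FixedRule(Model):
    """Conventional analytical rule: nothing is learned from data."""

    def __init__(self, threshold):
        self.threshold = threshold

    def train(self, xs, ys):
        pass  # the rule is given by an expert, so training is a no-op

    def predict(self, x):
        return 1 if x > self.threshold else 0


class LearnedThreshold(Model):
    """Very small learned model: threshold halfway between the class means."""

    def __init__(self):
        self.threshold = None

    def train(self, xs, ys):
        mean0 = mean(x for x, y in zip(xs, ys) if y == 0)
        mean1 = mean(x for x, y in zip(xs, ys) if y == 1)
        self.threshold = (mean0 + mean1) / 2

    def predict(self, x):
        return 1 if x > self.threshold else 0


def accuracy(model, xs, ys):
    """Single evaluation function used for every model type."""
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = [1.0, 2.0, 6.0, 7.0], [0, 0, 1, 1]
test_x, test_y = [1.5, 3.0, 5.0, 8.0], [0, 0, 1, 1]

models = {"rule": FixedRule(threshold=2.5), "learned": LearnedThreshold()}
scores = {}
for name, model in models.items():
    model.train(train_x, train_y)
    scores[name] = accuracy(model, test_x, test_y)
```

Because both classes satisfy the same interface, adding a further model type means writing one more subclass; the comparison loop stays unchanged.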
Combination of single models
Building on the possibility to have various types of models in a project file, the methods for combining trained single models have also been extended. To configure and train committees or hybrid models, the following model types/classes, among others, are available in BRAINER:
- FusionEnsemble: several methods for the fusion of single models are already implemented (i.e. majority vote, weighted combination or stacking)
- BoostingEnsemble: all boosting algorithms available in Matlab (e.g. AdaBoost, LogitBoost, RUSBoost) can be selected for the aggregation of single models
- RandomForests: combination and training of multiple decision trees (on the basis of Matlab's TreeBagger class)
Because an ensemble is itself a model, it can in turn be part of another hybrid model combination. For this reason, diverse algorithm modules can be configured, trained and evaluated in one project file. These components can include various machine learning models on the one hand and, for comparison purposes, an ordinary analytical set of rules on the other.
With this new approach of building mixed committees it is also possible to calibrate individual algorithm modules separately and to test and evaluate their effect in the overall algorithm.
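The nesting of ensembles described above can be illustrated with a minimal weighted-vote fusion in Python. With equal weights it reduces to a majority vote. The classes `Threshold` and `WeightedVote` are hypothetical stand-ins and are not BRAINER's FusionEnsemble implementation; stacking and boosting are not covered by this sketch.

```python
# Hypothetical illustration only; not BRAINER's FusionEnsemble class.
from collections import defaultdict


class Threshold:
    """Stand-in for any trained single model with a predict method."""

    def __init__(self, limit):
        self.limit = limit

    def predict(self, x):
        return 1 if x > self.limit else 0


class WeightedVote:
    """Fuse member predictions by summing the weights per predicted label.

    With equal weights this is a plain majority vote. Since the ensemble
    offers predict() itself, it can be a member of another ensemble.
    """

    def __init__(self, members, weights=None):
        self.members = list(members)
        if weights is None:
            weights = [1.0] * len(self.members)
        self.weights = list(weights)

    def predict(self, x):
        totals = defaultdict(float)
        for member, weight in zip(self.members, self.weights):
            totals[member.predict(x)] += weight
        return max(totals, key=totals.get)


# Inner committee: majority vote of three single models.
inner = WeightedVote([Threshold(2.0), Threshold(4.0), Threshold(6.0)])

# Outer hybrid model: the inner committee plus one further single model.
outer = WeightedVote([inner, Threshold(8.0)], weights=[0.7, 0.3])

predictions = [outer.predict(x) for x in (1.0, 5.0, 9.0)]
```

Swapping one member, or changing its weight, and re-running the same evaluation is the simplest way to see the effect of an individual module on the overall algorithm.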
Adaption of trained models
Another methodical aspect that is better integrated in the new version concerns the adaption of models. For trained models whose model type supports adaption (e.g. backpropagation networks or decision trees), corresponding adaption parameters can be configured (Models > Adaption settings...). The adapted model (Models > Adaption...) is stored in the project file in place of the initially trained model.
Typical use cases for adaption are new data or changed outputs (requirements). If there are new data, it is not necessary to pass all data blocks (previous and new) to the training again; the adaption is made with the new data only. This is especially helpful with, among other things, very large datasets or self-learning applications.
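The principle of adapting with the new data only can be shown with a deliberately tiny Python sketch: a model that keeps a running mean and updates it incrementally. The class `RunningMeanModel` and its methods are hypothetical. For this toy model the adapted result coincides with a full retraining; for the model types named above (e.g. backpropagation networks) that equivalence generally does not hold, since adaption there continues from the already trained state.

```python
# Hypothetical illustration only; not how BRAINER adapts its models.
class RunningMeanModel:
    """Toy regression model that predicts the mean of all targets seen so far."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def train(self, values):
        """Initial training: start from scratch."""
        self.count, self.mean = 0, 0.0
        self.adapt(values)

    def adapt(self, values):
        """Adaption: update the existing state with new values only."""
        for value in values:
            self.count += 1
            self.mean += (value - self.mean) / self.count

    def predict(self):
        return self.mean


old_block = [2.0, 4.0, 6.0]   # data used for the initial training
new_block = [8.0, 10.0]       # data that arrives later

# Variant 1: train once, then adapt with the new block only.
adapted = RunningMeanModel()
adapted.train(old_block)
adapted.adapt(new_block)

# Variant 2: retrain from scratch on all data blocks.
retrained = RunningMeanModel()
retrained.train(old_block + new_block)
```

Only `new_block` has to be available at adaption time, which is what makes the approach attractive for very large datasets.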
The modification of outputs is a special trick to improve the (machine) learning results. If you are interested, please contact firstname.lastname@example.org for more information on how it works and in which applications this approach gets maximum performance out of the current models.
Administration and exchange of groups
To simplify analyses and the selection of data blocks, generated groups can now be easily exchanged between the ANDATA tools (Data > Groups). For example, data blocks with a consistently low model performance can be collected in a group and reimported into STIPULATOR for additional signal analyses.
As a result, administrative data can also be kept consistent more easily across the entire process chain (forwards and backwards).