MATLAB Parallelization vs. Caching: Signal Processing Speedup Comparison for Stipulator
by Simon Sigl
- Do not simply turn on everything (parallelization and caching) in the hope of getting the best performance; you are likely to end up with a suboptimal configuration.
- Think about your use case: repeated calculations → cache; one-time calculations → parallel pool.
- Measure your own cases.
- Mind the cache size to avoid swapping to disk.
Both parallelization and caching are general techniques for reducing the time needed by computationally intensive tasks. Their implementations in MATLAB have different focuses:
- The Parallel Computing Toolbox must be purchased separately. It allows parallel execution of computations with the parfor statement. Parallel execution does not come for free: it takes time to start the parallel pool and to coordinate the workers (distributing tasks and data and collecting the results), which limits the achievable speedup. Whether parallelization actually reduces the required time has to be studied carefully for each application. Furthermore, the parfor statement imposes some (sometimes subtle) restrictions on the code.
- Memoization has been part of MATLAB since R2017a (no separate toolbox required). It caches the input/output pairs of time-consuming MATLAB functions and avoids executing an expensive function again when it is called repeatedly with the same arguments. It imposes little computational overhead but needs RAM to store the input and output data. Restrictions on the memoized function do exist but are rather mild (e.g., the function must not have side effects or depend on global variables).
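The (sometimes subtle) parfor restrictions mentioned above mostly concern how MATLAB classifies the variables used in the loop body. The following minimal sketch is plain MATLAB, independent of Stipulator, and contrasts a loop that parfor rejects with two accepted patterns: sliced variables and a reduction. Depending on your parallel preferences, the first parfor may start a pool automatically, which itself takes time.

```matlab
% Sketch: parfor-compatible vs. incompatible loops.
% Without the Parallel Computing Toolbox, parfor loops run serially.
n = 100;
x = rand(1, n);

% NOT allowed (left commented out): iteration k+1 needs the result of
% iteration k, so MATLAB reports that y cannot be classified.
% parfor k = 1:n-1
%     y(k+1) = y(k) + x(k);
% end

% Allowed: each iteration reads only x(k) and writes only y(k) (sliced variables).
y = zeros(1, n);
parfor k = 1:n
    y(k) = x(k)^2;
end

% Also allowed: s is a reduction variable. The order of the additions across
% workers is unspecified, so tiny rounding differences are possible.
s = 0;
parfor k = 1:n
    s = s + x(k);
end
```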
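The memoization behavior described in the second bullet can be sketched with MATLAB's memoize function, assuming R2017a or later. The local function slowFilter is a hypothetical stand-in for an expensive, side-effect-free signal processing step; this shows plain MATLAB memoization, not necessarily how Stipulator implements its cache.

```matlab
% Minimal memoization sketch (R2017a or later).
x = randn(1, 1e4);                   % example input signal

f = memoize(@slowFilter);            % returns a matlab.lang.MemoizedFunction object
f.CacheSize = 10;                    % max. number of cached input/output pairs (default: 10)

tic; y1 = f(x); tFirst = toc;        % first call: slowFilter is actually executed
tic; y2 = f(x); tSecond = toc;       % same input: result is returned from the cache
fprintf('first call: %.2f s, repeated call: %.4f s\n', tFirst, tSecond);

s = stats(f);                        % cache statistics
disp(s.Cache.TotalHits)              % calls answered from the cache so far

function y = slowFilter(x)
    % Hypothetical stand-in for an expensive signal processing step.
    pause(0.5);                          % simulate a long computation
    y = filter(ones(1, 5) / 5, 1, x);    % 5-point moving average
end
```

The repeated call returns the identical result without running slowFilter again, which is exactly the situation in which caching pays off.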
Taking advantage of these techniques in Stipulator
Stipulator's signal processing supports both parallelization (if the Parallel Computing Toolbox is installed) and caching (if configured for the respective signal processing). We took measurements on a benchmark project file with about 230 cases. Although these results are highly application-specific, they are interesting enough to share. Make sure to validate the indicated trends with your own data!
The following can be observed:
- The baseline is 11 s (caching and parallelization off).
- For repeated calculations, the best result in this file is achieved with caching only (2.5 s). This time is not zero, mainly because of the administrative overhead that is needed in addition to the actual (cached) signal processing.
Use cases for this configuration: repeated signal processing, e.g., due to different plots of the same data, attribute processing, case selection based on processed attributes, or group definitions based on processed attributes.
- With the parallel pool, the speedup only reaches a factor of about 2 in this case.
Use cases for this configuration: signal processing that runs only once, e.g., signal extraction directly after opening the file, or batch operations.
- Activating both the parallel pool and caching does not reduce the computation time any further. This might be related to MATLAB's implementation of parallelization: the parallel worker's cache (note that each pool worker has its own cache instance) seems to be invalidated when the worker receives and unmarshals the task description.
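The speedup of a configuration is the ratio of the baseline time to the time measured with that configuration. With the numbers from this benchmark file:

$$
S = \frac{T_{\text{baseline}}}{T_{\text{config}}}, \qquad
S_{\text{cache}} = \frac{11\,\text{s}}{2.5\,\text{s}} \approx 4.4, \qquad
S_{\text{parallel}} \approx 2
$$

For repeated calculations in this file, caching alone is therefore roughly twice as effective as the parallel pool.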
As a rule of thumb, the best configuration depends on your signal processing use case: repeated (caching) or one-time (parallelization). Nevertheless, we recommend measuring the performance for your specific use case.
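To measure your own cases, a small timing harness can help. The sketch below is plain MATLAB, not a Stipulator feature: compareConfigs and slowFilter are hypothetical names, the workload is synthetic (a pause plus a moving-average filter), and memoize requires R2017a or later. The parallel part runs only if the Parallel Computing Toolbox is installed. Save it as compareConfigs.m.

```matlab
function t = compareConfigs(nCases)
% COMPARECONFIGS  Hypothetical benchmark: time one synthetic workload under
% three configurations (baseline, warm cache, parfor). Times are in seconds.
signals = arrayfun(@(k) randn(1, 1e4), 1:nCases, 'UniformOutput', false);
out = cell(1, nCases);

% 1) Baseline: serial loop without cache
tic;
for k = 1:nCases
    out{k} = slowFilter(signals{k});
end
t.baseline = toc;

% 2) Caching: fill the cache in a first pass, then time a repeated pass
f = memoize(@slowFilter);
f.CacheSize = nCases;              % must hold every case, otherwise entries are evicted
for k = 1:nCases
    out{k} = f(signals{k});        % first pass: cache misses, slowFilter runs
end
tic;
for k = 1:nCases
    out{k} = f(signals{k});        % repeated pass: results come from the cache
end
t.cachedRepeat = toc;

% 3) Parallel: parfor over the cases (only if the toolbox is installed)
t.parallel = NaN;
if ~isempty(ver('parallel'))
    if isempty(gcp('nocreate'))
        parpool;                   % pool start-up is deliberately not timed
    end
    tic;
    parfor k = 1:nCases
        out{k} = slowFilter(signals{k});
    end
    t.parallel = toc;              % includes distributing and collecting data
end
end

function y = slowFilter(x)
% Hypothetical stand-in for an expensive signal processing step.
pause(0.05);                       % simulate computation time
y = filter(ones(1, 5) / 5, 1, x);  % 5-point moving average
end
```

For example, t = compareConfigs(100) returns a struct with the fields baseline, cachedRepeat, and parallel. The cached time covers only the repeated pass, which corresponds to the repeated-calculation use case. Pool start-up is excluded on purpose; include it if your workflow starts a fresh pool for every run.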
How to use
Stipulator automatically takes advantage of parallelization when a parallel pool is available. If the Parallel Computing Toolbox is installed, the pool can be started manually by clicking the parallel status indicator in the lower left corner of MATLAB's main window.
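Instead of clicking the status indicator, the pool can also be started from code with the Parallel Computing Toolbox functions gcp and parpool. Since Stipulator uses whatever pool is available, starting it this way should have the same effect; this is an assumption, not documented Stipulator behavior.

```matlab
% Start a parallel pool from code (requires the Parallel Computing Toolbox).
% Assumption: Stipulator picks up any running pool, however it was started.
p = gcp('nocreate');               % running pool, or empty if there is none (never starts one)
if isempty(p)
    p = parpool;                   % start a pool using the default cluster profile
end
fprintf('Parallel pool with %d workers is running.\n', p.NumWorkers);

% ... work with Stipulator while the pool is available ...

delete(gcp('nocreate'));           % shut the pool down when it is no longer needed
```

Shutting the pool down again frees the RAM held by the workers, which leaves more memory for caching.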
To allow for optimal context-dependent configurations, Stipulator offers cache configuration options for each signal processing individually. The cache is configured in the Edit signal processing dialog.
Note that the cache size (Number of cases cached) is a crucial parameter. The performance advantage only plays out when computations with exactly the same inputs (case, signal processing, processing level, reference data, perturbation settings, ...) are repeated. Depending on the exact usage pattern, a different cache configuration might be required. The cache should hold at least the number of typically selected cases. However, more cache is not always better: setting the size too large can significantly degrade performance as soon as the operating system starts swapping RAM contents to disk.
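The effect of the cache size can be reproduced with plain MATLAB memoization, whose CacheSize property plays a role similar to Number of cases cached. This is an analogy, not Stipulator's actual implementation. The hypothetical function cacheHitRate processes the same cases twice, as when two plots show the same selection, and returns the fraction of calls answered from the cache. Save it as cacheHitRate.m.

```matlab
function hitRate = cacheHitRate(cacheSize, nCases)
% CACHEHITRATE  Hypothetical experiment: process the same nCases inputs twice
% through a memoized function and return the fraction of cache hits.
signals = arrayfun(@(k) randn(1, 1e3), 1:nCases, 'UniformOutput', false);

f = memoize(@processCase);
f.CacheSize = cacheSize;           % plays the role of "Number of cases cached"
clearCache(f);                     % start from an empty cache
s0 = stats(f);                     % counters before the experiment

for pass = 1:2                     % e.g. two plots of the same case selection
    for k = 1:nCases
        y = f(signals{k}); %#ok<NASGU> only the caching behavior matters here
    end
end

s1 = stats(f);                     % counters after the experiment
hits   = s1.Cache.TotalHits   - s0.Cache.TotalHits;
misses = s1.Cache.TotalMisses - s0.Cache.TotalMisses;
hitRate = hits / (hits + misses);
end

function y = processCase(x)
% Hypothetical stand-in for one signal processing step.
y = cumsum(x);
end
```

With a cache at least as large as the selection, cacheHitRate(50, 50) returns 0.5: the first pass misses on every case and the second pass hits on every case. With cacheHitRate(10, 50), entries are evicted before they are reused, so the hit rate drops well below that. Each cache entry stores both inputs and outputs, so the RAM needed grows roughly linearly with the cache size.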