Overview
This vignette describes how to use parallel processing with the different PINstimation functions. It also provides several usage examples showing how to activate and deactivate parallel processing, and how to change its default options.
In sequential processing, one task is completed at a time, and the processor runs all tasks one after another. For example, sequential processing of the MPIN estimation for the various initial parameter sets means that the model is estimated for one initial parameter set at a time. The estimation for the second initial parameter set starts only after the estimation for the first initial parameter set is completed.
In parallel processing, multiple tasks are executed simultaneously and independently by different processors or CPU cores, so more than one processor or CPU core is involved. For example, parallel processing of the MPIN estimation for the various initial parameter sets means that the model is estimated for multiple initial parameter sets at the same time. Each processor or CPU core independently estimates the MPIN model for a given initial parameter set.
Parallel processing has the advantage of completing the tasks faster, provided that the number of tasks is sufficiently large. However, it is more costly in terms of CPU power and memory.
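The difference between the two modes can be sketched with the base R package parallel. This is only an illustration and not the way PINstimation is implemented: the function estimate_one() is a hypothetical stand-in for a single estimation run, and the eight seeds stand in for eight initial parameter sets.

```r
library(parallel)

# Hypothetical stand-in for one estimation run: a small, deterministic task.
estimate_one <- function(seed) {
  set.seed(seed)
  mean(rnorm(1e4))
}

seeds <- 1:8  # stand-ins for eight initial parameter sets

# Sequential: one task at a time, in order.
res_seq <- lapply(seeds, estimate_one)

# Parallel: the tasks are distributed over two worker processes.
cl <- makeCluster(2)
res_par <- parLapply(cl, seeds, estimate_one)
stopCluster(cl)

# Compare the results of the two approaches.
all.equal(res_seq, res_par)
```

With so few small tasks, the cost of starting the workers is likely to outweigh any gain, which is why a sufficiently large number of tasks is needed for parallel processing to pay off.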
Parallel processing with PINstimation
Parallel processing is available for three functions, which typically have long running times.
- MPIN model estimation functions: mpin_ml(), mpin_ecm()
- Data aggregation function: aggregate_trades()
However, not all calls of these functions can use parallel processing.
- MPIN model estimation: The use of parallel processing is conditional on the number of initial parameter sets used for the estimation.
- Data aggregation: Parallel processing is not available when the argument timelag is equal to zero. This means that no parallel processing is available for the Tick algorithm, as the argument timelag is ignored when the Tick algorithm is used.
Activating and deactivating parallel processing is done using the argument is_parallel, available for all these functions. The default value for this argument is TRUE for the data aggregation, and FALSE for the MPIN model estimation. Parallel processing depends on two additional options:
- The number of cores used by the functions
- The threshold of initial parameter sets needed to activate parallel processing for MPIN estimations.
Option 1: Number of cores used
The first option is the number of CPU cores used in parallel processing. By default, the package uses 2 CPU cores if the argument is_parallel is set to TRUE. The option is stored in, and accessed through, the R option pinstimation.parallel.cores.
To change the number of CPU cores used by PINstimation functions, the user needs to set the option pinstimation.parallel.cores to the desired number of cores. For example, the user can set the number of cores to 3 using the following code:
options(pinstimation.parallel.cores = 3)
To read the number of cores used by PINstimation functions, the user can use the function getOption as follows:
getOption("pinstimation.parallel.cores")
## [1] 3
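Because options() changes a session-wide setting, it can be convenient to keep the previous value and restore it afterwards. The following is a minimal sketch using base R only; the variable name old is arbitrary.

```r
# options() invisibly returns the previous values of the options it changes.
old <- options(pinstimation.parallel.cores = 3)
getOption("pinstimation.parallel.cores")

# ... calls to PINstimation functions would go here ...

# Restore the previous state (an option that was unset becomes unset again).
options(old)
```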
If the value assigned to the option pinstimation.parallel.cores is not valid (non-numeric, non-positive, above the available number of cores, or above the default value), it will automatically be set to its default value, i.e., 2. However, it will be set to this default value only after one of the functions using parallel processing is called.
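For example, an invalid value of -2 remains stored until a function using parallel processing is called:
options(pinstimation.parallel.cores = -2)
getOption("pinstimation.parallel.cores")
## [1] -2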
xdata <- hfdata
xdata$volume <- NULL
aggdata <- aggregate_trades(xdata, timelag = 500, algorithm = "LR")
[+] Trade classification started
| [#] Classification algorithm : LR algorithm
| [#] Number of trades in dataset : 100 000 trades
| [#] Time lag of lagged variables : 500 milliseconds
| [1] Computing lagged variables : using parallel processing
|+++++++++++++++++++++++++++++++++++++| 100% of variables computed
| [#] Computed lagged variables : in 4.384 seconds
| [2] Computing aggregated trades : using lagged variables
[+] Trade classification completed
getOption("pinstimation.parallel.cores")
## [1] 2
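To avoid requesting more cores than the machine has, the number of available cores can be checked before setting the option. This is a sketch using detectCores() from the base R package parallel; the variable names and the requested value of 3 are illustrative assumptions, and the package still applies its own validation as described above.

```r
library(parallel)

available <- detectCores()  # may return NA on some platforms
requested <- 3              # illustrative value

# Never request more cores than are available; fall back to 2 if unknown.
cores <- if (is.na(available)) 2 else min(requested, available)

options(pinstimation.parallel.cores = cores)
getOption("pinstimation.parallel.cores")
```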
Option 2: Threshold of initial parameter sets
The second option is the minimum number of initial parameter sets used in the MPIN estimation for parallel processing to be activated. By default, this threshold is set to 100. Note that parallel processing will not be used if the number of initial sets is below the threshold, even if the argument is_parallel is set to TRUE. The option is stored in, and accessed through, the R option pinstimation.parallel.threshold.
To change the threshold of initial parameter sets for the functions mpin_ml() and mpin_ecm(), the user needs to set the option pinstimation.parallel.threshold to the desired threshold. The value of the threshold should be an integer. A negative integer is equivalent to a threshold of zero: parallel processing will then be used for any number of initial parameter sets, provided that the argument is_parallel is set to TRUE. If the value assigned to the option pinstimation.parallel.threshold is not an integer, it will automatically be set to its default value, i.e., 100. However, it will be set to this default value only after one of the mpin functions is run with parallel processing.
To set the threshold of initial parameter sets to 20, the user can use the following code:
options("pinstimation.parallel.threshold" = 20)
Setting the threshold to 20 means that parallel processing will be used only when the number of initial parameter sets used in the MPIN estimation is equal to or exceeds 20; otherwise, the standard sequential processing is used. Of course, parallel processing is only active if the argument is_parallel takes the value TRUE.
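The rules described in this section can be summarized in a small decision function. The helper uses_parallel() below is hypothetical and not part of PINstimation; it only mirrors the behavior described above and makes no claim about the package's internal code.

```r
# Hypothetical helper, not part of PINstimation: mirrors the rules in the text.
uses_parallel <- function(n_sets, is_parallel,
                          threshold = getOption("pinstimation.parallel.threshold", 100)) {
  # A threshold that is not an integer falls back to the default of 100.
  if (!is.numeric(threshold) || length(threshold) != 1 ||
      is.na(threshold) || threshold != round(threshold)) {
    threshold <- 100
  }
  # A negative threshold behaves like a threshold of zero.
  threshold <- max(threshold, 0)
  # Parallel processing needs both the argument and enough initial sets.
  isTRUE(is_parallel) && n_sets >= threshold
}

uses_parallel(15, TRUE, threshold = 20)   # number of sets below the threshold
uses_parallel(35, TRUE, threshold = 20)   # number of sets above the threshold
uses_parallel(35, FALSE, threshold = 20)  # is_parallel is FALSE
```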
Illustrative Example
Below, we illustrate the interaction between the argument is_parallel and the option pinstimation.parallel.threshold by presenting three use scenarios of the function mpin_ecm():
Sequential processing
Sequential processing is used when the argument is_parallel is set to FALSE, or is missing, since its default value is FALSE.
ecm.1 <- mpin_ecm(data = dailytrades, is_parallel = FALSE)
[+] MPIN estimation started
|[1] Computing the range of layers : information layers from 1 to 8
|[2] Computing initial parameter sets : using algorithm of Ersan (2016)
|[=] Selecting initial parameter sets : max 100 initial sets per estimation
|[3] Estimating the MPIN model : Expectation-Conditional Maximization algorithm
|+++++++++++++++++++++++++++++++++++++| 100% of estimation completed [8 layer(s)]
|[3] Selecting the optimal model : using lowest Information Criterion (BIC)
[+] MPIN estimation completed
The output of this estimation is displayed below. Note that the badge Sequential is displayed in green, meaning that sequential processing has been used.
## ----------------------------------
## MPIN estimation completed successfully
## ----------------------------------
## Likelihood factorization: Ersan (2016)
## Estimation Algorithm : Expectation Conditional Maximization
## Initial parameter sets : Ersan (2016), Ersan and Alici (2016)
## Info. layers detected : using Ghachem and Ersan (2022) [ECM]
## Selection criterion : Bayes Information Criterion (BIC)
## ----------------------------------
## 525 initial set(s) are used for all 8 estimations
## Type object@models for the estimation results for all models.
## Type getSummary(object) for a summary of estimates for all models.
##
## MPIN model Optimal Estimation Sequential
##
## =============== ============================
## Variables Estimates
## =============== ============================
## alpha 0.216667, 0.050000, 0.483333
## delta 0.230769, 0.666667, 0.034483
## mu 602.88, 986.45, 1506.84
## eps.b 336.91
## eps.s 335.89
## ----
## Likelihood (643.458)
## mpin(j) 0.082619, 0.031196, 0.460648
## mpin 0.574463
## ----
## AIC | BIC | AWE 1308.92, 1331.95, 1409.99
## =============== ============================
##
##
## Table: Summary of 8 MPIN estimations by ECM algorithm
##
## BIC AIC AWE layers #Sets time
## --------- ------- ------- ------- ------ ----- ----
## model.1 6473.41 6462.94 6508.88 1 5 0.04
## model.2 1633.51 1616.76 1690.27 2 15 0.27
## model.3 1331.95 1308.92 1409.99 3 35 0.58
## model.4 1331.95 1308.92 1409.99 3 70 1.22
## model.5** 1331.95 1308.92 1409.99 3 100 1.63
## model.6 1342.58 1313.26 1441.9 4 100 5.14
## model.7 1342.58 1313.26 1441.9 4 100 8.54
## model.8 1342.58 1313.26 1441.9 4 100 2.15
##
## -------
## Running time: 19.57 seconds
Parallel processing | number of sets below the threshold
Parallel processing is activated when the argument is_parallel is set to TRUE. When the value of the argument layers is set to 2, the number of initial parameter sets used is 15. This number is below the threshold of 20 set above, so parallel processing is not used, even though the argument is_parallel is set to TRUE.
ecm.2 <- mpin_ecm(dailytrades, layers = 2, is_parallel = TRUE)
[+] MPIN estimation started
|[1] Using user-selected layers : 2 layer(s) assumed in the data
|[2] Computing initial parameter sets : using algorithm of Ersan (2016)
|[3] Estimating the MPIN model : Expectation-Conditional Maximization algorithm
|+++++++++++++++++++++++++++++++++++++| 100% of estimation completed [2 layer(s)]
[+] MPIN estimation completed
The output of this estimation is displayed below. Note that the badge Parallel is displayed in red, meaning that parallel processing is activated, but not used.
## ----------------------------------
## MPIN estimation completed successfully
## ----------------------------------
## Likelihood factorization: Ersan (2016)
## Estimation Algorithm : Expectation Conditional Maximization
## Initial parameter sets : Ersan (2016), Ersan and Alici (2016)
## Info. layers in the data: provided by the user
## Selection criterion : Bayes Information Criterion (BIC)
## ----------------------------------
## 15 initial set(s) are used for the 'current' estimation
## Type object@initialsets to see the initial parameter sets used.
##
##
## MPIN model Regular Estimation Parallel
##
## =============== =========================
## Variables Estimates
## =============== =========================
## alpha 0.266667, 0.483333
## delta 0.312500, 0.034483
## mu 677.91, 1512.36
## eps.b 331.07
## eps.s 338.2
## ----
## Likelihood (800.379)
## mpin(j) 0.114341, 0.462343
## mpin 0.576684
## ----
## AIC | BIC | AWE 1616.76, 1633.51, 1690.27
## =============== =========================
##
## -------
## Running time: 0.319 seconds
Parallel processing | number of sets above the threshold
Parallel processing is activated when the argument is_parallel is set to TRUE. When the value of the argument layers is set to 3, the number of initial parameter sets used is 35. This number is above the threshold of 20 set above, so parallel processing is used.
ecm.3 <- mpin_ecm(dailytrades, layers = 3, is_parallel = TRUE)
[+] MPIN estimation started
|[1] Using user-selected layers : 3 layer(s) assumed in the data
|[2] Computing initial parameter sets : using algorithm of Ersan (2016)
|[3] Estimating the MPIN model : Expectation-Conditional Maximization algorithm
|+++++++++++++++++++++++++++++++++++++| 100% of estimation completed [3 layer(s)]
[+] MPIN estimation completed
The output of this estimation is displayed below. Note that the badge Parallel is displayed in green, meaning that parallel processing is activated and used.
## ----------------------------------
## MPIN estimation completed successfully
## ----------------------------------
## Likelihood factorization: Ersan (2016)
## Estimation Algorithm : Expectation Conditional Maximization
## Initial parameter sets : Ersan (2016), Ersan and Alici (2016)
## Info. layers in the data: provided by the user
## Selection criterion : Bayes Information Criterion (BIC)
## ----------------------------------
## 35 initial set(s) are used for the 'current' estimation
## Type object@initialsets to see the initial parameter sets used.
##
##
## MPIN model Regular Estimation Parallel
##
## =============== ============================
## Variables Estimates
## =============== ============================
## alpha 0.216667, 0.050000, 0.483333
## delta 0.230769, 0.666667, 0.034483
## mu 602.84, 986.42, 1506.78
## eps.b 336.92
## eps.s 335.89
## ----
## Likelihood (643.458)
## mpin(j) 0.082614, 0.031196, 0.460638
## mpin 0.574448
## ----
## AIC | BIC | AWE 1308.92, 1331.95, 1409.99
## =============== ============================
##
## -------
## Running time: 2.081 seconds
Getting help
If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub.