Overview
This vignette describes several scenarios for creating sample datasets
that fit the needs of users, for each of the supported models. The
functions in the package PINstimation use two types of datasets: (1) a
sequence of daily buys and sells, and (2) high-frequency trading data.
This is why only two sample datasets are preloaded with the package,
namely dailytrades and hfdata. Users can also generate simulated data
using the function generatedata_mpin() for the PIN and MPIN models, and
the function generatedata_adjpin() for the AdjPIN model. Below we
provide some scenarios for creating sample datasets for the PIN, MPIN,
and AdjPIN models.
Sample datasets for the PIN model
The PIN model is a multilayer PIN model with a single information
layer. We can, therefore, use the function generatedata_mpin() to
generate sample data for the PIN model. Generically, this is done as
follows:
generatedata_mpin(..., layers=1)
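To make the structure of such data concrete, here is a minimal base-R sketch of a single-layer PIN data-generating process, assuming the standard setup in which an information event occurs with probability alpha, is bad news with probability delta, and adds mu to the Poisson rate of sells (bad news) or buys (good news). The helper simulate_pin() is hypothetical and is not how generatedata_mpin() is implemented; the parameter values are illustrative only.

```r
# Base-R sketch of a single-layer PIN data-generating process.
# simulate_pin() is a hypothetical helper, not a PINstimation function.
simulate_pin <- function(days, alpha, delta, mu, eps.b, eps.s) {
  info <- rbinom(days, 1, alpha)          # 1 = information event on that day
  bad  <- info * rbinom(days, 1, delta)   # 1 = event day with bad news
  good <- info - bad                      # 1 = event day with good news
  data.frame(
    b = rpois(days, eps.b + mu * good),   # informed buying on good-news days
    s = rpois(days, eps.s + mu * bad)     # informed selling on bad-news days
  )
}

set.seed(1)  # for reproducibility of the sketch
simdata <- simulate_pin(days = 60, alpha = 0.57, delta = 0.16,
                        mu = 227, eps.b = 400, eps.s = 393)
head(simdata)
```

The result has the same two-column layout (b for buys, s for sells) as the daily data used by the estimation functions.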
If the user would like to create a sample dataset for an infrequently
traded stock, they can specify low values or ranges for the trade
intensity rates. For instance, assume that the user suspects that an
infrequently traded stock has average uninformed trading intensities
for buys and sells between 300 and 500. They can generate a single
sample dataset for this scenario as follows:
pindata <- generatedata_mpin(layers=1, ranges = list(eps.b=c(300, 500), eps.s=c(300,500)), verbose = FALSE)
The details of the generated sample dataset can be displayed with the following code:
show(pindata)
## ----------------------------------
## Data series successfully generated
## ----------------------------------
## Simulation model : MPIN model
## Number of layers : 1 layer(s)
## Number of trading days : 60 days
## ----------------------------------
## Type object@data to get the simulated data
##
## Data simulation
##
## =========== ============== ============ =============
## Variables Theoretical. Empirical. Aggregates.
## =========== ============== ============ =============
## alpha 0.571689 0.6 0.6
## delta 0.157208 0.111111 0.111111
## mu 227 229.1 229.1
## eps.b 400 405.57 405.57
## eps.s 393 390.29 390.29
## ----
## Likelihood - (595.083) (595.083)
## mpin - 0.147281 0.147281
## =========== ============== ============ =============
##
## -------
## Running time: 0.007 seconds
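The mpin row in the output above can be reproduced by hand from the empirical parameters. The sketch below assumes the usual definition of the probability of informed trading, the share of the expected informed order flow alpha * mu in the total expected order flow; pin_value() is a hypothetical helper, and its inputs are the rounded Empirical values from the table.

```r
# Hypothetical helper: PIN as the share of informed order flow.
pin_value <- function(alpha, mu, eps.b, eps.s) {
  informed <- alpha * mu                 # expected informed trades per day
  informed / (informed + eps.b + eps.s)  # share of total expected trades
}

# Rounded empirical parameters taken from the displayed table
pin_check <- pin_value(alpha = 0.6, mu = 229.1, eps.b = 405.57, eps.s = 390.29)
round(pin_check, 4)
```

Up to rounding of the inputs, this agrees with the value 0.147281 reported in the table.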
The sequences of buys and sells can be accessed through the slot @data
of the object pindata.
show(pindata@data[1:10, ])
## b s
## 1 408 401
## 2 408 395
## 3 651 393
## 4 655 404
## 5 657 387
## 6 627 360
## 7 645 370
## 8 635 406
## 9 631 395
## 10 661 387
You can now use the dataset object pindata to check the accuracy of the
different estimation functions. You can do that by comparing the actual
parameters of the sample dataset to the parameters estimated by the
estimation functions. Let us start by displaying the actual parameters
of the sample dataset. These can be accessed through the slot
@empiricals of the dataset object, which stores the empirical
parameters computed from the generated sequences of buys and sells.
Please refer to the documentation of generatedata_mpin() for more
information.
actual <- unlist(pindata@empiricals)
show(actual)
## alpha delta mu eps.b eps.s
## 0.6000000 0.1111111 229.0992063 405.5714286 390.2857143
Estimate the PIN model using the function pin_ea(), and display the
estimated parameters:
model <- pin_ea(data=pindata@data, verbose = FALSE)
estimates <- model@parameters
show(estimates)
## alpha delta mu eps.b eps.s
## 0.5999982 0.1111100 229.2150453 405.4372461 390.3501849
Now calculate the absolute errors of the estimation method.
errors <- abs(actual - estimates)
show(errors)
## alpha delta mu eps.b eps.s
## 1.787672e-06 1.087828e-06 1.158390e-01 1.341824e-01 6.447062e-02
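Because the probabilities and the trading rates live on very different scales, absolute errors are not directly comparable across parameters. As a complementary check, the sketch below computes relative errors from the actual and estimated values printed above (copied by hand so that the snippet runs on its own); rel_errors is a name introduced here for illustration.

```r
# Values copied from the outputs shown above
actual <- c(alpha = 0.6000000, delta = 0.1111111, mu = 229.0992063,
            eps.b = 405.5714286, eps.s = 390.2857143)
estimates <- c(alpha = 0.5999982, delta = 0.1111100, mu = 229.2150453,
               eps.b = 405.4372461, eps.s = 390.3501849)

# Relative error: absolute error scaled by the size of the true value
rel_errors <- abs(actual - estimates) / abs(actual)
signif(rel_errors, 2)
```

On this scale, every parameter is recovered to within a tenth of a percent.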
Sample datasets for the MPIN model
In contrast to the PIN model, the number of information layers in the
MPIN model is free. We can, therefore, use the function
generatedata_mpin() with the desired number of information layers to
generate sample data for the MPIN model. We can also skip specifying
the number of layers, in which case the default setting is used: the
number of layers is randomly selected from the integers 1 to 5.
Generically, this is done as follows:
generatedata_mpin(...)
If the user would like to create a sample dataset for a frequently
traded stock with two information layers, they can set the argument
layers to 2, and specify high values or ranges for the trade intensity
rates. For instance, assume that the user suspects that a frequently
traded stock has average uninformed trading intensities for buys and
sells between 12000 and 15000. They can generate a single sample
dataset for this scenario as follows:
mpindata <- generatedata_mpin(layers=2, ranges = list(eps.b=c(12000, 15000), eps.s=c(12000,15000)), verbose = FALSE)
The details of the generated sample dataset can be displayed with the following code:
show(mpindata)
## ----------------------------------
## Data series successfully generated
## ----------------------------------
## Simulation model : MPIN model
## Number of layers : 2 layer(s)
## Number of trading days : 60 days
## ----------------------------------
## Type object@data to get the simulated data
##
## Data simulation
##
## =========== ================== ================== =============
## Variables Theoretical. Empirical. Aggregates.
## =========== ================== ================== =============
## alpha 0.399589, 0.174923 0.400000, 0.133333 0.533333
## delta 0.843870, 0.729761 0.708333, 0.500000 0.65625
## mu 1030, 2364 1059.43, 2416.32 1398.65
## eps.b 14248 14244.04 14244.04
## eps.s 13001 12976.08 12976.08
## ----
## Likelihood - (824.955) (824.955)
## mpin - 0.026673 0.026673
## =========== ================== ================== =============
##
## -------
## Running time: 0.004 seconds
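The mpin value and the Aggregates column in the output above can be checked by hand. The sketch below assumes that MPIN generalizes PIN by summing the informed order flow alpha_j * mu_j over the layers, and that the aggregate delta and mu are alpha-weighted averages of the layer values; the latter is an inference from the displayed numbers, not a statement about the package internals. mpin_value() is a hypothetical helper.

```r
# Hypothetical helper: MPIN sums the informed order flow over all layers.
mpin_value <- function(alpha, mu, eps.b, eps.s) {
  informed <- sum(alpha * mu)
  informed / (informed + eps.b + eps.s)
}

# Rounded empirical parameters of the two layers, from the displayed table
alpha <- c(0.400000, 0.133333)
delta <- c(0.708333, 0.500000)
mu    <- c(1059.43, 2416.32)

mpin_check <- mpin_value(alpha, mu, eps.b = 14244.04, eps.s = 12976.08)

# Aggregates as alpha-weighted combinations of the layer parameters
agg <- c(alpha = sum(alpha),
         delta = sum(alpha * delta) / sum(alpha),
         mu    = sum(alpha * mu) / sum(alpha))

round(mpin_check, 6)
round(agg, 4)
```

Up to rounding of the inputs, these agree with the mpin value 0.026673 and with the aggregates 0.533333, 0.65625, and 1398.65 reported in the table.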
The sequences of buys and sells can be accessed through the slot @data
of the object mpindata.
show(mpindata@data[1:10, ])
## b s
## 1 14263 13009
## 2 14148 13243
## 3 14542 12832
## 4 14315 12927
## 5 14115 14073
## 6 16505 13075
## 7 14230 13120
## 8 14175 12775
## 9 14017 13029
## 10 14076 14197
You can now use the dataset object mpindata to check the accuracy of
the different estimation functions, namely mpin_ml() and mpin_ecm().
You can do that by comparing the empirical PIN value derived from the
sample dataset to the PIN value estimated by each function. Let us
start by displaying the empirical PIN value obtained from the sample
dataset. This value can be accessed through the slot @emp.pin of the
dataset object, which stores the empirical PIN value computed from the
generated sequences of buys and sells. Please refer to the
documentation of generatedata_mpin() for more information.
actualmpin <- unlist(mpindata@emp.pin)
show(actualmpin)
## MPIN
## 0.02667336
Estimate the MPIN model using the functions mpin_ml() and mpin_ecm(),
and display the estimated MPIN values.
model_ml <- mpin_ml(data=mpindata@data, verbose = FALSE)
model_ecm <- mpin_ecm(data=mpindata@data, verbose = FALSE)
mlmpin <- model_ml@mpin
ecmpin <- model_ecm@mpin
estimates <- setNames(c(mlmpin, ecmpin), c("ML", "ECM"))
show(estimates)
## ML ECM
## 0.02667643 0.02667439
Now calculate the absolute errors of both estimation methods.
errors <- abs(actualmpin - estimates)
show(errors)
## ML ECM
## 3.076293e-06 1.033311e-06
The function generatedata_mpin() can also generate a data.series
object that contains a collection of dataset objects. For instance, the
user can generate a collection of 10 datasets, whose data sequences
span 60 days and contain 3 layers, and use it to check the accuracy of
the MPIN estimation.
size <- 10
collection <- generatedata_mpin(series = size, layers = 3, verbose = FALSE)
show(collection)
## ----------------------------------
## Simulated data successfully generated
## ----------------------------------
## Simulation model : MPIN model
## Number of layers : 3 layer(s)
## Number of datasets : 10 datasets
## Number of trading days : 60 days
## ----------------------------------
## Type object@datasets to access the list of dataset objects
##
## Data simulation
##
## -------
## Running time: 0.043 seconds
accuracy <- devmpin <- 0
for (i in 1:size) {
  sdata <- collection@datasets[[i]]
  model <- mpin_ml(sdata@data, xtraclusters = 3, verbose=FALSE)
  accuracy <- accuracy + (sdata@layers == model@layers)
  devmpin <- devmpin + abs(sdata@emp.pin - model@mpin)
}
cat('The accuracy of layer detection: ', paste0(accuracy*(100/size),"%.\n"), sep="")
## The accuracy of layer detection: 100%.
cat('The average error in MPIN estimates: ', devmpin/size, ".\n", sep="")
## The average error in MPIN estimates: 0.001019255.
Sample datasets for the AdjPIN model
The AdjPIN model is an extension of the PIN model that includes the
possibility of liquidity shocks. To obtain a sample dataset distributed
according to the assumptions of the AdjPIN model, users can use the
function generatedata_adjpin(). Generically, this is done as follows:
generatedata_adjpin(...)
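To illustrate what "liquidity shocks" add to the data, here is a minimal base-R sketch of an AdjPIN-style data-generating process. It assumes the usual structure of the model: on top of the information events of the PIN model, a symmetric order-flow shock occurs with probability theta on days without information and theta' on days with information, and raises both buys (by d.b) and sells (by d.s). The helper simulate_adjpin() and all parameter values are hypothetical; this is not the implementation of generatedata_adjpin().

```r
# Base-R sketch of an AdjPIN-style data-generating process.
# simulate_adjpin() is a hypothetical helper; thetap stands for theta'.
simulate_adjpin <- function(days, alpha, delta, theta, thetap,
                            eps.b, eps.s, mu.b, mu.s, d.b, d.s) {
  info  <- rbinom(days, 1, alpha)         # 1 = information event
  bad   <- info * rbinom(days, 1, delta)  # 1 = event day with bad news
  good  <- info - bad                     # 1 = event day with good news
  # The shock probability depends on whether the day carries information
  shock <- rbinom(days, 1, ifelse(info == 1, thetap, theta))
  data.frame(
    b = rpois(days, eps.b + mu.b * good + d.b * shock),
    s = rpois(days, eps.s + mu.s * bad  + d.s * shock)
  )
}

set.seed(1)  # for reproducibility of the sketch
adjsim <- simulate_adjpin(days = 60, alpha = 0.3, delta = 0.4,
                          theta = 0.5, thetap = 0.6,
                          eps.b = 10000, eps.s = 13000,
                          mu.b = 5000, mu.s = 5000,
                          d.b = 3000, d.s = 3000)
head(adjsim)
```

Unlike in the PIN sketch, a shock day raises buys and sells at the same time, which is what lets the model separate liquidity-driven volume from informed trading.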
If the user desires to create 10 sample datasets for a frequently
traded stock, they can specify high values or ranges for the trade
intensity rates. For instance, assume that the user suspects that a
frequently traded stock has average uninformed trading intensities for
buys and sells between 10000 and 15000.
adjpindatasets <- generatedata_adjpin(series = 10, ranges = list(eps.b=c(10000, 15000), eps.s=c(10000,15000)), verbose = FALSE)
The details of the generated sample data series can be displayed with the following code:
show(adjpindatasets)
## ----------------------------------
## Simulated data successfully generated
## ----------------------------------
## Simulation model : AdjPIN model
## Model Restrictions : Unrestricted model
## Number of datasets : 10 datasets
## Number of trading days : 60 days
## ----------------------------------
## Type object@datasets to access the list of dataset objects
##
## Data simulation
##
## -------
## Running time: 0.136 seconds
You can access the first dataset in adjpindatasets using this code:
adjpindata <- adjpindatasets@datasets[[1]]
show(adjpindata)
## ----------------------------------
## Data series successfully generated
## ----------------------------------
## Simulation model : AdjPIN model
## Model Restrictions : Unrestricted model
## Number of trading days : 60 days
## ----------------------------------
## Type object@data to get the simulated data
##
## Data simulation
##
## =========== ============== ============
## Variables Theoretical. Empirical.
## =========== ============== ============
## alpha 0.331101 0.333333
## delta 0.367361 0.3
## theta 0.546355 0.55
## theta' 0.573618 0.4
## ----
## eps.b 10608 10645.44
## eps.s 13176 13166.41
## mu.b 59460 59565.31
## mu.s 52968 52994.66
## d.b 3802 3726.63
## d.s 3367 3458.78
## ----
## Likelihood (845.699)
## adjPIN 0.405 0.412
## PSOS 0.085 0.077
## =========== ============== ============
##
## -------
## Running time: 0.012 seconds
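The adjPIN and PSOS rows in the output above can be reproduced from the empirical parameters. The sketch below assumes the usual definitions: adjPIN is the share of expected informed order flow, and PSOS the share of expected shock-driven order flow, in the total expected order flow. adjpin_values() is a hypothetical helper, thetap stands for theta', and the inputs are the rounded Empirical values from the table.

```r
# Hypothetical helper: adjPIN and PSOS as shares of expected order flow.
adjpin_values <- function(alpha, delta, theta, thetap,
                          eps.b, eps.s, mu.b, mu.s, d.b, d.s) {
  informed <- alpha * (delta * mu.s + (1 - delta) * mu.b)
  shocks   <- (d.b + d.s) * (alpha * thetap + (1 - alpha) * theta)
  total    <- informed + shocks + eps.b + eps.s
  c(adjpin = informed / total, psos = shocks / total)
}

# Rounded empirical parameters taken from the displayed table
adjpin_check <- adjpin_values(alpha = 0.333333, delta = 0.3,
                              theta = 0.55, thetap = 0.4,
                              eps.b = 10645.44, eps.s = 13166.41,
                              mu.b = 59565.31, mu.s = 52994.66,
                              d.b = 3726.63, d.s = 3458.78)
round(adjpin_check, 3)
```

Up to rounding of the inputs, this agrees with the empirical values 0.412 and 0.077 reported in the table.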
You can now use the dataset object adjpindata to check the accuracy of
the different estimation methods, namely the ML and ECM algorithms. You
can do that by comparing the empirical adjpin and psos values derived
from the sample dataset to the adjpin and psos values obtained from the
estimation functions. Let us start by displaying the empirical adjpin
and psos values obtained from the sample dataset. These values can be
accessed through the slot @emp.pin of the dataset object, which stores
the empirical adjpin and psos values computed from the generated
sequences of buys and sells. Please refer to the documentation of
generatedata_adjpin() for more information.
actualpins <- unlist(adjpindata@emp.pin)
show(actualpins)
## adjpin psos
## 0.41195212 0.07709233
Estimate the AdjPIN model using adjpin(method="ML") and
adjpin(method="ECM"), and display the estimated adjpin and psos values.
model_ml <- adjpin(data=adjpindata@data, method = "ML", verbose = FALSE)
model_ecm <- adjpin(data=adjpindata@data, method = "ECM", verbose = FALSE)
mlpins <- c(model_ml@adjpin, model_ml@psos)
ecmpins <- c(model_ecm@adjpin, model_ecm@psos)
estimates <- rbind(mlpins, ecmpins)
colnames(estimates) <- c("adjpin", "psos")
rownames(estimates) <- c("ML", "ECM")
show(estimates)
## adjpin psos
## ML 0.4126122 0.07672529
## ECM 0.4126176 0.07673268
Now calculate the absolute errors of both estimation methods.
errors <- abs(estimates - rbind(actualpins, actualpins))
show(errors)
## adjpin psos
## ML 0.0006600537 0.0003670447
## ECM 0.0006655281 0.0003596500
Getting help
If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub.