Estimates the Adjusted Probability of Informed Trading (adjPIN) as well as the Probability of Symmetric Order-flow Shock (PSOS) from the AdjPIN model of Duarte and Young (2009).
Usage
adjpin(data, method = "ECM", initialsets = "GE", num_init = 20,
       restricted = list(), ..., verbose = TRUE)
Arguments
- data
A dataframe with 2 variables: the first corresponds to buyer-initiated trades (buys), and the second corresponds to seller-initiated trades (sells).
- method
A character string specifying the method used to estimate the model of Duarte and Young (2009). It takes one of two values: "ML" refers to standard maximum likelihood estimation, and "ECM" refers to the expectation-conditional maximization algorithm. The default value is "ECM". Details of the ECM method and comparative results can be found in Ghachem and Ersan (2022a) and in Ghachem and Ersan (2022b).
- initialsets
Either a character string referring to a prebuilt algorithm that generates initial parameter sets, or a dataframe containing custom initial parameter sets. If initialsets is a character string, it specifies the method used to generate the initial parameter sets and takes one of three values: "GE", "CL", or "RANDOM". "GE" refers to initial parameter sets generated by the algorithm of Ersan and Ghachem (2022b), implemented in initials_adjpin(); "CL" refers to initial parameter sets generated by the algorithm of Cheng and Lai (2021), implemented in initials_adjpin_cl(); and "RANDOM" generates random initial parameter sets as implemented in initials_adjpin_rnd(). The default value is "GE". If initialsets is a dataframe, the function adjpin() estimates the AdjPIN model using the provided initial parameter sets.
- num_init
An integer specifying the maximum number of initial parameter sets to be used in the estimation. If initialsets = "GE", the generation of initial parameter sets stops when their number reaches num_init. It can stop earlier if the number of all possible generated initial parameter sets is lower than num_init. If initialsets = "RANDOM", exactly num_init initial parameter sets are returned. If initialsets = "CL", num_init is ignored, and all 256 initial parameter sets are used. The default value is 20. The argument num_init is also ignored when the argument initialsets is a dataframe.
- restricted
A binary list that allows estimating restricted AdjPIN models by specifying which model parameters are assumed to be equal. It contains one or more of the following four elements: {theta, mu, eps, d}. For instance, if theta is set to TRUE, then the probability of liquidity shock is assumed to be the same on no-information days and on information days (\(\theta = \theta'\)). If any of the remaining rate elements {mu, eps, d} is set to TRUE (say mu = TRUE), then the rate is assumed to be the same on the buy side and on the sell side (\(\mu_b = \mu_s\)). If more than one element is set to TRUE, then the restrictions are combined. For instance, if the argument restricted is set to list(theta = TRUE, eps = TRUE, d = TRUE), then the restricted AdjPIN model is estimated with \(\theta = \theta'\), \(\epsilon_b = \epsilon_s\), and \(\Delta_b = \Delta_s\). If the value of the argument restricted is the empty list (list()), then all parameters of the model are assumed to be independent, and the unrestricted model is estimated. The default value is the empty list list().
- ...
Additional arguments passed on to the function adjpin(). The recognized arguments are hyperparams and fact. The argument hyperparams is a list containing the hyperparameters of the ECM algorithm. When not empty, it contains one or both of the following elements: maxeval and tolerance. It is used only when the method argument is set to "ECM". The argument fact is a binary value that determines which functional form of the likelihood is used: the factorization of the likelihood function by Ersan and Ghachem (2022b) when it is set to TRUE; otherwise, the original likelihood function of Duarte and Young (2009). The default value is TRUE. More about these arguments can be found in the Details section.
- verbose
A binary variable that determines whether detailed information about the steps of the estimation of the AdjPIN model is displayed. No output is produced when verbose is set to FALSE. The default value is TRUE.
Details
The argument data should be a numeric dataframe and contain
at least two variables. Only the first two variables are considered:
the first variable is assumed to correspond to the total number of
buyer-initiated trades, while the second variable is assumed to
correspond to the total number of seller-initiated trades. Each row or
observation corresponds to a trading day. NA values are ignored.
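As a small illustration of the expected input layout, the sketch below builds a dataframe in base R whose first two columns hold daily buy and sell counts. The counts, the column names (buys, sells, volume) and the object names are hypothetical; only the column order matters according to the description above.

```r
# Hypothetical daily trade counts; names are illustrative only, since
# adjpin() is described as reading the first two columns by position.
trades <- data.frame(
  buys   = c(350, 420, NA, 610, 390),   # buyer-initiated trades per day
  sells  = c(300, 455, 280, 720, 410),  # seller-initiated trades per day
  volume = c(1.2, 1.5, 0.9, 2.4, 1.3)   # extra column, not used by the model
)

# Keep only the two columns the model uses, in the expected order.
xdata <- trades[, 1:2]

# Both columns must be numeric.
stopifnot(all(vapply(xdata, is.numeric, logical(1))))

# Number of trading days with both counts observed.
n_complete <- sum(complete.cases(xdata))
print(n_complete)
```

Counting complete rows beforehand is only a convenience for checking how many trading days carry usable information; it is not required by the function.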
If initialsets is neither a dataframe nor a character string from the
set {"GE", "CL", "RANDOM"}, the estimation of the AdjPIN model is
aborted. The default initial parameter sets ("GE") for the estimation
method are generated using a modified hierarchical agglomerative
clustering. For more information, see initials_adjpin().
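To illustrate the dataframe form of initialsets, the following sketch generates a few random initial parameter sets with initials_adjpin_rnd() and passes them to adjpin(). The argument names used for initials_adjpin_rnd() (num_init, verbose) are assumed to mirror those of adjpin() and should be checked against that function's help page. The requireNamespace() guard is only there so that the snippet does nothing when PINstimation is not installed.

```r
# Sketch only: runs when the PINstimation package is available.
if (requireNamespace("PINstimation", quietly = TRUE)) {
  library(PINstimation)

  # Simulated data, generated as in the Examples section.
  xdata <- generatedata_adjpin(days = 60)@data

  # Five random initial parameter sets (argument names are assumptions).
  init_sets <- initials_adjpin_rnd(xdata, num_init = 5, verbose = FALSE)

  # A dataframe is supplied, so num_init is ignored by adjpin().
  fit_custom <- adjpin(xdata, initialsets = init_sets, verbose = FALSE)

  print(c(adjPIN = fit_custom@adjpin, PSOS = fit_custom@psos))
}
```

The same pattern should apply to initial sets produced by initials_adjpin() or initials_adjpin_cl(), or to a hand-built dataframe with the same columns.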
The argument hyperparams contains the hyperparameters of the ECM
algorithm. It is either empty or contains one or two of the following
elements:
- maxeval (integer): the maximum number of iterations of the ECM algorithm for each initial parameter set. When missing, maxeval takes the default value of 100.
- tolerance (numeric): the ECM algorithm is stopped when the (relative) change of log-likelihood is smaller than tolerance. When missing, tolerance takes the default value of 0.001.
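As a hedged illustration of how hyperparams and fact are passed through the ... argument, the sketch below estimates the model once with ECM and custom hyperparameters, and once with standard maximum likelihood on the original likelihood function. The values 50 and 1e-4 and the object names are arbitrary choices for the example, not recommendations.

```r
# Sketch only: runs when the PINstimation package is available.
if (requireNamespace("PINstimation", quietly = TRUE)) {
  library(PINstimation)

  # Simulated data, generated as in the Examples section.
  xdata <- generatedata_adjpin(days = 60)@data

  # ECM with a lower iteration cap and a tighter tolerance than the defaults.
  fit_ecm <- adjpin(xdata, method = "ECM", num_init = 5,
                    hyperparams = list(maxeval = 50, tolerance = 1e-4),
                    verbose = FALSE)

  # Standard ML using the original (non-factorized) likelihood function;
  # hyperparams would not be used here since method is not "ECM".
  fit_ml <- adjpin(xdata, method = "ML", num_init = 5, fact = FALSE,
                   verbose = FALSE)

  print(c(ECM = fit_ecm@adjpin, ML = fit_ml@adjpin))
}
```

A smaller num_init is used here only to keep the running time short.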
References
Cheng T, Lai H (2021).
“Improvements in estimating the probability of informed trading models.”
Quantitative Finance, 21(5), 771-796.
Duarte J, Young L (2009).
“Why is PIN priced?”
Journal of Financial Economics, 91(2), 119-138.
ISSN 0304405X.
Ersan O, Ghachem M (2022b).
“A methodological approach to the computational problems in the estimation of adjusted PIN model.”
Available at SSRN 4117954.
Ghachem M, Ersan O (2022a).
“Estimation of the probability of informed trading models via an expectation-conditional maximization algorithm.”
Available at SSRN 4117952.
Ghachem M, Ersan O (2022b).
“PINstimation: An R package for estimating models of probability of informed trading.”
Available at SSRN 4117946.
Examples
# We use 'generatedata_adjpin()' to generate an S4 object of type 'dataset'
# with 60 observations.
sim_data <- generatedata_adjpin(days = 60)
# The actual dataset of 60 observations is stored in the slot 'data' of the
# S4 object 'sim_data'. Each observation corresponds to a day and contains
# the total number of buyer-initiated transactions ('B') and seller-
# initiated transactions ('S') on that day.
xdata <- sim_data@data
# ------------------------------------------------------------------------ #
# Compare the unrestricted AdjPIN model with various restricted models #
# ------------------------------------------------------------------------ #
# Estimate the unrestricted AdjPIN model using the ECM algorithm (default),
# and show the estimation output
estimate.adjpin.0 <- adjpin(xdata, verbose = FALSE)
show(estimate.adjpin.0)
#> ----------------------------------
#> AdjPIN estimation completed successfully
#> ----------------------------------
#> Likelihood factorization: Ersan and Ghachem (2022b)
#> Estimation Algorithm : Expectation-Conditional Maximization
#> Initial parameter sets : Ersan and Ghachem (2022b)
#> Model Restrictions : Unrestricted model
#> ----------------------------------
#> 20 initial set(s) are used in the estimation
#> Type object@initialsets to see the initial parameter sets used
#>
#> AdjPIN model
#>
#> =========== ==============
#> Variables Estimates
#> =========== ==============
#> alpha 0.10001
#> delta 0.833317
#> theta 0.666667
#> theta' 0.166683
#> ----
#> eps.b 173.67
#> eps.s 173.64
#> mu.b 633.35
#> mu.s 729.15
#> d.b 389.78
#> d.s 347.62
#> ----
#> Likelihood (591.752)
#> adjPIN 0.081667
#> PSOS 0.520659
#> =========== ==============
#>
#> -------
#> Running time: 1.786 seconds
# Estimate the restricted AdjPIN model where mu.b=mu.s
# \donttest{
estimate.adjpin.1 <- adjpin(xdata, restricted = list(mu = TRUE),
                            verbose = FALSE)
# Estimate the restricted AdjPIN model where eps.b=eps.s
estimate.adjpin.2 <- adjpin(xdata, restricted = list(eps = TRUE),
                            verbose = FALSE)
# Estimate the restricted AdjPIN model where d.b=d.s
estimate.adjpin.3 <- adjpin(xdata, restricted = list(d = TRUE),
                            verbose = FALSE)
# Compare the different values of adjusted PIN
estimates <- list(estimate.adjpin.0, estimate.adjpin.1,
                  estimate.adjpin.2, estimate.adjpin.3)
adjpins <- sapply(estimates, function(x) x@adjpin)
psos <- sapply(estimates, function(x) x@psos)
summary <- cbind(adjpins, psos)
rownames(summary) <- c("unrestricted", "same.mu", "same.eps", "same.d")
show(round(summary, 5))
#> adjpins psos
#> unrestricted 0.08167 0.52066
#> same.mu 0.08153 0.52082
#> same.eps 0.08167 0.52061
#> same.d 0.08196 0.52068
# }
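The adjPIN and PSOS values reported for the unrestricted model can be cross-checked by hand from the printed parameter estimates. The base R sketch below uses the decomposition of expected daily order flow into informed, shock-driven, and liquidity components, as commonly stated for the Duarte and Young (2009) model. The mapping of the printed labels to that formula (delta weighting mu.s, theta applying to no-information days, theta' to information days) is an assumption, and the name thetap is a stand-in for theta'.

```r
# Point estimates copied from the unrestricted output shown above.
est <- c(alpha = 0.10001, delta = 0.833317, theta = 0.666667,
         thetap = 0.166683, eps.b = 173.67, eps.s = 173.64,
         mu.b = 633.35, mu.s = 729.15, d.b = 389.78, d.s = 347.62)

# Expected informed order flow: alpha * (delta * mu.s + (1 - delta) * mu.b)
informed <- est[["alpha"]] * (est[["delta"]] * est[["mu.s"]] +
                              (1 - est[["delta"]]) * est[["mu.b"]])

# Expected order flow from symmetric order-flow shocks:
# (d.b + d.s) * (alpha * theta' + (1 - alpha) * theta)
shock <- (est[["d.b"]] + est[["d.s"]]) *
  (est[["alpha"]] * est[["thetap"]] + (1 - est[["alpha"]]) * est[["theta"]])

# Expected uninformed (liquidity) order flow: eps.b + eps.s
liquidity <- est[["eps.b"]] + est[["eps.s"]]

total <- informed + shock + liquidity
adjpin_check <- informed / total
psos_check   <- shock / total

print(round(c(adjPIN = adjpin_check, PSOS = psos_check), 4))
```

Because the inputs are the rounded estimates from the printed table, the recomputed values agree with the reported adjPIN and PSOS only to about four decimal places.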