Estimation of adjusted PIN model — adjpin • PINstimation

Estimates the Adjusted Probability of Informed Trading (adjPIN) as well as the Probability of Symmetric Order-flow Shock (PSOS) from the AdjPIN model of Duarte and Young (2009).

Usage

adjpin(data, method = "ECM", initialsets = "GE", num_init = 20,
              restricted = list(), ..., verbose = TRUE)

Arguments

data: A dataframe with 2 variables: the first corresponds to buyer-initiated trades (buys), and the second corresponds to seller-initiated trades (sells).
method: A character string referring to the method used to estimate the model of Duarte and Young (2009). It takes one of two values: "ML" refers to the standard maximum likelihood estimation, and "ECM" refers to the expectation-conditional maximization algorithm. The default value is "ECM". Details of the ECM method and comparative results can be found in Ghachem and Ersan (2022a) and in Ghachem and Ersan (2022b).
initialsets: It can be either a character string referring to prebuilt algorithms generating initial parameter sets, or a dataframe containing custom initial parameter sets. If initialsets is a character string, it refers to the method of generation of the initial parameter sets, and takes one of three values: "GE", "CL", or "RANDOM". "GE" refers to initial parameter sets generated by the algorithm of Ersan and Ghachem (2022b), implemented in initials_adjpin(); "CL" refers to initial parameter sets generated by the algorithm of Cheng and Lai (2021), implemented in initials_adjpin_cl(); and "RANDOM" generates random initial parameter sets, as implemented in initials_adjpin_rnd(). The default value is "GE". If initialsets is a dataframe, the function adjpin() will estimate the AdjPIN model using the provided initial parameter sets.
num_init: An integer specifying the maximum number of initial parameter sets to be used in the estimation. If initialsets="GE", the generation of initial parameter sets stops when their number reaches num_init; it can stop earlier if the number of all possible generated initial parameter sets is lower than num_init. If initialsets="RANDOM", exactly num_init initial parameter sets are returned. If initialsets="CL", then num_init is ignored, and all 256 initial parameter sets are used. The default value is 20. The argument num_init is ignored when the argument initialsets is a dataframe.
restricted: A binary list that allows estimating restricted AdjPIN models by specifying which model parameters are assumed to be equal. It contains one or more of the following four elements: {theta, mu, eps, d}. For instance, if theta is set to TRUE, then the probability of a liquidity shock is assumed to be the same on no-information days and on information days (\(\theta=\theta'\)). If any of the remaining rate elements {mu, eps, d} is set to TRUE (say mu=TRUE), then the corresponding rate is assumed to be the same on the buy side and on the sell side (\(\mu_b=\mu_s\)). If more than one element is set to TRUE, then the restrictions are combined. For instance, if the argument restricted is set to list(theta=TRUE, eps=TRUE, d=TRUE), then the restricted AdjPIN model is estimated, where \(\theta=\theta'\), \(\epsilon_b=\epsilon_s\), and \(\Delta_b=\Delta_s\). If the value of the argument restricted is the empty list (list()), then all parameters of the model are assumed to be independent, and the unrestricted model is estimated. The default value is the empty list list().
...: Additional arguments passed on to the function adjpin(). The recognized arguments are hyperparams and fact. The argument hyperparams is a list containing the hyperparameters of the ECM algorithm. When not empty, it contains one or more of the following elements: maxeval and tolerance. It is used only when the method argument is set to "ECM". The argument fact is a binary value that determines which functional form of the likelihood is used: the factorization of the likelihood function by Ersan and Ghachem (2022b) when it is set to TRUE; otherwise, the original likelihood function of Duarte and Young (2009). The default value of fact is TRUE. More information about these arguments is given in the Details section.
verbose: A binary variable that determines whether detailed information about the steps of the estimation of the AdjPIN model is displayed. No output is produced when verbose is set to FALSE. The default value is TRUE.
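The following sketch shows how the arguments described above might be combined in practice. It uses only the argument names listed in this section; the object names (restricted_args, ecm_hyperparams, fit_restricted, fit_tuned, fit_ml) and the chosen values are illustrative assumptions. The estimation calls are guarded so that the snippet still runs when PINstimation is not installed.

```r
# Illustrative argument values (assumptions, not recommended settings)
restricted_args <- list(theta = TRUE, eps = TRUE, d = TRUE)
ecm_hyperparams <- list(maxeval = 50, tolerance = 0.01)

if (requireNamespace("PINstimation", quietly = TRUE)) {
  library(PINstimation)

  # Simulated data: 60 days of buys and sells
  xdata <- generatedata_adjpin(days = 60)@data

  # Restricted model: theta = theta', eps.b = eps.s, d.b = d.s
  fit_restricted <- adjpin(xdata, restricted = restricted_args,
                           num_init = 5, verbose = FALSE)

  # ECM with custom hyperparameters (only used when method = "ECM")
  fit_tuned <- adjpin(xdata, method = "ECM", hyperparams = ecm_hyperparams,
                      num_init = 5, verbose = FALSE)

  # Standard maximum likelihood with the original (non-factorized) likelihood
  fit_ml <- adjpin(xdata, method = "ML", fact = FALSE,
                   num_init = 5, verbose = FALSE)

  # Compare the adjPIN estimates across the three calls
  print(c(restricted = fit_restricted@adjpin,
          tuned_ecm  = fit_tuned@adjpin,
          ml         = fit_ml@adjpin))
}
```

A smaller num_init is used here only to keep the running time short.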

Value

Returns an object of class estimate.adjpin.
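As an illustration of how the two headline measures relate to the parameter estimates, the sketch below recomputes adjPIN and PSOS from the estimates printed in the Examples section. It assumes the expressions of Duarte and Young (2009), with delta read as the probability that an information event is bad news; the helper adjpin_measures() and the name thetap (for theta') are hypothetical and not part of the package.

```r
# Parameter estimates copied from the output shown in the Examples section
est <- c(alpha = 0.10001, delta = 0.833317, theta = 0.666667,
         thetap = 0.166683, eps.b = 173.67, eps.s = 173.64,
         mu.b = 633.35, mu.s = 729.15, d.b = 389.78, d.s = 347.62)

# Hypothetical helper: expected order flow split into three components
adjpin_measures <- function(p) {
  # Informed trading: sell-side rate on bad-news days, buy-side otherwise
  informed <- p[["alpha"]] *
    ((1 - p[["delta"]]) * p[["mu.b"]] + p[["delta"]] * p[["mu.s"]])
  # Symmetric order-flow shocks, weighted by their probability
  shock <- (p[["d.b"]] + p[["d.s"]]) *
    (p[["alpha"]] * p[["thetap"]] + (1 - p[["alpha"]]) * p[["theta"]])
  # Uninformed (noise) trading
  noise <- p[["eps.b"]] + p[["eps.s"]]
  total <- informed + shock + noise
  c(adjpin = informed / total, psos = shock / total)
}

measures <- adjpin_measures(est)
# Close to the adjPIN (0.081667) and PSOS (0.520659) reported in the Examples
print(round(measures, 4))
```

Small differences in the last digits are expected because the inputs are rounded values taken from the printed output.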

Details

The argument 'data' should be a numeric dataframe, and contain at least two variables. Only the first two variables will be considered: the first variable is assumed to correspond to the total number of buyer-initiated trades, while the second variable is assumed to correspond to the total number of seller-initiated trades. Each row or observation corresponds to a trading day. NA values will be ignored.
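A minimal base-R sketch of the expected data layout is given below. The column names and values are made up, and dropping incomplete rows beforehand is only one way to mirror the statement that NA values are ignored; it is not a description of what adjpin() does internally.

```r
# Hypothetical daily trade counts; the third column would not be used
raw <- data.frame(
  buys   = c(210, 195, 230, NA, 205, 188),
  sells  = c(180, 176, 240, 150, 199, 170),
  volume = c(1.2, 0.9, 1.7, 0.8, 1.1, 1.0)
)

# Keep only the first two variables: buys first, sells second
prepared <- raw[, 1:2]

# Drop trading days with missing counts
prepared <- prepared[complete.cases(prepared), ]

# Check that both remaining variables are numeric
stopifnot(all(vapply(prepared, is.numeric, logical(1))))

print(prepared)
```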

If initialsets is neither a dataframe, nor a character string from the set {"GE", "CL", "RANDOM"}, the estimation of the AdjPIN model is aborted. The default initial parameters ("GE") for the estimation method are generated using a modified hierarchical agglomerative clustering. For more information, see initials_adjpin().
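The rule on accepted values of initialsets can be sketched as a small check. The helper classify_initialsets() and its error message are hypothetical; they only illustrate the rule stated above and are not the package's own validation code.

```r
# Hypothetical helper illustrating which 'initialsets' values are accepted
classify_initialsets <- function(initialsets) {
  # A dataframe is treated as a set of custom initial parameter sets
  if (is.data.frame(initialsets)) {
    return("custom")
  }
  # A single string must name one of the prebuilt algorithms
  if (is.character(initialsets) && length(initialsets) == 1 &&
      initialsets %in% c("GE", "CL", "RANDOM")) {
    return(initialsets)
  }
  # Anything else aborts the estimation
  stop("'initialsets' must be a dataframe or one of \"GE\", \"CL\", \"RANDOM\".")
}

classify_initialsets("GE")                      # prebuilt algorithm
classify_initialsets(data.frame(alpha = 0.1))   # custom sets
invalid <- tryCatch(classify_initialsets("XYZ"),
                    error = function(e) "aborted")
print(invalid)
```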

The argument hyperparams contains the hyperparameters of the ECM algorithm. It is either empty or contains one or two of the following elements:

maxeval: (integer) The maximum number of iterations of the ECM algorithm for each initial parameter set. When missing, maxeval takes the default value of 100.
tolerance: (numeric) The ECM algorithm is stopped when the (relative) change of log-likelihood is smaller than tolerance. When missing, tolerance takes the default value of 0.001.
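The interplay of maxeval and tolerance can be illustrated with a generic iteration loop. This is a toy sketch, not the package's ECM implementation: the helper run_until_converged(), the Poisson log-likelihood, and the damped update are all assumptions chosen for brevity, and the relative change is taken here as the absolute change divided by the absolute previous log-likelihood.

```r
# Hypothetical helper: iterate until the relative log-likelihood change
# drops below 'tolerance' or 'maxeval' iterations have been performed
run_until_converged <- function(update, loglik, start,
                                maxeval = 100, tolerance = 0.001) {
  params <- start
  ll_old <- loglik(params)
  for (iter in seq_len(maxeval)) {
    params <- update(params)
    ll_new <- loglik(params)
    rel_change <- abs(ll_new - ll_old) / abs(ll_old)
    if (rel_change < tolerance) break
    ll_old <- ll_new
  }
  list(params = params, loglik = ll_new, iterations = iter,
       converged = rel_change < tolerance)
}

# Toy problem: Poisson rate of a small sample (the MLE is the sample mean)
x <- c(3, 5, 4, 6, 2)
loglik <- function(rate) sum(dpois(x, rate, log = TRUE))
# Damped update that moves halfway towards the sample mean at each step
update <- function(rate) rate + 0.5 * (mean(x) - rate)

result <- run_until_converged(update, loglik, start = 1)
print(result)
```

A looser tolerance stops the loop earlier at the cost of a less precise estimate, while maxeval bounds the running time when convergence is slow.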

References

Cheng T, Lai H (2021). “Improvements in estimating the probability of informed trading models.” Quantitative Finance, 21(5), 771-796.

Duarte J, Young L (2009). “Why is PIN priced?” Journal of Financial Economics, 91(2), 119-138. ISSN 0304405X.

Ersan O, Ghachem M (2022b). “A methodological approach to the computational problems in the estimation of adjusted PIN model.” Available at SSRN 4117954.

Ghachem M, Ersan O (2022a). “Estimation of the probability of informed trading models via an expectation-conditional maximization algorithm.” Available at SSRN 4117952.

Ghachem M, Ersan O (2022b). “PINstimation: An R package for estimating models of probability of informed trading.” Available at SSRN 4117946.

Examples

# We use 'generatedata_adjpin()' to generate an S4 object of type 'dataset'
# with 60 observations.

sim_data <- generatedata_adjpin(days = 60)

# The actual dataset of 60 observations is stored in the slot 'data' of the
# S4 object 'sim_data'. Each observation corresponds to a day and contains
# the total number of buyer-initiated transactions ('B') and seller-
# initiated transactions ('S') on that day.

xdata <- sim_data@data

# ------------------------------------------------------------------------ #
# Compare the unrestricted AdjPIN model with various restricted models     #
# ------------------------------------------------------------------------ #

# Estimate the unrestricted AdjPIN model using the ECM algorithm (default),
# and show the estimation output

estimate.adjpin.0 <- adjpin(xdata, verbose = FALSE)

show(estimate.adjpin.0)
#> ----------------------------------
#> AdjPIN estimation completed successfully
#> ----------------------------------
#> Likelihood factorization: Ersan and Ghachem (2022b)
#> Estimation Algorithm 	: Expectation-Conditional Maximization
#> Initial parameter sets	: Ersan and Ghachem (2022b)
#> Model Restrictions 	: Unrestricted model
#> ----------------------------------
#> 20 initial set(s) are used in the estimation 
#> Type object@initialsets to see the initial parameter sets used
#> 
#>  AdjPIN model  
#> 
#> ===========  ==============
#> Variables    Estimates     
#> ===========  ==============
#> alpha        0.10001       
#> delta        0.833317      
#> theta        0.666667      
#> theta'       0.166683      
#> ----                       
#> eps.b        173.67        
#> eps.s        173.64        
#> mu.b         633.35        
#> mu.s         729.15        
#> d.b          389.78        
#> d.s          347.62        
#> ----                       
#> Likelihood   (591.752)     
#> adjPIN       0.081667      
#> PSOS         0.520659      
#> ===========  ==============
#> 
#> -------
#> Running time: 1.786 seconds

# Estimate the restricted AdjPIN model where mu.b=mu.s
# \donttest{
estimate.adjpin.1 <- adjpin(xdata, restricted = list(mu = TRUE),
                                  verbose = FALSE)

# Estimate the restricted AdjPIN model where eps.b=eps.s

estimate.adjpin.2 <- adjpin(xdata, restricted = list(eps = TRUE),
                                  verbose = FALSE)

# Estimate the restricted AdjPIN model where d.b=d.s

estimate.adjpin.3 <- adjpin(xdata, restricted = list(d = TRUE),
                                  verbose = FALSE)

# Compare the different values of adjusted PIN

estimates <- list(estimate.adjpin.0, estimate.adjpin.1,
                  estimate.adjpin.2, estimate.adjpin.3)

adjpins <- sapply(estimates, function(x) x@adjpin)

psos <- sapply(estimates, function(x) x@psos)

summary <- cbind(adjpins, psos)
rownames(summary) <- c("unrestricted", "same.mu", "same.eps", "same.d")

show(round(summary, 5))
#>              adjpins    psos
#> unrestricted 0.08167 0.52066
#> same.mu      0.08153 0.52082
#> same.eps     0.08167 0.52061
#> same.d       0.08196 0.52068
# }