Based on the algorithm in Ersan and Alici (2016), generates initial parameter sets for the maximum likelihood estimation of the PIN model.
Arguments
- data
A dataframe with 2 variables: the first corresponds to buyer-initiated trades (buys), and the second corresponds to seller-initiated trades (sells).
- xtraclusters
An integer used to divide trading days into (2 + xtraclusters) clusters, thereby resulting in comb(1 + xtraclusters, 1) initial parameter sets in line with Ersan and Alici (2016). The default value is 4.
- verbose
A binary variable that determines whether information messages about the initial parameter sets, including the number of initial parameter sets generated, are displayed. No message is shown when verbose is set to FALSE. The default value is TRUE.
Value
Returns a dataframe of initial parameter sets, each consisting of five variables {\(\alpha\), \(\delta\), \(\mu\), \(\epsilon_b\), \(\epsilon_s\)}.
Details
The argument 'data' should be a numeric dataframe containing at least two
variables. Only the first two variables are considered: the first is
assumed to hold the total number of buyer-initiated trades, and the second
the total number of seller-initiated trades. Each row (observation)
corresponds to a trading day. NA values are ignored.
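The input convention above can be illustrated with a small base R sketch. The dataframe `trades` and its column names are hypothetical, and dropping whole rows that contain NA is an assumption about what "ignored" means here, not a description of the package internals.

```r
# Hypothetical input: three columns, one missing buy count.
trades <- data.frame(
  B = c(350, 1500, NA, 340),
  S = c(330, 320, 310, 1400),
  volume = c(10, 20, 30, 40)
)

# Only the first two columns matter: buys first, sells second.
bs <- trades[, 1:2]
names(bs) <- c("b", "s")

# Assumption: a day with a missing buy or sell count is dropped entirely.
bs <- bs[stats::complete.cases(bs), ]

# Both remaining columns must be numeric.
stopifnot(all(vapply(bs, is.numeric, logical(1))))
```

After these steps, `bs` holds one row per usable trading day.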
The function initials_pin_ea() uses hierarchical agglomerative
clustering (HAC) to find initial parameter sets for the maximum likelihood
estimation. The steps of the Ersan and Alici (2016) algorithm differ from
those used by Gan et al. (2015), and are summarized below.
Using HAC, daily absolute order imbalances (AOIs) are grouped into 2+J
clusters (default J=4). After sorting the clusters by AOI, they are
combined into two larger groups of days (event and no-event) by merging
neighboring clusters with each other. With the default J=4, those groups
can therefore be formed in comb(5, 1) = 5 different ways. For each of the 5
configurations in which days are grouped into two (event group and
no-event group), the procedure below is applied to obtain initial parameter
sets.
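The clustering and merging step can be sketched in base R as follows. This is only an illustration on simulated data: the simulated trade counts, the "complete" linkage, and all variable names are assumptions, not taken from the package source.

```r
set.seed(1)

# Simulated daily buys and sells for 60 trading days (hypothetical data).
event <- sample(c(0, 1, -1), 60, replace = TRUE, prob = c(0.4, 0.4, 0.2))
b <- rpois(60, 350) + (event == 1) * rpois(60, 1200)
s <- rpois(60, 330) + (event == -1) * rpois(60, 1200)

J <- 4              # plays the role of xtraclusters
aoi <- abs(b - s)   # daily absolute order imbalance

# HAC on AOI, cut into 2 + J clusters (the linkage method is an assumption).
tree <- hclust(dist(aoi), method = "complete")
cl <- cutree(tree, k = 2 + J)

# Rank the clusters by their mean AOI (1 = smallest mean).
means <- tapply(aoi, cl, mean)
rank_of <- match(seq_along(means), order(means))
cl_rank <- rank_of[cl]

# Each cut point k = 1, ..., J + 1 merges the k lowest-ranked clusters
# into the no-event group and the remaining ones into the event group.
configs <- lapply(seq_len(J + 1), function(k) cl_rank > k)
length(configs)
```

Each element of `configs` is a logical vector flagging the event days of one configuration, so the list has J + 1 entries, matching the comb(J + 1, 1) count given above.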
Days in the event group (the one with the larger mean AOI) are divided into
two groups: good-event days (days with positive order imbalance, OI) and
bad-event days (days with negative OI).
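A minimal sketch of this split, using hypothetical trade counts and a hypothetical `is_event` flag standing in for one of the event/no-event configurations. How a day with OI exactly equal to zero is treated is not stated above; the sketch assigns it to the bad-event group purely as an assumption.

```r
# Hypothetical buys, sells, and event flags for six trading days.
b <- c(350, 1500, 340, 360, 1450, 355)
s <- c(330, 320, 1400, 345, 335, 1380)
is_event <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)

oi <- b - s  # signed order imbalance

# Event days with positive OI are good-event days, the others bad-event days.
day_type <- ifelse(!is_event, "no-event",
                   ifelse(oi > 0, "good-event", "bad-event"))
table(day_type)
```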
Initial parameters are obtained from the frequencies and average trade
rates of the three types of days. See Ersan and Alici (2016) for further
details.
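As a rough illustration of how frequencies and average trade rates can yield one initial parameter set, the sketch below uses one plausible set of moment-style formulas. These formulas, the hypothetical data, and the variable names are assumptions made for illustration; the exact expressions are those in Ersan and Alici (2016).

```r
# Hypothetical buys, sells, and day types for six trading days.
b <- c(350, 1500, 340, 360, 1450, 355)
s <- c(330, 320, 1400, 345, 335, 1380)
day_type <- c("no-event", "good-event", "bad-event",
              "no-event", "good-event", "bad-event")

good <- day_type == "good-event"
bad  <- day_type == "bad-event"
none <- day_type == "no-event"

# Frequencies: share of event days, and share of bad days among event days.
alpha <- mean(good | bad)
delta <- sum(bad) / sum(good | bad)

# Uninformed rates: average buys (sells) on days without informed buying (selling).
eps_b <- mean(b[none | bad])
eps_s <- mean(s[none | good])

# Informed rate: average excess buys on good days and excess sells on bad days.
mu <- (sum(b[good] - eps_b) + sum(s[bad] - eps_s)) / sum(good | bad)

init <- data.frame(alpha = alpha, delta = delta, mu = mu,
                   eps.b = eps_b, eps.s = eps_s)
init
```

The sketch assumes that each of the three day types is non-empty; a configuration with no good-event or no bad-event days would need separate handling.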
The higher the number of additional clusters (xtraclusters), the better
the estimation. Ersan and Alici (2016), however, have shown that the
benefit of increasing this number beyond 4 is marginal and statistically
insignificant.
References
Ersan O, Alici A (2016).
“An unbiased computation methodology for estimating the probability of informed trading (PIN).”
Journal of International Financial Markets, Institutions and Money, 43, 74–94.
ISSN 10424431.
Gan Q, Wei WC, Johnstone D (2015).
“A faster estimation method for the probability of informed trading using hierarchical agglomerative clustering.”
Quantitative Finance, 15(11), 1805–1821.
Examples
# There is a preloaded quarterly dataset called 'dailytrades' with 60
# observations. Each observation corresponds to a day and contains the
# total number of buyer-initiated trades ('B') and seller-initiated
# trades ('S') on that day. To know more, type ?dailytrades
xdata <- dailytrades
# Obtain a dataframe of initial parameters for the maximum likelihood
# estimation using the algorithm of Ersan and Alici (2016).
init.sets <- initials_pin_ea(xdata)
#> The function initials_pin_ea(...) has generated 5 initial parameter sets.
#> To display them, either store them in a variable or call (initials_pin_ea(...)).
#> To hide these messages, set the argument 'verbose' to FALSE.
# Use the obtained dataframe to estimate the PIN model using the function
# pin() with custom initial parameter sets
estimate.1 <- pin(xdata, initialsets = init.sets, verbose = FALSE)
# pin_ea() directly estimates the PIN model using initial parameter sets
# generated using the algorithm of Ersan & Alici (2016).
estimate.2 <- pin_ea(xdata, verbose = FALSE)
# Check that the obtained results are identical
show(estimate.1@parameters)
#> alpha delta mu eps.b eps.s
#> 0.7499975 0.1333342 1193.5179655 357.2659099 328.6291793
show(estimate.2@parameters)
#> alpha delta mu eps.b eps.s
#> 0.7499975 0.1333342 1193.5179655 357.2659099 328.6291793