Estimation of Volume-Synchronized PIN model (vpin) and the improved volume-synchronized PIN model (ivpin)
Source: R/model_vpin.R
vpin_measures.Rd
Estimates the Volume-Synchronized Probability of Informed Trading as developed in Easley et al. (2011) and Easley et al. (2012).
Estimates the improved Volume-Synchronized Probability of Informed Trading as developed in Ke et al. (2017).
Usage
vpin(
data,
timebarsize = 60,
buckets = 50,
samplength = 50,
tradinghours = 24,
verbose = TRUE
)
ivpin(
data,
timebarsize = 60,
buckets = 50,
samplength = 50,
tradinghours = 24,
grid_size = 5,
verbose = TRUE
)
Arguments
- data
A dataframe with 3 variables: {timestamp, price, volume}.
- timebarsize
An integer referring to the size of timebars in seconds. The default value is 60.
- buckets
An integer referring to the number of buckets per average daily volume. The default value is 50.
- samplength
An integer referring to the sample length or the window size used to calculate the VPIN vector. The default value is 50.
- tradinghours
An integer referring to the length of daily trading sessions in hours. The default value is 24.
- verbose
A logical variable that determines whether detailed information about the steps of the estimation of the VPIN (IVPIN) model is displayed. No output is produced when verbose is set to FALSE. The default value is TRUE.
- grid_size
An integer between 1 and 20, representing the size of the grid used in the estimation of IVPIN. The default value is 5. See more in Details.
Value
Returns an object of class estimate.vpin-class, which contains the following slots:
@improved
A logical variable that takes the value FALSE when the classical VPIN model is estimated (using vpin()), and TRUE when the improved VPIN model is estimated (using ivpin()).
@bucketdata
A data frame created as in Abad and Yague (2012).
@dailyvpin
A data frame with calendar-day aggregates of VPIN. For each trading day, it contains three variables: day (Date), dvpin (simple daily average of per-bucket VPIN), and dwvpin (duration-weighted daily VPIN, i.e. the weighted average of bucket VPINs with weights proportional to the effective bucket durations).
@vpin
A vector of VPIN values.
@ivpin
A vector of IVPIN values, which remains empty when the function vpin() is called.
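The two daily aggregates described for @dailyvpin can be illustrated with a minimal sketch. The data frame below is made up for illustration, and the column names day, vpin and duration are assumptions; the package's internal computation may differ in its details.

```r
# Hypothetical bucket-level data: one row per bucket (values are made up)
bucketdata <- data.frame(
  day      = as.Date(c("2018-10-20", "2018-10-20", "2018-10-22", "2018-10-22")),
  vpin     = c(0.30, 0.20, 0.25, 0.35),
  duration = c(600, 1800, 1200, 1200)  # effective bucket durations in seconds
)

# Group the buckets by calendar day
by_day <- split(bucketdata, bucketdata$day)

# dvpin : simple average of the bucket VPINs of each day
# dwvpin: average of the bucket VPINs weighted by bucket duration
dailyvpin <- data.frame(
  day    = as.Date(names(by_day)),
  dvpin  = unname(vapply(by_day, function(b) mean(b$vpin, na.rm = TRUE),
                         numeric(1))),
  dwvpin = unname(vapply(by_day, function(b)
                         weighted.mean(b$vpin, b$duration, na.rm = TRUE),
                         numeric(1)))
)
print(dailyvpin)
```

On the first day the long bucket has the lower VPIN, so the duration-weighted value falls below the simple average; on the second day the durations are equal and the two measures coincide.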
Details
The dataframe data should contain at least three variables. Only the
first three variables are considered, and they must appear in the following
order: {timestamp, price, volume}.
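Because only the first three columns are read, a dataset with a different layout needs to be reordered before it is passed to vpin() or ivpin(). The following is a minimal sketch using base R; the data frame raw and its values are hypothetical.

```r
# Hypothetical raw trade data whose columns are not in the expected order
raw <- data.frame(
  bid       = c(15.10, 15.20, 15.20),
  price     = c(15.20, 15.30, 15.25),
  timestamp = as.POSIXct(c("2018-10-18 00:13:10",
                           "2018-10-18 00:13:25",
                           "2018-10-18 00:14:02"), tz = "UTC"),
  volume    = c(4, 10, 7)
)

# Keep only the required variables, in the order {timestamp, price, volume}
data <- raw[, c("timestamp", "price", "volume")]
str(data)
```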
The argument timebarsize is expressed in seconds, which allows the user to
work with intervals shorter than 1 minute. The default value is 1 minute
(60 seconds), following Easley et al. (2011, 2012).
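To make the role of timebarsize concrete, here is a rough sketch of how trades could be grouped into fixed-length timebars. It is an illustration under simplifying assumptions (last price and total volume per bar, made-up trades), not the package's internal procedure.

```r
timebarsize <- 60  # length of a timebar in seconds

# Hypothetical trades occurring 5, 42, 61, 130 and 175 seconds after midnight
trades <- data.frame(
  timestamp = as.POSIXct("2018-10-18 00:00:00", tz = "UTC") +
    c(5, 42, 61, 130, 175),
  price     = c(15.20, 15.25, 15.22, 15.30, 15.28),
  volume    = c(4, 10, 7, 3, 6)
)

# Start of the timebar each trade falls into, in seconds since the epoch
bar <- floor(as.numeric(trades$timestamp) / timebarsize) * timebarsize

# One row per timebar: start time, last traded price, total volume
timebars <- data.frame(
  interval = as.POSIXct(sort(unique(bar)), origin = "1970-01-01", tz = "UTC"),
  price    = as.numeric(tapply(trades$price, bar, function(p) p[length(p)])),
  volume   = as.numeric(tapply(trades$volume, bar, sum))
)
print(timebars)
```

Setting timebarsize to a smaller value, say 30, would split the same trades into finer bars.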
The argument tradinghours is used to correct the duration per
bucket if the market trading session does not cover a full day (24 hours).
The duration of a given bucket is the difference between the
timestamp of the last trade (endtime) and the timestamp of the first trade
(stime) in the bucket. If the first and last trades in a bucket occur
on different days, and the market trading session is shorter than
24 hours, the bucket's duration will be inflated. For example, if the daily
trading session is 8 hours (tradinghours = 8), and the start time of a
bucket is 2018-10-12 17:06:40 and its end time is
2018-10-13 09:36:00, the straightforward calculation gives a duration
of 59,360 secs. However, this duration includes 16 hours when the
market is closed. The corrected duration considers only the market activity
time: duration = 59,360 - 16 * 3600 = 1,760 secs, approximately
30 minutes.
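The arithmetic of this example can be reproduced in a few lines of base R. This is a sketch of the correction as described above, assuming the bucket spans a single market close; it is not taken from the package source.

```r
tradinghours <- 8  # length of the daily trading session in hours

stime   <- as.POSIXct("2018-10-12 17:06:40", tz = "UTC")
endtime <- as.POSIXct("2018-10-13 09:36:00", tz = "UTC")

# Straightforward duration: includes the hours when the market is closed
raw_duration <- as.numeric(difftime(endtime, stime, units = "secs"))

# Number of day changes between the first and the last trade of the bucket
days_spanned <- as.numeric(as.Date(endtime) - as.Date(stime))

# Remove the closed-market time (24 - tradinghours hours per day change)
closed_secs <- days_spanned * (24 - tradinghours) * 3600
duration    <- raw_duration - closed_secs

cat("raw:", raw_duration, "secs; corrected:", duration, "secs\n")
```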
The argument grid_size determines the size of the grid for the variables
alpha and delta, used to generate the initial parameter sets
that prime the maximum-likelihood estimation step of the
algorithm by Ke et al. (2017) for estimating IVPIN. If grid_size is set to
a value m, the algorithm creates a sequence starting from 1 / (2m) and
ending at 1 - 1 / (2m), with a step of 1 / m. The default value of 5
corresponds to the grid size used by Yan and Zhang (2012), where the
sequence starts at 0.1 = 1 / (2 * 5) and ends at 0.9 = 1 - 1 / (2 * 5)
with a step of 0.2 = 1 / 5. Increasing the value of grid_size
increases the running time and may marginally improve the accuracy of the
IVPIN estimates.
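The grid described above is easy to reproduce. The sketch below builds the sequence for a given grid_size and crosses it for alpha and delta; the object names grid_points and initial_grid are illustrative, and the remaining components of the initial parameter sets are not shown.

```r
grid_size <- 5
m <- grid_size

# Midpoints of m equal sub-intervals of (0, 1):
# starts at 1 / (2m), ends at 1 - 1 / (2m), with a step of 1 / m
grid_points <- (seq_len(m) - 0.5) / m
print(grid_points)

# All (alpha, delta) combinations on the grid: m * m candidate pairs
initial_grid <- expand.grid(alpha = grid_points, delta = grid_points)
nrow(initial_grid)
```

The number of candidate pairs grows with the square of grid_size (25 pairs for the default, 400 for the maximum of 20), which is why larger grids increase the running time.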
References
Abad D, Yague J (2012).
“From PIN to VPIN: An introduction to order flow toxicity.”
The Spanish Review of Financial Economics, 10(2), 74–83.
Easley D, Lopez de Prado MM, O'Hara M (2011).
“The microstructure of the "flash crash": flow toxicity, liquidity crashes, and the probability of informed trading.”
The Journal of Portfolio Management, 37(2), 118–128.
Easley D, Lopez de Prado MM, O'Hara M (2012).
“Flow toxicity and liquidity in a high-frequency world.”
Review of Financial Studies, 25(5), 1457–1493.
ISSN 08939454.
Ke W, Lin HW, et al. (2017).
“An improved version of the volume-synchronized probability of informed trading.”
Critical Finance Review, 6(2), 357–376.
Yan Y, Zhang S (2012).
“An improved estimation method and empirical properties of the probability of informed trading.”
Journal of Banking and Finance, 36(2), 454–467.
ISSN 03784266.
Examples
# The package includes a preloaded dataset called 'hfdata'.
# It is an artificially created high-frequency trading dataset
# containing 100,000 trades and five variables: 'timestamp', 'price',
# 'volume', 'bid', and 'ask'. For more information, type ?hfdata.
xdata <- hfdata
### Estimation of the VPIN model ###
# \donttest{
# Estimate the VPIN model using the following parameters:
# - timebarsize: 5 minutes (300 seconds)
# - buckets: 50 buckets per average daily volume
# - samplength: 250 for the VPIN calculation
estimate <- vpin(xdata, timebarsize = 300, buckets = 50,
samplength = 250)
#> [+] VPIN Estimation started.
#> |-[1] Checking and preparing the data...
#> |-[2] Creating 300-second timebars...[~ 2 seconds]
#> |-[3] Calculating Volume Bucket Size (VBS) and Sigma(DP)...
#> |-[4] Breaking up large 300-second timebars' volume...
#> |-[5] Assigning 300-second timebars into buckets...
#> |-[6] Balancing timebars and adjusting bucket sizes to VBS...
#> |-[7] Calculating aggregate bucket data...
#> |-[8] Calculating VPIN vector...
#> [+] VPIN estimation completed
# Display a description of the VPIN estimate
show(estimate)
#> ----------------------------------
#> VPIN estimation completed successfully
#> ----------------------------------
#> Type object@vpin to access the VPIN vector.
#> Type object@bucketdata to access data used to construct the VPIN vector.
#> Type object@dailyvpin to access the daily VPIN vectors.
#>
#> VPIN model
#>
#> Table: [+] VPIN descriptive statistics
#>
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> ------ --------- -------- ------ --------- ------ ------
#> 0.21 0.25 0.29 0.3 0.33 0.46 249
#>
#>
#> Table: [+] VPIN parameters
#>
#> tbSize buckets samplength VBS ndays
#> -------- --------- ------------ ---------- -------
#> 300 50 250 3256.601 86
#>
#> -------
#> Running time: 3.994 seconds
# Display the parameters of the VPIN estimates
show(estimate@parameters)
#> tbSize buckets samplength VBS ndays
#> 300.000 50.000 250.000 3256.601 86.000
# Display the summary statistics of the VPIN vector
summary(estimate@vpin)
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 0.2060 0.2545 0.2882 0.3017 0.3276 0.4582 249
# Store the computed data of the different buckets in a dataframe 'buckets'
# and display the first 10 rows of the dataframe.
buckets <- estimate@bucketdata
show(head(buckets, 10))
#> bucket agg.bvol agg.svol duration aoi starttime
#> 1 1 1524.107 1732.4944533 2086.41624 208.3875 2018-10-18 00:21:33
#> 2 2 1433.432 1823.1698325 1892.43525 389.7383 2018-10-18 01:01:19
#> 3 3 1515.600 1741.0010405 1189.37824 225.4007 2018-10-18 01:47:51
#> 4 4 2014.050 1242.5512986 1589.64648 771.4988 2018-10-18 02:12:41
#> 5 5 1923.979 1332.6219329 3030.19742 591.3575 2018-10-18 02:57:47
#> 6 6 2034.315 1222.2866525 1630.67501 812.0281 2018-10-18 03:54:41
#> 7 7 3256.465 0.1363789 56.24204 3256.3286 2018-10-18 04:32:10
#> 8 8 3256.465 0.1363789 56.24204 3256.3286 2018-10-18 04:33:44
#> 9 9 3256.465 0.1363789 56.24204 3256.3286 2018-10-18 04:34:40
#> 10 10 3256.465 0.1363789 56.24204 3256.3286 2018-10-18 04:35:36
#> endtime vpin bduration
#> 1 2018-10-18 01:01:19 NA 2386.41624
#> 2 2018-10-18 01:47:51 NA 2792.43525
#> 3 2018-10-18 02:12:41 NA 1489.37824
#> 4 2018-10-18 02:57:47 NA 2706.27185
#> 5 2018-10-18 03:54:41 NA 3413.57205
#> 6 2018-10-18 04:32:10 NA 2249.46482
#> 7 2018-10-18 04:33:44 NA 93.69426
#> 8 2018-10-18 04:34:40 NA 56.24204
#> 9 2018-10-18 04:35:36 NA 56.24204
#> 10 2018-10-18 04:36:32 NA 56.24204
# Display the first 10 rows of the dataframe containing daily vpin values.
dayvpin <- estimate@dailyvpin
show(head(dayvpin, 10))
#> day dvpin dvpin_weighted
#> 1 2018-10-20 0.2842308 0.2786684
#> 2 2018-10-22 0.2465009 0.2469752
#> 3 2018-10-23 0.2444503 0.2442202
#> 4 2018-10-24 0.2302282 0.2290498
#> 5 2018-10-25 0.2272783 0.2267269
#> 6 2018-10-26 0.2167936 0.2155163
#> 7 2018-10-27 0.2421807 0.2419556
#> 8 2018-10-31 0.2434410 0.2425472
#> 9 2018-11-01 0.2545901 0.2527842
#> 10 2018-11-07 0.2589880 0.2613169
### Estimation of the IVPIN model ###
# Estimate the IVPIN model on the first 50,000 trades, using 5-minute
# timebars (300 seconds) and a sample length of 50 buckets.
# The grid_size parameter is unspecified and will default to 5.
iestimate <- ivpin(xdata[1:50000,], timebarsize = 300, samplength = 50, verbose = FALSE)
# Display the summary statistics of the IVPIN vector
summary(iestimate@ivpin)
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 0.00995 0.27456 0.30929 0.31937 0.36709 0.60363 49
# The output of ivpin() also contains the VPIN vector in the @vpin slot.
# Plot the VPIN and IVPIN vectors in the same plot using the iestimate object.
# Define the range for the VPIN and IVPIN vectors, removing NAs.
vpin_range <- range(c(iestimate@vpin, iestimate@ivpin), na.rm = TRUE)
# Plot the VPIN vector in blue
plot(iestimate@vpin, type = "l", col = "blue", ylim = vpin_range,
ylab = "VPIN/iVPIN", xlab = "Bucket", main = "Plot of VPIN and IVPIN")
# Add the IVPIN vector in red
lines(iestimate@ivpin, type = "l", col = "red")
# Add a legend to the plot
legend("topright", legend = c("VPIN", "IVPIN"), col = c("blue", "red"),
lty = 1,
cex = 0.6, # Adjust the text size
x.intersp = 1.2, # Adjust the horizontal spacing
y.intersp = 2, # Adjust the vertical spacing
inset = c(0.05, 0.05)) # Adjust the position slightly
# }