Estimates the Volume-Synchronized Probability of Informed Trading (VPIN) as developed in Easley et al. (2011) and Easley et al. (2012).
Usage
vpin(data, timebarsize = 60, buckets = 50, samplength = 50,
tradinghours = 24, verbose = TRUE)
Arguments
- data
A dataframe with 3 variables: {timestamp, price, volume}.
- timebarsize
An integer referring to the size of timebars in seconds. The default value is 60.
- buckets
An integer referring to the number of buckets in a daily average volume. The default value is 50.
- samplength
An integer referring to the sample length or the window size used to calculate the VPIN vector. The default value is 50.
- tradinghours
An integer referring to the length of daily trading sessions in hours. The default value is 24.
- verbose
A binary variable that determines whether detailed information about the steps of the estimation of the VPIN model is displayed. No output is produced when verbose is set to FALSE. The default value is TRUE.
Details
The dataframe data should contain at least three variables. Only the first three variables will be considered, in the following order: {timestamp, price, volume}.
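Because only the first three columns are read, and in a fixed order, it can help to reorder a dataframe before passing it to vpin(). The following is a minimal sketch using base R only; the dataframe 'trades', its values, and the extra 'bid' column are hypothetical, and the timestamp format is an assumption rather than a documented requirement.

```r
# Hypothetical raw trade data whose columns are not in the expected order.
trades <- data.frame(
  volume    = c(120, 80, 200),
  bid       = c(15.49, 15.50, 15.50),
  price     = c(15.50, 15.51, 15.50),
  timestamp = c("2018-10-18 00:11:33", "2018-10-18 00:11:47",
                "2018-10-18 00:12:05"),
  stringsAsFactors = FALSE
)

# Put timestamp, price and volume first, in that order.
# Additional columns (here 'bid') may follow; they sit after the first three.
xdata <- trades[, c("timestamp", "price", "volume", "bid")]

# Inspect the first three column names before estimation.
names(xdata)[1:3]
```

The reordered dataframe 'xdata' could then be passed as the data argument.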
The property @bucketdata is created as in Abad and Yague (2012).
The argument timebarsize is in seconds, enabling the user to implement intervals shorter than 1 minute. The default value is set to 1 minute (60 seconds) following Easley et al. (2011, 2012).
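To illustrate what a timebar of a given size means, the sketch below groups a few hypothetical trades into 60-second bars with base R and reports the total volume and the last price of each bar. It is only an illustration of the idea, not the package's internal code; the trades, the variable names, and the choice of summary columns are assumptions.

```r
# Hypothetical trades; timestamps are interpreted in UTC.
trades <- data.frame(
  timestamp = as.POSIXct(
    c("2018-10-18 00:11:33", "2018-10-18 00:11:47",
      "2018-10-18 00:12:05", "2018-10-18 00:13:59"),
    tz = "UTC"
  ),
  price  = c(15.50, 15.51, 15.50, 15.53),
  volume = c(120, 80, 200, 50)
)

timebarsize <- 60  # seconds; a smaller value such as 30 gives sub-minute bars

# Start of the timebar each trade falls into, used as a grouping label.
bar_secs  <- floor(as.numeric(trades$timestamp) / timebarsize) * timebarsize
bar_start <- format(
  as.POSIXct(bar_secs, origin = "1970-01-01", tz = "UTC"),
  "%Y-%m-%d %H:%M:%S"
)

# Total volume and last traded price per timebar.
vol  <- tapply(trades$volume, bar_start, sum)
last <- tapply(trades$price, bar_start, function(p) p[length(p)])

timebars <- data.frame(
  start  = names(vol),
  volume = as.vector(vol),
  price  = as.vector(last),
  stringsAsFactors = FALSE
)
timebars
```

Setting timebarsize to a value below 60 in this sketch produces finer bars, which mirrors the purpose of expressing the argument in seconds.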
The parameter tradinghours is used to correct the duration per bucket when needed. The duration of a given bucket is the difference between the timestamp of the last trade (endtime) and the timestamp of the first trade (stime) in the bucket. If the first trade and the last trade in a bucket occur on two different days, and the market trading session does not cover a full day (24 hours), then the duration of the bucket will be inflated. Assume that the daily trading session is 8 hours (tradinghours = 8), the start time of a bucket is 2018-10-12 17:06:40, and its end time is 2018-10-13 09:36:00. A straightforward calculation gives a duration of 59,360 seconds for this bucket. However, this duration includes the time during which the market is closed (16 hours). The corrected duration takes into consideration only the time of market activity: duration = 59,360 - 16 * 3600 = 1,760 seconds, i.e., about 30 minutes.
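The overnight example above can be reproduced with base R. This is a minimal sketch of the arithmetic only; the variable names are illustrative, and subtracting the closed hours once per day boundary crossed is an assumption that generalizes the single overnight case described above, not a statement of how the package implements the correction.

```r
tradinghours <- 8
stime   <- as.POSIXct("2018-10-12 17:06:40", tz = "UTC")
endtime <- as.POSIXct("2018-10-13 09:36:00", tz = "UTC")

# Raw duration: last trade minus first trade, in seconds.
raw_duration <- as.numeric(difftime(endtime, stime, units = "secs"))

# Number of day boundaries between the first and the last trade.
days_apart <- as.numeric(as.Date(endtime) - as.Date(stime))

# Time during which the market is closed (assumed: once per day boundary).
closed_secs <- days_apart * (24 - tradinghours) * 3600

corrected_duration <- raw_duration - closed_secs

raw_duration              # 59360
corrected_duration        # 1760
corrected_duration / 60   # roughly 29.3 minutes
```

With tradinghours = 24 the closed time is zero, so the raw duration is left unchanged.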
References
Abad D, Yague J (2012).
“From PIN to VPIN: An introduction to order flow toxicity.”
The Spanish Review of Financial Economics, 10(2), 74–83.
Easley D, Lopez de Prado MM, O'Hara M (2011).
“The microstructure of the "flash crash": flow toxicity, liquidity crashes, and the probability of informed trading.”
The Journal of Portfolio Management, 37(2), 118–128.
Easley D, Lopez de Prado MM, O'Hara M (2012).
“Flow toxicity and liquidity in a high-frequency world.”
Review of Financial Studies, 25(5), 1457–1493.
ISSN 08939454.
Examples
# The package contains a preloaded dataset called 'hfdata'.
# It is an artificially created high-frequency trading dataset. It
# contains 100,000 trades and five variables: 'timestamp', 'price',
# 'volume', 'bid' and 'ask'. For more information, type ?hfdata.
xdata <- hfdata
# Estimate VPIN model, using the following parameter set:
# timebarsize = 60, buckets = 50 and samplength = 250.
estimate <- vpin(xdata, timebarsize = 60, buckets = 50, samplength = 250)
#> [+] VPIN Estimation started.
#> |-[1] Checking and preparing the data...
#> |-[2] Creating 60-second timebars...
#> |-[3] Calculating Volume Bucket Size (VBS) and Sigma(DP)...
#> |-[4] Breaking up large 60-second timebars' volume...
#> |-[5] Assigning 60-second timebars into buckets...
#> |-[6] Balancing timebars and adjusting bucket sizes to VBS...
#> |-[7] Calculating aggregate bucket data...
#> |-[8] Calculating VPIN vector...
#> [+] VPIN estimation completed
# Display a description of the estimate
show(estimate)
#> ----------------------------------
#> VPIN estimation completed successfully
#> ----------------------------------
#> Type object@vpin to access the VPIN vector.
#> Type object@bucketdata to access data used to construct the VPIN vector.
#> Type object@dailyvpin to access the daily VPIN vectors.
#>
#> VPIN model
#>
#> Table: [+] VPIN descriptive statistics
#>
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> ------ --------- -------- ------ --------- ------ ------
#> 0.15 0.19 0.21 0.21 0.22 0.3 249
#>
#>
#> Table: [+] VPIN parameters
#>
#> tbSize buckets samplength VBS #days
#> -------- --------- ------------ ---------- -------
#> 60 50 250 3889.833 72
#>
#> -------
#> Running time: 7.468 seconds
# Plot the estimated VPIN vector
plot(estimate@vpin, type = "l", xlab = "time", ylab = "VPIN", col = "blue")
# Display the parameters of VPIN estimates
show(estimate@parameters)
#> tbSize buckets samplength VBS #days
#> 60.000 50.000 250.000 3889.833 72.000
# Store the computed data of the different buckets in a dataframe 'buckets'.
# Display the first 10 rows of the dataframe 'buckets'.
buckets <- estimate@bucketdata
show(head(buckets, 10))
#> bucket agg.bvol agg.svol aoi starttime endtime
#> 1 1 1469.262 2420.570 951.3081 2018-10-18 00:11:33 2018-10-18 01:04:33
#> 2 2 2064.433 1825.399 239.0338 2018-10-18 01:04:33 2018-10-18 01:54:33
#> 3 3 1715.280 2174.552 459.2722 2018-10-18 01:54:33 2018-10-18 02:52:33
#> 4 4 2858.429 1031.403 1827.0260 2018-10-18 02:52:33 2018-10-18 03:37:33
#> 5 5 2044.683 1845.150 199.5326 2018-10-18 03:37:33 2018-10-18 04:31:33
#> 6 6 2382.084 1507.749 874.3346 2018-10-18 04:31:33 2018-10-18 04:33:33
#> 7 7 2568.934 1320.899 1248.0342 2018-10-18 04:33:33 2018-10-18 04:33:33
#> 8 8 2568.934 1320.899 1248.0342 2018-10-18 04:33:33 2018-10-18 04:33:33
#> 9 9 2568.934 1320.899 1248.0342 2018-10-18 04:33:33 2018-10-18 04:33:33
#> 10 10 2727.081 1162.752 1564.3294 2018-10-18 04:33:33 2018-10-18 04:40:33
#> vpin duration
#> 1 NA 3180
#> 2 NA 3000
#> 3 NA 3480
#> 4 NA 2700
#> 5 NA 3240
#> 6 NA 120
#> 7 NA 0
#> 8 NA 0
#> 9 NA 0
#> 10 NA 420
# Store the daily VPIN values (weighted and unweighted) in a dataframe
# 'dayvpin'.
# Display the first 10 rows of the dataframe 'dayvpin'.
dayvpin <- estimate@dailyvpin
show(head(dayvpin, 10))
#> day dvpin dvpin_weighted
#> 1 2018-10-22 0.1926986 0.2002240
#> 2 2018-10-23 0.1675809 0.1662273
#> 3 2018-10-24 0.1792315 0.1789575
#> 4 2018-10-25 0.1771051 0.1766631
#> 5 2018-10-26 0.1780552 0.1775370
#> 6 2018-10-27 0.1918747 0.1889173
#> 7 2018-10-31 0.1912656 0.1902348
#> 8 2018-11-01 0.1913555 0.1907949
#> 9 2018-11-07 0.1923553 0.1943623
#> 10 2018-11-09 0.1891909 0.1874105
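As a rough check on how the columns of the bucket data relate to each other, the sketch below recomputes the absolute order imbalance (aoi) from the first five rows of the bucket output shown above and then applies the usual rolling definition of VPIN from Easley et al. (2012): the sum of aoi over the last n buckets divided by n times the volume bucket size (VBS). The window n = 3 is chosen only to keep the example short, and the code is an illustrative assumption, not the package's own implementation.

```r
# First five buckets, copied from the bucket data printed above.
bvol <- c(1469.262, 2064.433, 1715.280, 2858.429, 2044.683)  # agg.bvol
svol <- c(2420.570, 1825.399, 2174.552, 1031.403, 1845.150)  # agg.svol
vbs  <- 3889.833                                             # VBS

# Absolute order imbalance per bucket; compare with the 'aoi' column.
aoi <- abs(bvol - svol)

# Rolling VPIN over a window of n buckets (n = 3 for illustration only).
n <- 3
vpin_sketch <- rep(NA_real_, length(aoi))
for (i in n:length(aoi)) {
  vpin_sketch[i] <- sum(aoi[(i - n + 1):i]) / (n * vbs)
}

round(aoi, 3)
round(vpin_sketch, 4)
```

In this sketch the first n - 1 entries are NA because a full window is not yet available, which is in line with the 249 NA values reported for samplength = 250 in the descriptive statistics.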