Fractional Differencing Implementation (FD Part 3)
Well... that took a lot longer than I expected it to. Six weeks later, I finally have the last installment in this series of posts. It's also the longest one, so you could say it was worth the wait. I recently found out that Python 2.7 (the Python I've used for EVERY project) will soon reach end of life. In other words, any support or bug fixes will cease to exist. To avoid having to port all of my Python 2.7 code to Python 3.x at once, I thought I would get a head start. Dealing with sudo permission issues, Homebrew and multiple versions of Python coexisting on my Mac are only a few of the reasons for the substantial delay.
I've also been learning a new platform called MetaTrader 5 for backtesting and live-trading purposes. Although it's not as glamorous as TradeStation, NinjaTrader or AmiBroker, the platforms most often seen among retail traders, it does have a fair bit of flexibility. Its integration with use cases outside of the program itself could be really interesting to explore going forward. Alright, enough with the excuses, let's get started.
Coding up our Weighting Function and Fractional Differencing
Just to recap: This series of posts talks about fractional differencing and aims to explain, derive and test it in practice. The last post walked through deriving the weighting function so, in this post, we're going to take a look at how we can code this up and go through an example, starting with the code for the weighting function itself.
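Here is a minimal sketch of what that function might look like (the name get_weights and the exact loop structure are assumptions; the recursion itself comes from the last post's derivation):

```python
import numpy as np

def get_weights(d, length, threshold):
    """Weights for fixed-window fractional differencing.

    w[0] = 1 applies to the current observation, w[k] to lag k.
    Uses the recursion w_k = -w_{k-1} * (d - k + 1) / k, stopping
    after `length` terms or once |w_k| drops below `threshold`.
    """
    w = [1.0]
    for k in range(1, length):
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        w.append(w_k)
    return np.array(w)
```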
So, essentially, what I've done is take the derivation from the last post and code it up as a function. This function takes three parameters: the differencing factor (d), the length of our time series (length) and a threshold (threshold) below which weights are no longer computed. The reason for including a threshold is that, as we saw in the last post, a fractional difference is actually an infinite series. Given obvious data, space and time limitations, we can only afford so much computational and memory overhead. So, what can we do?
Well, the method Marcos Lopez de Prado implements is called fixed-window fractional differencing. Simply put, we drop the weights once their absolute values fall below a certain threshold. The actual fractional differencing occurs in a different function:
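A minimal sketch of what that function might look like, following the three steps described below (the name frac_diff is an assumption; per the comments at the end of this post, the notebook's own version is called fracDiff and expects a [Date, Price] dataframe, while this sketch takes a pandas Series for brevity):

```python
import numpy as np
import pandas as pd

def frac_diff(series, d, threshold=0.00005):
    """Fixed-window fractional differencing of a pandas Series of prices."""
    # 1. The number of weights above the threshold is the window width,
    #    i.e. how many past prices each differenced value needs.
    w = get_weights(d, len(series), threshold)
    width = len(w)

    # 2. Forward-fill any holes in the data.
    values = series.ffill().values

    # 3. Dot each rolling window with the reversed weights so that w[0]
    #    multiplies the newest price, w[1] the one before it, and so on.
    out = {}
    for i in range(width - 1, len(series)):
        out[series.index[i]] = np.dot(w[::-1], values[i - width + 1:i + 1])
    return pd.Series(out)
```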
So, this code definitely seems more confusing, but let me try to explain what it's doing to make it a little clearer.
1. First, we find the number of weights above our threshold value (I set it to 0.00005 in the code block above). This is the number of time-series values we're going to need to derive each fractionally differenced price. It also happens to be the first index at which we can actually start fractionally differencing the series.
2. Forward-fill any NA values (just in case our data had any holes in it) and create a data frame to store the values we will be calculating.
3. Find the first and last indices to be used in our computation, match the weights with their respective time-series values and combine them. We showed the derivation for this in the last post (reproduced below via the standard binomial expansion):
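$$(1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k}(-B)^k = 1 - dB + \frac{d(d-1)}{2!}B^2 - \frac{d(d-1)(d-2)}{3!}B^3 + \cdots$$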
where B represents our time-series backshift operator and the coefficients in front determine what to multiply each lagged value by before adding/subtracting them all together.
Test Run/Sanity Check
As a sanity check, I decide to go through an example, staying with the Tesla theme, and do a test run with a differencing factor of 0.5 and a threshold value of 0.00005.
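With the sketch functions from above, the test run amounts to something like this (the series name tsla and the loading details are assumptions; the d and threshold values mirror the post):

```python
# `tsla` is assumed to be a pandas Series of Tesla closing prices
# indexed by date (the post's notebook uses a [Date, Price] dataframe).
fd_prices = frac_diff(tsla, d=0.5, threshold=0.00005)
```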
The resulting chart shows the Tesla prices (blue, right axis) and the fractionally differenced prices (red, left axis). Just visually, we can see much more stationary behavior, but to make sure, let's use a few tests of stationarity to confirm our hypothesis.
Dickey-Fuller, Kwiatkowski-Phillips-Schmidt-Shin and the Path to Stationarity
Before I get into who all these guys are and why they all decided to name their discoveries after themselves (whoever said mathematicians weren't egotistical?), let's take a deeper look at stationarity.
Reminding ourselves what this means exactly:
'A stationary time series is one where the mean (average value) and variance (spread of values) are constant over time.'
Although this explanation gives us a rough idea of what stationarity entails, it fails to dive into the details that are key to understanding how we go about testing for it. So let me provide an explanation from a different angle: what stationarity is not.
'A non-stationary process (or unit-root process) is a process that contains a stochastic trend over time (a systematic pattern over time). There is no long-run mean or variance.'
What the hell is a unit-root you might ask? I spent some time coming up with an intuitive explanation (with the help of good ol' google) that I think anyone can understand.
Unit Root: An Analogy
So let's say you go camping. While wandering around the forest looking for a good campsite, you find a bridge with a river running below it. You decide to cross it and, not looking exactly where you're going, accidentally trip over a wooden plank. The water bottle you were holding jerks out of your hand and into the river.
You see it floating slowly under the bridge and decide to take a closer look. You notice that it fell on the left side of the bridge and got carried to the right. You wonder: would it do that again?
Let's say the flow along the surface of the water looked like the following:
The arrows show the direction of flow and are connected by streamlines. The water bottle you drop will tend to follow the streamline in which it falls. BUT (and this is key here), it doesn't follow it exactly. There are random variations along the path, since water flow changes over time. This could be due to strong wind, a fish, a large rock in the river or any of the other complex dynamics of flow at play.
You can see that regardless of where we end up dropping our water bottle (assuming we did so on the left side of the diagram), it ends up flowing towards the right, although not linearly.
The stream inevitably carries the water bottle to the top right; it *trends* over time. Regardless of the random gyrations of the water, the flow of the river accelerates towards a particular point in the top right. (We're almost in the clear here!)
Associated with the speed of this acceleration is what we call a "root". When the size of this root is greater than or equal to unity (mathematically, 1), the series cannot be stationary. In our analogy, a root smaller than 1 would mean the pull of the river is not strong enough in any direction to make the water flow in only one direction; it would have banks, eddies, etc.
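To put the analogy in symbols (a standard illustration, not taken from the post itself), consider a first-order autoregressive process:

$$X_t = \phi X_{t-1} + \varepsilon_t$$

The series is stationary when $|\phi| < 1$: shocks $\varepsilon_t$ decay over time and the series keeps getting pulled back towards its mean. At $\phi = 1$ we get a random walk, $X_t = X_{t-1} + \varepsilon_t$, the canonical unit-root process, where every shock has a permanent effect and the series wanders with no long-run mean.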
ADF & KPSS Tests for Stationarity
Phew! That wasn't so bad, was it? Don't fret over why exactly a unit root means non-stationarity (although if you're interested, here is a link) since it isn't really the purpose of this post. Just know that there are many forms of non-stationarity and a unit root is just one of them (a REALLY important and common one at that).
Now back to the math. I take a look at some classical stationarity tests to see how our fractionally differenced series holds up.
1.) ADF test: used to detect the presence of a unit root.
2.) KPSS test: used to test for stationarity around a deterministic trend (trend-stationarity).
These two tests are meant to complement each other: by testing for both unit roots and trend-stationarity, we get a better idea of how to categorize the stochastic behavior of our data. I run both of these tests below, with the following results:
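In code, both tests ship with statsmodels; here is a sketch (the series name fd_prices carries over from the earlier snippet):

```python
from statsmodels.tsa.stattools import adfuller, kpss

clean = fd_prices.dropna()

# ADF: the null hypothesis is that the series HAS a unit root,
# so a very small p-value argues against a unit root.
adf_stat, adf_pvalue = adfuller(clean, autolag='AIC')[:2]

# KPSS: the null hypothesis is that the series IS stationary around a
# deterministic trend (regression='ct'), so a LARGE p-value is good here.
kpss_stat, kpss_pvalue = kpss(clean, regression='ct')[:2]

print('ADF:  stat=%.4f, p-value=%.3g' % (adf_stat, adf_pvalue))
print('KPSS: stat=%.4f, p-value=%.3g' % (kpss_stat, kpss_pvalue))
```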
What concerns us most is the p-value. For the ADF test, we can interpret it as follows:
Assuming our series has a unit root (i.e., is non-stationary), we would obtain a test statistic at least this extreme in only 0.0000155% of samples under repeated sampling and re-testing. Essentially, there is a very low likelihood that our series is in fact non-stationary.
This is definitely a good thing, since such a low p-value is basically telling us there is no unit root (and hence no stochastic trend) in our data.
BUT, is that enough? Not quite... We still want to check whether our earlier hypothesis of trend-stationarity holds up, so we turn to the KPSS test.
In this case, we can interpret our p-value as follows:
Assuming our series is trend-stationary, we would obtain these results in at least 10% of samples under repeated sampling and re-testing. (The "at least 10%" is because the test cannot report p-values higher than 0.10, or 10%.)
Although "at least 10%" doesn't sound like a whole lot, it definitely is in statistics: it means we cannot reject the null hypothesis of trend-stationarity. This test further suggests that our series is in fact very likely trend-stationary.
Varying Starting Conditions: How do test results fare?
So, what's next? Well, I am a bit curious to see how our test results change as our differencing parameters change. So, below, let's look at how changing our threshold value (for the fractional-differencing coefficients) and differencing factor changes the results of the ADF test.
I also decide to take a look at how these parameters change the number of data points we are able to difference (or the length of the differenced series).
Running the entire script above for the following pairwise combinations of thresholds and differencing factors leads to the following results (the sweep itself is sketched after these lists):
Differencing Factors: 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2
Threshold Values: 1e-3, 9e-4, 7e-4, 5e-4, 3e-4, 1e-4, 9e-5, 7e-5, 5e-5, 3e-5
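The sweep might look something like the following sketch, reusing the hypothetical helper names from earlier (everything else is an assumption):

```python
import itertools
from statsmodels.tsa.stattools import adfuller

d_values = [0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2]
thresholds = [1e-3, 9e-4, 7e-4, 5e-4, 3e-4, 1e-4, 9e-5, 7e-5, 5e-5, 3e-5]

results = {}
for d, thr in itertools.product(d_values, thresholds):
    fd = frac_diff(tsla, d=d, threshold=thr).dropna()
    # Record the ADF test statistic and the length of the differenced series.
    results[(d, thr)] = (adfuller(fd, autolag='AIC')[0], len(fd))
```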
We see that as our differencing factor increases, the length of the differenced series also increases, since a higher differencing factor means faster convergence of the weights: fewer weights survive the threshold, so each differenced value needs a shorter window of past prices. Also note that as we lower our threshold, the number of data points we are able to difference goes down. This follows from lower thresholds meaning more weights included in the calculation of each fractionally differenced price, and thus fewer available points to fractionally difference.
Looking at our heatmap next, we see that higher amounts of differencing typically lead to much more negative test statistics (which translate directly into very low p-values). All in all, this tells us that as we increase our differencing factor, our series becomes more stationary. This makes sense, since we typically take first differences (d = 1) to get returns, which are considered stationary. Increasing our threshold also makes the test statistics slightly better, since we get more data points to fractionally difference, at the cost of less accurate weights.
What this chart highlights is that we don't actually have to over-difference a series to achieve trend-stationarity in the underlying series.
Pitfalls, Limitations, Conclusions
So, why do we need to be careful about what we've been studying here? Well, for the ADF test in particular, testing for a unit root and testing for trend-stationarity aren't exactly the same thing. The odd thing is that it is possible for a time series to be non-stationary and yet have no unit root.
Trend-stationarity tests (like the KPSS test) are really looking for convergence of the series around a certain mean, and this mean can grow or shrink over time. Unit-root processes, however, assume that shocks to our series have a permanent impact.
Also, in the presence of time-varying variance (which is the case in some of the computed examples above), our results become less robust, although this can be addressed by conducting other unit-root tests, such as the Phillips-Perron test, which is more robust to heteroskedasticity.
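For reference, one implementation of the Phillips-Perron test lives in the arch package (a sketch, assuming the same fd_prices series as before; not part of the original post):

```python
from arch.unitroot import PhillipsPerron

# Same null hypothesis as the ADF test (the series has a unit root),
# but with a variance estimator that is robust to heteroskedasticity.
pp = PhillipsPerron(fd_prices.dropna())
print(pp.summary())
```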
Final Thoughts
So, what exactly has this series of posts shown? Virtually all finance papers apply integer differencing (a factor of 1), which is usually overkill for most financial time series. This over-differencing causes most, if not all, memory to be lost, leaving us with very little of the information contained in prices to use in statistical models.
I'll conclude with a thought that radically shifted my perspective on quantitative trading:
It's important to think about quantitative trading within a statistical framework. The price path you see in a chart is simply one realized path, generated from an underlying distribution (that we, unfortunately, don't know much about). Going forward, under repeated sampling, we have to assume that other price paths are also very likely.
Thinking about equity curves and time-series paths in this manner definitely makes things more confusing, but it solidifies why statistical ideas, although complex on the surface, are so useful and powerful in giving quants an edge.
I am getting the following error when I try to run the fracDiff function:
AttributeError: 'Series' object has no attribute 'columns'
I was having some trouble with this at first. Take a look at the jupyter notebook I posted for the blog on github at:
https://github.com/Dhiraj96/blog_posts/blob/master/post_1%2C2%2C3/Post1_Fractional_Differencing.ipynb
The dataframe that gets passed needs to have the following column format: [Date, Price]. If you look at In[3] to In[32] you can see the format I used (or just copy-paste the code from the notebook file).
Hope this helps!
Thanks for bringing Marcos Lopez de Prado closer to layman's knowledge. I've found your blog while trying to digest his Advances in Financial Machine Learning. I have a question though:
Considering an ML regression model to predict a univariate time series, say, Tesla daily prices,
should I apply Fractional Differencing to both my features X (i.e. technical indicators, volume, etc) and my outcome Y (one day ahead daily returns)?
Should I normalize both features and outcome?
Before or after Fractional Differencing?
Sorry for taking so long to reply! I have been quite busy working for an algorithmic trading company for the past year so I've had almost zero time to work on my own personal projects.
To address your question, however: it really depends on the use case. Since we are trying to preserve memory, it would be more apt to use this technique on a time series that serves as a predictive feature (before you run the analysis, of course). I'm not sure using it on the labels (i.e. what you're trying to predict) is the correct way to go about it, BUT I may be wrong on that, and it is a good question. I think it makes the most sense on a predictor (your X features) IF we have the same problems described in Part 1 related to memory loss.
In the case of technical indicators or volume (which are time series but may already be stationary without any need for first differencing), I would avoid this method, since the main goal here is to preserve memory while also achieving stationarity. I think it is best used on a predictor/feature that needs some level of differencing below 1 to become stationary.
Hope that helps!
Thanks for the reply, but my problem now is reconstructing the price after applying fractional differencing. I am having a hard time trying to un-difference the series back to its original form. Any ideas?