### Fractional Differencing Implementation (FD Part 3)

Well... that took a lot longer than I expected it to. Six weeks later, I finally have the last installment in this series of posts. It's also the longest one, so you could say it was worth the wait. I recently found out that Python 2.7 (the Python I've used for EVERY project) will soon be deprecated. In other words, support and bug fixes will cease. To avoid having to port all of my Python 2.7 code to Python 3.x at once, I thought I would get a head start. Dealing with sudo permission issues, Homebrew and having multiple versions of Python on my Mac are only a few of the reasons for such a substantial delay.

I've also been learning a new platform called MetaTrader5 for backtesting and live trading purposes. Although it's not as glamorous as TradeStation, NinjaTrader or AmiBroker, which are seen most often with retail traders, it does have a fair bit of flexibility. Its integration with use cases outside of the program itself could be really interesting to explore going forward. Alright, enough with the excuses, let's get started.

__Coding up our Weighting Function and Fractional Differencing__
Just to recap: This series of posts talks about fractional differencing and aims to explain, derive and test it in practice. The last post walked through deriving the weighting function so, in this post, we're going to take a look at how we can code this up and go through an example, starting with the code for the weighting function itself.
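Here's a minimal sketch of what such a weighting function might look like. The name `get_weights`, the default threshold and the exact stopping rule are assumptions for illustration, not necessarily the code from the notebook. It uses the standard recursion for the binomial expansion of (1 - B)^d and stops once a weight's magnitude falls below the threshold.

```python
def get_weights(d, length, threshold=5e-5):
    """Fixed-window fractional differencing weights for order d.

    Returns weights ordered from lag 0 (most recent value) to the
    last lag whose magnitude is still at or above `threshold`.
    """
    weights = [1.0]                      # w_0 is always 1
    for k in range(1, length):
        # Recursion from the binomial expansion of (1 - B)^d
        w_k = -weights[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:         # remaining weights are negligible
            break
        weights.append(w_k)
    return weights


w = get_weights(d=0.5, length=1000)
print(len(w), w[:4])
```

With d = 1 the recursion collapses to the two weights of an ordinary first difference, which is a handy sanity check.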

So, essentially, what I've done is take the derivation from the last post and code it up as a function. This function takes 3 parameters: the differencing factor (d), the length of our time-series (length) and a threshold (threshold) below which weights are no longer computed. The reason for including a threshold is that, as we saw in the last post, a fractional difference is actually an infinite series. Because of obvious data, space and time limitations (duh), we can only afford so much computational and memory overhead. So, what can we do?

Well, the method Marcos Lopez de Prado implements is something called Fixed-Window fractional differencing. Simply put, we drop the weights once their magnitudes fall below a certain *threshold*. The actual fractional differencing occurs in a different function:

So, this code definitely seems more confusing, but let me try to explain what it's doing to make it a little clearer.

1. First, we find the number of weights above our threshold value (I set it to 0.00005 in the code block above). This is essentially the number of time-series values we're going to need to derive each fractionally differenced price. It also happens to be the first index at which we can actually start fractionally differencing a time-series.

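2. Fill any NA values (just in case our data had any holes in it) and create a data-frame to store the values we will be calculating.

3. Find the first and last indices to be used in our computation, match the weights with their respective time-series values and combine them. We showed the derivation for this in the last post, where each B represents our time-series backshift operator and the coefficient in front of it determines what to multiply each lagged value by before adding them all together.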
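The three steps above can be sketched roughly as follows. This is a simplified stand-in that works on a plain list of prices rather than a data-frame; the name `frac_diff_ffd`, the use of `None` for missing values and the forward-fill rule are assumptions for illustration. The weights follow the recursion $w_0 = 1$, $w_k = -w_{k-1}(d - k + 1)/k$ from the expansion of $(1 - B)^d$.

```python
def frac_diff_ffd(prices, d, threshold=5e-5):
    """Fixed-window fractional differencing of a list of prices.

    Hypothetical helper: takes a plain list of floats (missing values
    as None) and returns a list aligned with `prices`, holding None
    wherever there is not enough history.
    """
    # Step 1: weights above the threshold (lag 0 first)
    weights = [1.0]
    k = 1
    while k < len(prices):
        w_k = -weights[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        weights.append(w_k)
        k += 1
    width = len(weights)

    # Step 2: forward-fill missing values and prepare the output
    filled, last = [], None
    for p in prices:
        last = p if p is not None else last
        filled.append(last)
    out = [None] * len(prices)

    # Step 3: dot product of the weights with the trailing window
    for t in range(width - 1, len(filled)):
        window = filled[t - width + 1 : t + 1]
        if window[0] is None:      # leading gap that could not be filled
            continue
        out[t] = sum(w * x for w, x in zip(weights, reversed(window)))
    return out


print(frac_diff_ffd([1.0, 3.0, 6.0, 10.0], d=1))
```

Setting d = 1 should reproduce ordinary first differences, which makes it easy to check that the weights are being lined up with the right lags.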

__Test Run/Sanity Check__
As a sanity check, I decided to go through an example, staying with the Tesla theme. After doing a test-run with a differencing factor of 0.5 and a threshold value of 0.00005, we get the following chart:

This chart shows the Tesla prices (blue, axis-right) and the fractionally differenced prices (red, axis-left). Just visually, we can see much more stationary behavior but to make sure, let's use a few tests of stationarity to confirm our hypothesis.

__Dickey-Fuller, Kwiatkowski-Phillips-Schmidt-Shin and the Path to Stationarity__
Before I get into who all these guys are and why they all decided to name their discoveries after themselves (whoever said mathematicians weren't egotistical?), let's take a deeper look at stationarity.

Reminding ourselves what this means exactly:

*'A stationary time series is one where the mean (average value) and variance (spread of values) is constant over time.'*

Although this explanation gives us a rough idea of what stationarity entails, it fails to dive into the details that are key to understanding how we go about testing it. Thus, I provide an explanation from a different angle, what stationarity **is not**:

*'A non-stationary process (or unit-root process) is a process that contains a stochastic trend over time (a systematic pattern over time). There is no long-run mean or variance.'*

What the hell is a unit-root, you might ask? I spent some time coming up with an intuitive explanation (with the help of good ol' Google) that I think anyone can understand.

**Unit Root: An Analogy**

So let's say you go camping. While wandering around the forest looking for a good campsite, you find a bridge and a river running below it. You decide to cross it and not looking exactly where you're going, you accidentally trip over a wooden plank. The water bottle you were holding jerks out of your hand and into the river.

You see it floating slowly under the bridge and decide to take a closer look. You notice that it fell on the left side of the bridge and got taken to the right. You wonder if it would do it again?

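Let's say the flow along the surface of the water looked like the following:

The arrows show the direction of flow and are connected by streamlines. The water bottle you drop will tend to follow the streamline in which it falls. BUT (and this is key here), it doesn't follow it exactly. There are random variations along the path since water flow changes over time. This could be due to strong wind, a fish, a large rock in the river or any of the other complex dynamics of flow at play.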

You can see that regardless of where we end up dropping our water bottle (assuming we did so on the left side of the diagram), it ends up flowing towards the right, although not linearly.

The stream inevitably carries the water bottle to the top right; it *trends* over time. Regardless of the random gyrations of the water, the flow of the river accelerates towards a particular point in the top right. (We're almost home free here!)

Associated with the speed of acceleration is what we call a "root". When the size of this root is equal to or greater than unity (or mathematically, 1), the series cannot be stationary. In our analogy, a root smaller than 1 would mean the acceleration of the river is not strong enough in any direction to make the river flow in only 1 direction. It would have banks, eddies, etc.
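To make the "root" idea a bit more concrete, here's a small simulation. It assumes the simplest possible model, an AR(1) process y_t = phi * y_{t-1} + noise, which is my choice for illustration and not part of the analogy itself; the function names are hypothetical too. With phi below 1 the spread across many simulated paths settles down, while with phi equal to 1 (a unit root) it keeps growing with time.

```python
import random
import statistics


def simulate_ar1(phi, n_steps, seed):
    """Simulate y_t = phi * y_{t-1} + noise, starting from y_0 = 0."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for _ in range(n_steps):
        y = phi * y + rng.gauss(0.0, 1.0)
        path.append(y)
    return path


def spread_at(phi, step, n_paths=500):
    """Variance across many simulated paths at a given time step."""
    finals = [simulate_ar1(phi, step, seed)[-1] for seed in range(n_paths)]
    return statistics.variance(finals)


# Compare a root below 1 with a unit root at a few horizons
for phi in (0.5, 1.0):
    print(phi, [round(spread_at(phi, step), 1) for step in (10, 100, 400)])
```

The growing spread in the phi = 1 case is the "no long-run mean or variance" behavior described above.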

**ADF & KPSS Tests for Stationarity**

Phew! That wasn't so bad, was it? Don't fret over why exactly a unit root means non-stationarity (although if you're interested, here is a link) since it isn't really the purpose of this post. Just know that there are many forms of non-stationarity and a unit root is just one of them (a REALLY important and common one at that).

Now back to the math. I take a look at some classical stationarity tests to see how our fractionally differenced series holds up.

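1.) ADF Test: This test is used to detect the presence of a unit-root. Its null hypothesis is that the series has a unit root.

2.) KPSS Test: This test is used to detect the presence of trend-stationarity around a deterministic trend. Its null hypothesis is that the series is trend-stationary.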
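To give a feel for what the first of these tests computes under the hood, here's a minimal sketch of the plain (non-augmented) Dickey-Fuller regression with a constant. It is a simplification of the full ADF test, which also adds lagged differences, and the function name `dickey_fuller_t` is hypothetical. The statistic is the t-ratio on gamma in diff(y)_t = alpha + gamma * y_{t-1} + e_t.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic on gamma in: diff(y)_t = alpha + gamma * y_{t-1} + e_t."""
    x = y[:-1]                                  # lagged levels
    z = [b - a for a, b in zip(y[:-1], y[1:])]  # first differences
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    gamma = s_xz / s_xx
    alpha = z_bar - gamma * x_bar
    # Residual variance with two estimated parameters
    sse = sum((zi - alpha - gamma * xi) ** 2 for xi, zi in zip(x, z))
    se_gamma = math.sqrt(sse / (n - 2) / s_xx)
    return gamma / se_gamma


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]   # stationary by construction
walk, level = [], 0.0
for shock in noise:                                  # cumulative sum = unit root
    level += shock
    walk.append(level)

# Roughly -2.86 is the usual 5% critical value for this regression
# (constant, no trend); more negative means "reject the unit root".
print("noise:", round(dickey_fuller_t(noise), 2))
print("walk: ", round(dickey_fuller_t(walk), 2))
```

In practice a library routine (for example the `adfuller` and `kpss` functions in `statsmodels.tsa.stattools`) takes care of lag selection and p-values, so this is only meant to show the idea.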

These two tests are meant to complement each other. By testing for both unit-roots and trend-stationarity, they give us a better idea of how to categorize the stochastic behavior of our data. I run both these tests below with the following results:

What is of most concern to us is our p-value. In this test, our p-value can be interpreted as follows:

*Assuming that our series has a unit root (i.e., is non-stationary), we would obtain a test statistic at least this extreme in only about 0.0000155% of repeated samples. Essentially, the data give us very strong evidence against a unit root.*

So this is definitely a good thing, since such a low p-value lets us reject the unit-root hypothesis (and hence a stochastic trend in our data).

BUT, is that enough? Not quite... We still would like to see if our earlier hypothesis of trend stationarity is consistent, so we turn to the KPSS test.

In this case, we can interpret our p-value as follows:

*Assuming our series is trend-stationary, we would obtain a test statistic at least this extreme in at least 10% of repeated samples. (The "at least" is due to the fact that the test cannot give us p-values higher than 0.10 or 10%.)*

Although at least 10% doesn't sound like a whole lot, it definitely is in statistics: we cannot reject trend-stationarity at any conventional significance level. This test further supports the idea that our series is trend-stationary.

__Varying Starting Conditions: How do test results fare?__
So, what's next? Well, I am a bit curious to see how our test results change as our differencing parameter changes. So, below, let's look at how changing our threshold value (for fractional differencing coefficients) and differencing factor changes the results of the ADF test.

I also decided to take a look at how these parameters change the number of data points we are able to difference (or the length of the differenced series).

Running the entire script above for the following pairwise combinations of thresholds and differencing factors leads to the following:

Differencing Factors: 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2

Threshold Values: 1e-3, 9e-4, 7e-4, 5e-4, 3e-4, 1e-4, 9e-5, 7e-5, 5e-5, 3e-5

We see that as our differencing factor increases, the number of observations in the differenced series also increases, since a higher differencing factor means the weights decay faster and fewer of them survive the threshold. Also note that as we lower our threshold, the number of observations goes down. This follows from lower thresholds meaning more weights to be included per calculation of fractionally-differenced prices and thus fewer data points left to fractionally difference.
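This trade-off can be checked with a quick count of how many weights survive a given threshold. The sketch below is only illustrative: the function name `n_weights` and the series length of 2000 are made up, and it uses the same weight recursion as the expansion of (1 - B)^d.

```python
def n_weights(d, threshold, max_lags=100_000):
    """Number of fixed-window weights whose magnitude is >= threshold."""
    w, count = 1.0, 1
    for k in range(1, max_lags):
        w = -w * (d - k + 1) / k
        if abs(w) < threshold:
            break
        count += 1
    return count


n_obs = 2000   # hypothetical series length
for d in (0.2, 0.5, 0.8):
    for threshold in (1e-3, 5e-5):
        width = n_weights(d, threshold)
        # Each output row needs `width` prices, so the first width - 1 rows are lost
        usable = max(n_obs - width + 1, 0)
        print(f"d={d} threshold={threshold}: {width} weights, {usable} usable rows")
```

Higher differencing factors and higher thresholds should both shrink the window, leaving more rows in the differenced series.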

Looking at our heatmap next, we see that higher amounts of differencing typically lead to much more negative test statistics (which directly translate to very low p-values). All in all, this basically tells us that as we increase our differencing factor, our series becomes more stationary. This makes sense, since we typically take first differences (d=1) to get returns, which are considered stationary. Also, increasing our threshold makes our test statistics slightly better, since we get more data points to fractionally difference at the cost of being less accurate.

What this chart highlights is the fact that we don't actually have to overly difference a series to get trend-stationarity in the underlying series.

__Pitfalls, Limitations, Conclusions__
So, why do we need to be careful about what we've been studying here? Well, for the ADF test in particular, testing for a unit-root and testing for trend-stationarity aren't exactly the same thing. The weird thing is, it is possible for a time series to be non-stationary but have no unit root.

Trend-stationary tests (like the KPSS test) are really looking for time-series convergence around a certain mean, and this mean can grow or shrink over time. Unit root processes however, assume that shocks to our series have a permanent impact.

Also, in the presence of time-varying variance (which is the case in some of the computed examples above), our results become less robust (although this can be mitigated by conducting other unit root tests, such as the Phillips-Perron test, that are more robust to heteroskedasticity).

**Final Thoughts**

So, what exactly has this series of posts shown? Virtually all finance papers apply integer differencing (a factor of 1), which is usually overkill for most financial time-series. This over-differencing causes most if not all memory to be lost, leaving us with very little information when using prices in statistical models.

I'll conclude with a thought that radically shifted my perspective on quantitative trading:

It's important to think about quantitative trading from a statistical framework. The price path you see in a chart is simply a realized price path, generated from an underlying distribution (that we, unfortunately, don't know much about). Going forward, under repeated sampling, we have to assume that other price paths are also very likely.

Thinking about equity curves and time-series paths in this manner definitely makes things more confusing, but it solidifies why statistical ideas, although complex on the surface, are so useful and powerful in giving quants an edge.

I am getting the following error when I try to run the fracDiff function:

AttributeError: 'Series' object has no attribute 'columns'

I was having some trouble with this at first. Take a look at the jupyter notebook I posted for the blog on github at:

https://github.com/Dhiraj96/blog_posts/blob/master/post_1%2C2%2C3/Post1_Fractional_Differencing.ipynb

The dataframe you pass in needs to have the following column format: [Date, Price]. If you look at In[3] to In[32], you can see the format I used (or just copy and paste the code from the notebook file).
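As a minimal sketch of the difference (the dates and prices here are made up, and this assumes the function loops over `.columns`, as the error message suggests), selecting a column with single brackets gives a Series, while double brackets or `to_frame` give a one-column DataFrame:

```python
import pandas as pd

# Hypothetical price data indexed by date
dates = pd.date_range("2019-01-01", periods=4, freq="D")
df = pd.DataFrame({"Price": [310.1, 312.5, 309.8, 315.2]}, index=dates)
df.index.name = "Date"

as_series = df["Price"]     # single brackets -> Series (no .columns)
as_frame = df[["Price"]]    # double brackets -> one-column DataFrame

print(hasattr(as_series, "columns"))   # a Series has no .columns attribute
print(list(as_frame.columns))

# An existing Series can also be converted before being passed in
converted = as_series.to_frame(name="Price")
```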

Hope this helps!