Let’s revisit the two time series (Figure 1): weekly post submissions to eating disorders forums (‘ed’) and weekly post submissions to dieting forums (‘diet’). Our task is to reveal relationships between these time series and to conduct a Granger causality test.
Before using the cross-correlation function to explore relationships between two time series, or conducting any temporal causality test, we need our time series to be as stationary as possible. The intuition is that we only want to keep the information about the time series themselves and eliminate the temporal effects of other confounding factors.
If both time series are correlated with time, which is quite likely, they will show a similar trend over time, but this shared temporal pattern says nothing about the nature of the series themselves. To be more specific, imagine that during a certain period, post submissions to the Reddit platform surged and crashed the company’s virtual machines. What would we expect to see? Unsurprisingly, a sharp drop in the number of post submissions in both time series right afterwards.
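The effect of a shared trend can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the analysis from this project: it uses Python with NumPy and two synthetic series (the names `a`, `b` and the helper `detrend` are made up here) that share nothing except a linear time trend.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)

# Two synthetic series with independent noise and a common upward time trend.
a = 0.5 * t + rng.normal(0, 5, n)
b = 0.3 * t + rng.normal(0, 5, n)

def detrend(y, t):
    """Return the residuals of an OLS fit y ~ intercept + slope * t."""
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)

# Correlation before and after removing the linear trend from each series.
raw_corr = np.corrcoef(a, b)[0, 1]
detrended_corr = np.corrcoef(detrend(a, t), detrend(b, t))[0, 1]
print(f"raw: {raw_corr:.2f}, detrended: {detrended_corr:.2f}")
```

The raw correlation is high only because both series grow with time; once the trend is removed, what remains is the correlation between the two independent noise terms, which should be close to zero.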
Autocorrelation (ACF and PACF)
Thanks to the linear regression we did in the previous post (https://edetectives.blog/2020/09/29/time-series-2-detrending-by-linear-regression/), we have controlled for time effects, systematic errors, and confounding factors related to data collection. Besides examining time series plots, the autocorrelation function (ACF) and partial autocorrelation function (PACF) also offer insight into whether a time series is stationary.
In Figure 2, the ACF decays gradually for both time series but cuts off after relatively few lags. In contrast, for a non-stationary time series the ACF usually decays very slowly.
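To make the contrast concrete, here is a minimal sketch of the sample ACF, assuming Python with NumPy and simulated data rather than the forum series. The helper `acf` and the two simulated processes are illustrative choices: a stationary AR(1) process and a non-stationary random walk built from the same noise.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

rng = np.random.default_rng(1)
n = 500
noise = rng.normal(size=n)

# Stationary AR(1): x_t = 0.5 * x_{t-1} + e_t
ar1 = np.zeros(n)
for i in range(1, n):
    ar1[i] = 0.5 * ar1[i - 1] + noise[i]

# Non-stationary random walk: x_t = x_{t-1} + e_t
walk = np.cumsum(noise)

acf_ar1 = acf(ar1, 20)
acf_walk = acf(walk, 20)
print("AR(1) ACF at lag 10:      ", round(acf_ar1[10], 2))
print("random walk ACF at lag 10:", round(acf_walk[10], 2))
```

The AR(1) autocorrelation should die out within a few lags, while the random walk stays strongly autocorrelated even at lag 10, which is the slow decay that signals non-stationarity.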
The PACF for the ‘ed’ time series is significant at lags 1, 2, and 3, which indicates that the number of posts submitted to eating disorders forums in the current week is significantly and positively related to its own values 1, 2, and 3 weeks ago. In other words, if we observe more posts submitted to eating disorders forums this week, it is highly likely that we will see more posts submitted to the same group of forums in the upcoming three weeks.
The PACF for the ‘diet’ time series is significant at lags 1 and 2, which indicates that the number of posts submitted to dieting forums in the current week is significantly and positively related to its own values 1 and 2 weeks ago. The same interpretation holds: if more posts are submitted to dieting forums this week, we expect to see more posts in the following two weeks.
ACF and PACF are also very useful in time series forecasting. AR(2) and AR(3) models seem to be plausible candidates if we want to forecast the number of posts submitted to dieting forums and to eating disorders forums, respectively, using only each series’ own past values. However, forecasting is not our goal in this project, so we will not discuss it further.
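The link between the PACF and the AR order can be sketched as follows. This is an illustration under stated assumptions, not the code behind Figure 2: it uses Python with NumPy, a simulated AR(2) process with made-up coefficients (0.5 and 0.3), and a hypothetical helper `pacf_ols` that estimates the PACF at lag k as the last coefficient of an ordinary least squares AR(k) fit.

```python
import numpy as np

def pacf_ols(x, max_lag):
    """PACF at lags 1..max_lag: last coefficient of an OLS AR(k) fit for each k."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    out = []
    for k in range(1, max_lag + 1):
        # Columns are x_{t-1}, ..., x_{t-k}; rows correspond to t = k..n-1.
        X = np.column_stack([x[k - j : n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
        out.append(coef[-1])
    return np.array(out)

rng = np.random.default_rng(4)
n = 2000
e = rng.normal(size=n)

# Simulated AR(2): x_t = 0.5 * x_{t-1} + 0.3 * x_{t-2} + e_t
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + e[t]

pacf_values = pacf_ols(x, 5)
band = 1.96 / np.sqrt(n)  # approximate 95% band for a zero partial autocorrelation
for lag, value in enumerate(pacf_values, start=1):
    flag = "significant" if abs(value) > band else "not significant"
    print(f"lag {lag}: {value:+.3f} ({flag})")
```

For an AR(2) process the PACF should be clearly non-zero at lags 1 and 2 and close to zero afterwards, which is the same reading that suggests AR(2) for ‘diet’ and AR(3) for ‘ed’ above.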
Cross-correlation analysis is exploratory. Just as autocorrelation reveals how a time series is related to itself at different time lags, cross-correlation reveals how two time series are related to each other at different time lags.
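As a minimal sketch of the idea, assuming Python with NumPy and synthetic data, the hypothetical helper `ccf` below correlates `x[t]` with `y[t + k]` for a range of lags k. The sign convention is a choice made for this example (a positive k means `x` leads `y`); plotting tools differ on this, so it is worth checking before reading any CCF plot. Each lag is computed as a plain Pearson correlation on the overlapping segment, which is a simplification of the usual estimator.

```python
import numpy as np

def ccf(x, y, max_lag):
    """Correlation between x[t] and y[t + k] for k = -max_lag..max_lag.

    With this convention, a peak at a positive k means x leads y by k steps.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    values = []
    for k in lags:
        if k >= 0:
            a, b = x[: n - k], y[k:]      # pairs (x[t], y[t + k])
        else:
            a, b = x[-k:], y[: n + k]     # pairs (x[t], y[t - |k|])
        values.append(np.corrcoef(a, b)[0, 1])
    return lags, np.array(values)

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)

# y repeats x with a delay of 3 steps, plus independent noise.
y = np.empty(n)
y[:3] = rng.normal(size=3)
y[3:] = x[:-3] + 0.5 * rng.normal(size=n - 3)

lags, values = ccf(x, y, 10)
best_lag = int(lags[np.argmax(values)])
print("lag with the largest cross-correlation:", best_lag)
```

Because `y` is built as a delayed copy of `x`, the largest cross-correlation is expected at k = 3, on the positive side of the lag axis.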
In Figure 3, we plot the correlation between the ‘diet’ time series and the lagged ‘ed’ time series.
The right part of this CCF plot (positive lags) shows how the previous number of post submissions in dieting forums relates to the current number of post submissions in eating disorders forums; the left part (negative lags) shows how the previous number of post submissions in eating disorders forums relates to the current number of post submissions in dieting forums.
In particular, previous values (lags) of post submissions in dieting forums are positively correlated with current post submissions in eating disorders forums. That is, if Reddit has more posts submitted to its dieting forums in the current weeks, we expect to see more posts submitted to eating disorders forums in the coming weeks, or even months.
Meanwhile, previous values (lags) of post submissions in eating disorders forums are negatively correlated with current post submissions in dieting forums. This means that if Reddit has more posts submitted to its eating disorders forums in the current weeks, we expect to see a decrease in post submissions in dieting forums in the coming weeks, or even months.
A reasonable hypothesis for this observation consists of two real-world circumstances:
1. Users who participate more in dieting forums, presumably because they are on a diet, have a higher risk of developing eating disorders and later join the discussions in eating disorders forums to self-diagnose, get more information, or work toward recovery.
2. Users who have already developed eating disorders and remain active in eating disorders forums are more likely to act against dieting behaviors, for example by spending less time in online dieting communities (forums).
Granger Causality Test
We will not dig too far into the theory behind the Granger causality test. A short note on Figure 4: if the p-value (y-axis) drops below 0.05 (the horizontal dashed line) at certain lag values (x-axis), we consider the corresponding test result significant at those lags. The lag values correspond to the gap, in weeks, between the two time series.
The orange line represents the p-value for the test of whether the current number of posts submitted to dieting forums (‘dieting’) significantly affects the future number of posts submitted to eating disorders forums (‘ED’). The blue line shows the opposite direction.
From lag 2 to 20 and from lag 28 to 110, the Granger test for dieting -> ED is significant AND the Granger test for ED -> dieting is not; thus, dieting Granger-causes ED within these week lags. We may guess that the impact of dieting on eating disorders can last more than 2 years (104 weeks).
On the other hand, we see a ‘feedback’ loop from lag 20 to lag 28 (in weeks), during which the Granger tests for ED -> dieting and for dieting -> ED are both significant.
This is an interesting result because, according to clinical resources about eating disorders, ‘relapse’ happens during the recovery process. 20 to 28 weeks is about half a year. During this period, people in online eating disorders communities might also become involved in dieting communities, and either ‘relapse’ or ‘fighting against’ dieting is a possible explanation.
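For readers who want to see the mechanics behind a single point on a plot like Figure 4, the sketch below shows one common form of the Granger test: an F test comparing an autoregression of the ‘effect’ series on its own p lags with a model that also includes p lags of the ‘cause’ series. It is an illustration only, assuming Python with NumPy and simulated data; the helpers `lagged`, `rss`, and `granger_f` are hypothetical names, and this is not necessarily how the results above were produced.

```python
import numpy as np

def lagged(x, p):
    """Columns x_{t-1}, ..., x_{t-p} for t = p..n-1."""
    n = len(x)
    return np.column_stack([x[p - j : n - j] for j in range(1, p + 1)])

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def granger_f(cause, effect, p):
    """F statistic for H0: p lags of `cause` add nothing to an AR(p) model of `effect`."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    y = effect[p:]
    const = np.ones((len(y), 1))
    restricted = np.hstack([const, lagged(effect, p)])        # own lags only
    unrestricted = np.hstack([restricted, lagged(cause, p)])  # plus lags of `cause`
    rss_r, rss_u = rss(restricted, y), rss(unrestricted, y)
    dof = len(y) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof), dof

# Simulated example in which x drives y with a delay of 2 steps, but not the reverse.
rng = np.random.default_rng(3)
n = 400
ex, ey = rng.normal(size=n), rng.normal(size=n)
x, y = np.zeros(n), np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + ex[t]
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 2] + ey[t]

f_xy, dof = granger_f(x, y, p=2)  # does x help predict y?
f_yx, _ = granger_f(y, x, p=2)    # does y help predict x?
print(f"x -> y: F = {f_xy:.1f};  y -> x: F = {f_yx:.1f}  (dof = {dof})")
```

Under the null hypothesis the statistic follows an F distribution with (p, dof) degrees of freedom, so a p-value can be obtained from its upper tail, for example with `scipy.stats.f.sf(F, p, dof)`. Repeating the test for each lag p in both directions yields two p-value curves like the orange and blue lines described above.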