We are curious about how the quantity of posts submitted to different subreddit forum groups changes over time: the trends, variances and comparisons. After plotting basic time series, we were surprised to notice a steady and substantial increase in post submissions to eating disorders forums and dieting forums since 2016, even during the period in 2019 when other forums received far fewer submissions in general.
Meanwhile, a potential lag effect between eating disorders post counts and dieting post counts is worth noting. Further exploration of temporal correlation, along with causality tests, is on the way!
By counting the number of posts each day, week and month, we do not dive into the contents of a post this time but summarize the posts into 3 time series. The top 1000 posts are collected from each of the following subreddit forums: 7 popular eating disorders forums, 9 popular dieting forums, and 15 control group forums covering topics like news, art, politics and general human well-being.
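The counting step can be sketched roughly as follows. This is only a minimal illustration with pandas, not the code used for this analysis: the tiny `posts` table, its column names `created` and `group`, and the helper `count_posts` are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample: one row per post, with its creation time and forum group.
posts = pd.DataFrame({
    "created": pd.to_datetime([
        "2019-01-01 08:00", "2019-01-01 21:30", "2019-01-03 12:00",
        "2019-01-09 09:15", "2019-02-02 18:45", "2019-02-02 19:00",
    ]),
    "group": ["ed", "diet", "ed", "ctrl", "diet", "ed"],
})

def count_posts(df, freq):
    """Count posts per time unit and forum group; freq is a pandas offset alias."""
    counts = (
        df.groupby([pd.Grouper(key="created", freq=freq), "group"])
          .size()
          .unstack("group", fill_value=0)
    )
    # 'all' is the sum of the group counts in each time unit.
    counts["all"] = counts.sum(axis=1)
    return counts

daily = count_posts(posts, "D")     # one row per day
weekly = count_posts(posts, "W")    # weeks ending on Sunday
monthly = count_posts(posts, "MS")  # labelled by the first day of the month
```

Each resulting table has one column per group ('ed', 'diet', 'ctrl') plus 'all', so the same helper covers every time unit.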
For the variable names that follow, ‘all’ stands for the total post count per time unit, ‘ed’ for the post count in eating disorders forums per time unit, ‘diet’ for the post count in dieting forums per time unit, and ‘ctrl’ for the post count in control group forums per time unit.
The suffix ‘_scaled’, when appended to a variable name, means that the numeric variable has had its mean subtracted and has then been divided by its standard deviation.
Basic Time Series Plots
First, we plot the raw (absolute) post counts by day, week, month and quarter to get a general idea of these four time series (‘all’, ‘ed’, ‘diet’ and ‘ctrl’).
From these plots, the earliest top post is dated 2011-01-24 when the posts are requested from reddit on 2020-09-10, which might imply that the reddit website restricts how far back in time we can request posts through its API.
The y-axis values of the daily/weekly/monthly post counts are consistent with the summary characteristics of our original dataset. However, we want to be more careful when interpreting the trend and variance over time in those time series.
The overall upward trend in all 3 time series might suggest that more posts are being submitted over time to subreddit forums in the eating disorders, dieting and control groups. However, there is also a good chance that reddit.com uses algorithms to limit which posts, and how many posts from different time periods, are returned to users’ requests through its API. With this in mind, the long-term change in post counts could be due to three types of factors: 1. time; 2. reddit.com’s systematic or algorithmic restrictions on data collection; 3. other hidden factors.
Comparable Time Series Plots
To compare the short-term change or variance between different time series, the absolute values are not directly comparable, since we collected different amounts of data across groups. It’s a bad idea to sample from the collected data or to drop forums in order to even out the number of observations in each group, because we might lose information and add bias to the original analysis.
One simple way to solve this problem is to scale each time series by subtracting its mean and dividing by its standard deviation. Now, the y-axis represents how many standard deviations the post count at a given time point is away from the average of its series.
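As a rough sketch of this scaling, assuming the counts sit in a pandas table with one column per group: the numbers below are invented, `add_scaled` is a hypothetical helper, and using the sample standard deviation (pandas' default, `ddof=1`) is an assumption rather than something the analysis specifies.

```python
import pandas as pd

# Hypothetical monthly post counts for the three forum groups (made-up numbers).
monthly = pd.DataFrame(
    {
        "ed": [10, 12, 15, 21],
        "diet": [100, 130, 120, 170],
        "ctrl": [900, 950, 700, 980],
    },
    index=pd.date_range("2019-01-01", periods=4, freq="MS"),
)

def add_scaled(df):
    """Return a copy of df with a '<name>_scaled' column added for every column."""
    # Subtract each column's mean and divide by its standard deviation.
    # Assumption: sample standard deviation (pandas default, ddof=1).
    scaled = (df - df.mean()) / df.std()
    return df.join(scaled.add_suffix("_scaled"))

monthly = add_scaled(monthly)
print(monthly[["ed_scaled", "diet_scaled", "ctrl_scaled"]])
```

After scaling, every ‘_scaled’ column has mean 0 and standard deviation 1, so the groups can share one y-axis even though their raw counts differ by orders of magnitude.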
The green line is the time series of the control group, which contains posts about topics unrelated to eating disorders or dieting. It shows a generally continuous rise from August 2016 until February 2019, then a sharp decline lasting 3-4 months before it goes up again. We might assume this trend reflects the general weekly usage of subreddit forums with regard to top post submissions.
Interestingly, during the period in 2019 when other forums saw fewer submissions, the number of post submissions to eating disorders forums and dieting forums kept increasing without ‘hesitating’.
Meanwhile, if we take a closer look at the monthly patterns of post submissions to eating disorders forums and dieting forums (the red and black lines), they look somewhat similar with a few months’ lag in time, which might indicate a temporal correlation between them.
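One simple way to probe such a lag is to correlate one series with shifted copies of the other. The sketch below is only an illustration on synthetic data, not a result from our dataset: ‘ed’ is deliberately constructed as ‘diet’ delayed by 3 months, and `lagged_corr` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Synthetic data, for illustration only: 'ed' is built as 'diet' delayed by 3 months.
rng = np.random.default_rng(0)
months = pd.date_range("2016-01-01", periods=48, freq="MS")
diet = pd.Series(rng.normal(size=48), index=months)
ed = diet.shift(3) + rng.normal(scale=0.1, size=48)

def lagged_corr(x, y, max_lag):
    """Pearson correlation between x at time t and y at time t + lag, for each lag."""
    return pd.Series(
        {lag: x.corr(y.shift(-lag)) for lag in range(-max_lag, max_lag + 1)}
    )

corrs = lagged_corr(diet, ed, max_lag=6)
best_lag = corrs.idxmax()  # lag (in months) with the highest correlation
print(corrs.round(2))
print("best lag:", best_lag)
```

On real post counts with a strong upward trend, the correlation tends to be high at every lag, so it usually makes sense to detrend or difference the series before reading anything into the lag with the highest correlation.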
Those findings are thrilling, revealing patterns of social media usage in terms of post submissions to eating disorders forums and dieting forums. We will continue exploring the potentially causal relationship between ‘eating disorders post counts’ and ‘dieting post counts’ over time by controlling for the natural effects of time and for other systematic factors like sampling errors and algorithmic restrictions during data collection.
More and more exciting results are on their way!
Stay tuned and take care!