TIME Fun!
Using R for Nowcasting with Google Trends
CSJP
Analyze Data
- The data set (adt) holds monthly sales plus two Google Trends series (tss, ins):
## sales tss ins
## 2004-01-01 61146 -0.21 -0.07
## 2004-02-01 65230 -0.28 -0.11
## 2004-03-01 78662 -0.28 0.10
## 2004-04-01 73252 -0.31 0.05
## 2004-05-01 77491 -0.21 0.25
## 2004-06-01 75355 -0.42 0.16
- Check whether the target variable (sales) is autocorrelated, using a Box-Ljung test at lag 1:
##
## Box-Ljung test
##
## data: coredata(adt$sales)
## X-squared = 61.454, df = 1, p-value = 4.552e-15
- The p-value is far below 0.05, so sales is autocorrelated. Plot it:

- The series shows large variability, so take logs and plot again:

- Difference the log series and inspect its ACF and PACF:

- There is a seasonal component, so difference at lag 12 and inspect the ACF and PACF again:

- In summary, the baseline is a seasonal AR(1) model: log sales regressed on its own values at lags 1 and 12
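The checks in this section can be sketched in base R as follows. The series here is simulated as a stand-in for adt$sales (it is not the deck's data), and applying the lag-12 difference on top of the first difference is an assumption, since the slides do not say which series was differenced.

```r
# Simulated monthly series standing in for adt$sales (NOT the deck's data).
set.seed(1)
n <- 100
season <- 0.15 * sin(2 * pi * (1:n) / 12)                      # yearly pattern
noise  <- as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 0.05))
sales  <- ts(exp(11 + season + noise), start = c(2004, 1), frequency = 12)

# 1. Box-Ljung test at lag 1: a small p-value suggests autocorrelation.
lb <- Box.test(sales, lag = 1, type = "Ljung-Box")

# 2. Log transform to stabilise the variance.
lsales <- log(sales)

# 3. First difference, then (assumed) a further seasonal difference at lag 12.
d1  <- diff(lsales)
d12 <- diff(d1, lag = 12)

# 4. Correlograms; plot = FALSE returns the values instead of drawing them.
a1  <- acf(d1, lag.max = 24, plot = FALSE)
p12 <- pacf(d12, lag.max = 24, plot = FALSE)
```

Printing `lb` gives a report in the same layout as the Box-Ljung output above, and dropping `plot = FALSE` draws the correlograms.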
Model Data
- Model 1 - baseline model
- \(\log(y_t) = a_0 + a_1\log(y_{t-1}) + a_{12}\log(y_{t-12}) + e_t\)
##
## Call:
## lm(formula = y ~ lagy.1 + lagy.12, data = d[-nrow(d), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.213003 -0.038462 0.003142 0.041776 0.218068
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.72587 0.75247 0.965 0.337
## lagy.1 0.63737 0.07024 9.074 3.83e-14 ***
## lagy.12 0.29732 0.06923 4.295 4.62e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07979 on 85 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.6991, Adjusted R-squared: 0.692
## F-statistic: 98.75 on 2 and 85 DF, p-value: < 2.2e-16
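A minimal sketch of how a data frame like d could be built and the baseline fitted. The data are simulated, and `lag_vec` is a hypothetical helper introduced only for this illustration. The column names y, lagy.1 and lagy.12 and the `d[-nrow(d), ]` subset follow the Call shown above. The simulated series has 101 months with the last response missing, which mirrors January 2004 to May 2012.

```r
# Simulated log sales for 101 months (NOT the deck's data).
set.seed(2)
n <- 101
season <- 0.15 * sin(2 * pi * (1:n) / 12)
noise  <- as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 0.05))
y <- 11 + season + noise
y[n] <- NA                       # the month to be nowcast has no sales yet

# Hypothetical helper: shift a vector back by k steps, padding with NA.
lag_vec <- function(x, k) c(rep(NA, k), head(x, -k))

d <- data.frame(y = y,
                lagy.1  = lag_vec(y, 1),
                lagy.12 = lag_vec(y, 12))

# Fit on every row except the last; lm() drops the 12 rows whose lag-12
# value is NA, leaving 88 observations and 85 residual degrees of freedom.
fit1 <- lm(y ~ lagy.1 + lagy.12, data = d[-nrow(d), ])
```

`summary(fit1)` then prints a report in the same layout as the one above, with different numbers because the data are simulated.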
- Model 2 - baseline model plus two Google Trends
- \(\log(y_t) = a_0 + a_1\log(y_{t-1}) + a_{12}\log(y_{t-12}) + b_1\,tss_t + b_2\,ins_t + e_t\)
##
## Call:
## lm(formula = y ~ ., data = d[-nrow(d), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.166790 -0.044925 -0.002127 0.042199 0.171579
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.05232 0.89718 2.288 0.0247 *
## lagy.1 0.54032 0.06656 8.117 3.80e-12 ***
## lagy.12 0.28402 0.06730 4.220 6.20e-05 ***
## tss 0.31692 0.06369 4.976 3.47e-06 ***
## ins 0.36442 0.08657 4.210 6.44e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06924 on 83 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.7787, Adjusted R-squared: 0.7681
## F-statistic: 73.03 on 4 and 83 DF, p-value: < 2.2e-16
- Test whether Model 2 improves significantly on Model 1 (F-test on the nested models):
## Analysis of Variance Table
##
## Model 1: y ~ lagy.1 + lagy.12
## Model 2: y ~ lagy.1 + lagy.12 + tss + ins
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 85 0.54115
## 2 83 0.39794 2 0.1432 14.934 2.883e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
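The F statistic in this table can be reproduced by hand from the two residual sums of squares. The sketch below uses only the rounded values printed above, so the result agrees with the table only up to rounding. The variable names are illustrative.

```r
# Values copied from the ANOVA table above.
rss1 <- 0.54115; df1 <- 85    # Model 1: residual sum of squares, residual df
rss2 <- 0.39794; df2 <- 83    # Model 2

# F = (drop in RSS per added regressor) / (residual variance of Model 2)
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability of F with (2, 83) degrees of freedom.
p_val <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

round(f_stat, 2)   # about 14.93, in line with the table
```

With the fitted objects at hand, `anova(model1, model2)` performs the same comparison directly.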
- Overall fit, out of sample:

- Compare the models by normalized mean squared error (NMSE); lower is better:
## nmse.baseline nmse.w.gtrends nmse.improved (%)
## 0.3509923 0.2743213 21.8440738
- Carry Model 2 forward to the next step
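The slides do not show how NMSE was computed. A common definition, assumed here, divides the squared prediction error by the squared error of always predicting the mean of the observations. The improvement figure can be checked from the two NMSE values printed above. The function name `nmse` is illustrative.

```r
# Assumed definition: squared error of the predictions relative to the
# squared error of predicting the mean of the observations.
nmse <- function(obs, pred) {
  sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# Values copied from the output above.
nmse_baseline <- 0.3509923
nmse_gtrends  <- 0.2743213

# Relative reduction in NMSE, in percent.
improved_pct <- 100 * (nmse_baseline - nmse_gtrends) / nmse_baseline
round(improved_pct, 2)   # about 21.84, as reported
```

Under this definition a perfect forecast scores 0 and always predicting the mean scores 1, so both models beat the mean forecast.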
Forecast Data
- Nowcast May 2012 sales, using the Google Trends data already available for May 2012:
## sales tss ins
## 2011-12-01 71674 0.31 -0.40
## 2012-01-01 62757 0.08 -0.27
## 2012-02-01 71103 0.15 -0.27
## 2012-03-01 82109 0.01 -0.18
## 2012-04-01 74632 -0.03 -0.21
## 2012-05-01 NA -0.04 -0.15
- Forecast with prediction interval, on the original sales scale:
## fit lwr upr
## 2012-05-01 74505.7 64817.91 85641.45
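A sketch of how such a forecast could be produced with `predict()`. The data are simulated, `lag_vec` is a hypothetical helper, and the 95% level is an assumption: it is the `predict()` default, and the slides do not state the level used. Because the model is fitted to log sales, the prediction and its bounds are exponentiated to return to the sales scale. The interval reported above is symmetric around the fit on the log scale, which is consistent with this kind of back-transformation.

```r
# Simulated data for 101 months (NOT the deck's data).
set.seed(3)
n <- 101
tss <- rnorm(n, sd = 0.2)
ins <- rnorm(n, sd = 0.2)
season <- 0.15 * sin(2 * pi * (1:n) / 12)
noise  <- as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 0.05))
y <- 11 + season + noise + 0.3 * tss + 0.3 * ins
y[n] <- NA                       # sales for the final month are unknown

# Hypothetical helper: shift a vector back by k steps, padding with NA.
lag_vec <- function(x, k) c(rep(NA, k), head(x, -k))

d <- data.frame(y = y, lagy.1 = lag_vec(y, 1), lagy.12 = lag_vec(y, 12),
                tss = tss, ins = ins)

# Fit on all rows but the last, then predict the last row, whose lags and
# Google Trends values are already known.
fit2 <- lm(y ~ ., data = d[-n, ])
pred_log <- predict(fit2, newdata = d[n, ],
                    interval = "prediction", level = 0.95)

# Back-transform from log sales to sales.
pred <- exp(pred_log)
```

`pred` is a one-row matrix with columns fit, lwr and upr, matching the layout of the output above.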
