Why more information does not always give a better Forecast - CCmath

Call center forecasting is a fascinating field where science and business intersect to create stimulating challenges. However, business requests sometimes contradict scientific reasoning. A typical example is the request to use external variables to predict call volumes.

As a forecasting specialist, I often receive an Excel sheet from the sales department containing future demand forecasts, and I am asked to use this information to provide call volume forecasts for the coming weeks. Using future demand to predict future calls seems, at first glance, entirely reasonable; after all, if sales increase, more customers will try to contact the call center.

The problem is that we would be using the same trend and seasonality information twice, because call volumes and sales share the same trend and seasonality. In fact, if we observe a positive trend in historical sales, the same tendency is normally also observed in call volumes.

Figure 1. Daily calls and sales for the same product (from 01/01/2013 to 01/08/2014).
The two series clearly share a similar pattern.

In other words, we are using a prediction α (the demand forecast) to forecast β (the call volume), where α was itself predicted using information already contained in β.

In statistics, we call this “multicollinearity”: a predictor can itself be predicted from one or more of the other predictors with relatively high precision. Consider:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$$

where Y is the outcome, the betas are coefficients, and X1, X2, and X3 are the predictors.

If X3 can be predicted from X1 and X2, we have a case of multicollinearity.
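One common way to check for this is to regress X3 on X1 and X2 and look at the R² of that auxiliary fit, or equivalently at the variance inflation factor, VIF = 1 / (1 − R²). The following is a minimal sketch on synthetic data, not a production diagnostic. The helper names (`solve`, `r_squared`) and all numbers are made up for illustration, and the least-squares fit is done with plain normal equations to keep the example free of dependencies.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(rows, y):
    """R^2 of a least-squares fit of y on the given predictors plus an intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic predictors (illustrative values only).
rng = random.Random(0)
n = 200
x1 = [t / n for t in range(n)]                          # trend-like predictor
x2 = [math.sin(2 * math.pi * t / 7) for t in range(n)]  # weekly-seasonality-like predictor
# X3 is almost entirely determined by X1 and X2, plus a little noise.
x3 = [2 + 0.5 * a + 3 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

r2 = r_squared(list(zip(x1, x2)), x3)
vif = 1.0 / (1.0 - r2)
print(f"R^2 of X3 on X1 and X2: {r2:.3f}, VIF: {vif:.0f}")
```

An R² close to 1 (a very large VIF) means that X3 adds almost nothing that X1 and X2 do not already provide. A VIF above roughly 5 to 10 is often treated as a warning sign, although that threshold is only a rule of thumb.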

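Similarly, consider:

$$\text{Calls}_t = \beta_0 + \beta_1\,\text{Trend}_t + \beta_2\,\text{Seasonality}_t + \beta_3\,\text{Demand}_t$$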

The formula indicates that the number of calls at time t can be predicted based on trend, seasonality, and demand. However, the demand at time t may itself be predicted by trend and seasonality.

Therefore, if demand carries no information beyond trend and seasonality, the “real” value of β3 is actually zero.
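One way to see why is to make the dependence explicit. As a sketch, assume the demand forecast can be written as a linear function of trend and seasonality plus a remainder η_t; the coefficients γ0, γ1, γ2 and the remainder η_t are notation introduced here for illustration only. Substituting this into a linear calls model with coefficients β0 to β3 gives:

$$
\begin{aligned}
\text{Demand}_t &= \gamma_0 + \gamma_1\,\text{Trend}_t + \gamma_2\,\text{Seasonality}_t + \eta_t \\
\text{Calls}_t &= \beta_0 + \beta_1\,\text{Trend}_t + \beta_2\,\text{Seasonality}_t + \beta_3\,\text{Demand}_t \\
&= (\beta_0 + \beta_3\gamma_0) + (\beta_1 + \beta_3\gamma_1)\,\text{Trend}_t + (\beta_2 + \beta_3\gamma_2)\,\text{Seasonality}_t + \beta_3\,\eta_t
\end{aligned}
$$

Under this assumption, demand can add something beyond trend and seasonality only through η_t. If η_t is merely the error of the demand forecast and is unrelated to calls, nothing is lost by setting β3 to zero and letting trend and seasonality carry the whole effect.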

But what are the consequences of including an unnecessary predictor in the model? It depends.

If there is a lot of data available, the consequences are marginal (except for having done unnecessary work). In fact, if we had an unlimited amount of data to train our model, the estimates of the betas would be extremely close to their “real” values. Therefore, the estimate of β3 would be very close to zero.

Conversely, if the available data are limited (which is often the case), we would estimate a bad model: the estimate of β3 would not necessarily be close to its real value, and this would lead to a wrong forecast (in statistics, this phenomenon is called “overfitting”).
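The effect of sample size can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a model of any real call center: calls are generated from trend and seasonality only (so the true β3 is zero), the demand series is built from the same trend and seasonality plus its own noise, and all coefficients, noise levels, and sample sizes are invented. The model is then refitted many times on short and on long histories, and the spread of the estimated β3 is compared.

```python
import math
import random
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients via normal equations; index 0 is the intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def beta3_estimate(n, rng):
    """Simulate n days of data and return the fitted coefficient on demand."""
    trend = [t / n for t in range(n)]
    season = [math.sin(2 * math.pi * t / 7) for t in range(n)]
    # Demand is driven by the same trend and seasonality, plus its own noise.
    demand = [10 + 5 * tr + 2 * se + rng.gauss(0, 1) for tr, se in zip(trend, season)]
    # Calls depend on trend and seasonality only, so the true beta3 is 0.
    calls = [100 + 50 * tr + 20 * se + rng.gauss(0, 10) for tr, se in zip(trend, season)]
    beta = ols(list(zip(trend, season, demand)), calls)
    return beta[3]


rng = random.Random(42)
small = [beta3_estimate(14, rng) for _ in range(100)]    # two weeks of history
large = [beta3_estimate(700, rng) for _ in range(100)]   # about two years of history

spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)
print(f"spread of estimated beta3 with n=14:  {spread_small:.2f}")
print(f"spread of estimated beta3 with n=700: {spread_large:.2f}")
```

With the short history, the estimated β3 scatters widely around zero, so any single fit can assign demand a large spurious effect. With the long history, the estimates concentrate near the true value of zero.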

A different situation arises when we know the future number of sales in advance; in other words, when future demand is known with certainty and is not a prediction. In this situation, we “must” use sales to make our prediction; however, we must still avoid multicollinearity by excluding seasonality and trend from the model.

To conclude, consider future sales as a predictor of call volumes only if they contain (useful) information that is not already included in the historical call volume data, or if future sales are known with certainty and are not a prediction. In all other cases, including future demand to predict volumes is only harmful.

Giuseppe is a forecasting specialist and developer at CCmath.
To find out more about CCmath and its products, see CCmath.com.
To learn more about the background and practice of WFM, see WFMacademy.CCmath.com.
The author thanks Ger Koole for his feedback.