Statistical Engine Tuning for Improving Forecast Accuracy


“I don’t understand this!” 

“Why does the forecast suddenly jump?” 

“Is this the best we can get?”  

Some companies implement Oracle demand management solutions with close to zero effort invested in tuning the statistical engine. Given the amount of thinking and effort put into all other configuration settings such as measures/series, worksheets/page layouts with tables and graphs, integrations and process flows, this is a potential source of serious post-purchase dissatisfaction.

We know that poor forecasts lead to many inefficiencies including out-of-stocks, material shortages, higher inventories, and over-compensation via safety stock policies.

But we should also acknowledge that poor statistical forecasts drive planners to manual intervention in the form of overrides (or to disregarding the statistical forecast entirely). This is an inefficient way of demand planning when the number of combinations (products x locations) is often overwhelming, and we should usually expect statistical forecasts to be sufficient, or even optimal, for most combinations. It is too much to expect planners to keep on top of changing demand patterns and to modify forecasts every forecast cycle. That is where statistical forecasting shines: it harnesses the power of computing and time-series algorithms that keep learning from the most recent history and causal data each cycle. It is a matter of due diligence, and of respect for planners' time, to ensure that statistical forecasts are as accurate as possible so that fewer manual overrides are necessary. Planners can instead spend their time reviewing forecast exceptions and focusing on high-value items, while ensuring causal factor data is properly maintained.

We should also note that manual overrides often introduce user bias (usually towards over-forecasting). If these forecasts (and their errors) feed safety stock planning, the financial impact is compounded: the safety stock recommendations were driven by over-forecasted demand in the first place.

This brings us to Oracle’s statistical forecasting engine.

While the UI of Oracle Demand Management Cloud (ODMC) is quite different from that of its predecessor, Oracle Demantra, one thing ODMC did inherit was the Demantra forecasting engine (with some deprecations and some subsequent new features). This sophisticated “machine learning” engine is one of the key components and strengths of both applications.

The impressiveness of the engine should drive us to exploit it to the fullest! It has a range of levers to influence the shape of forecasts and optimize them for each individual client’s demand patterns:

  • The ability to model causal factors that influence demand, such as trend, seasonality, special holidays, events, end-of-quarter patterns and so on (and, for Demantra, promotion uplifts and activity shapes). Some causal factors are pre-configured (but can typically be changed) and others can be designed and created for a particular client.
  • Forecast tree configuration – the levels of aggregation of history data that the engine considers when generating forecasts
  • Selection from around 15 different engine models – various flavors of Regression (these use the causal factors), plus Croston’s, Holt, etc.
  • Settings for many business-related statistical parameters, such as those that impact forecast validation, outlier detection, intermittency detection, naïve forecasts, etc.

Installations of ODMC and Demantra come with out-of-the-box settings for all these levers other than the forecast tree, which requires a custom setup.

How useful are the default settings?

Solution Credibility

A good statistical forecast does more than improve accuracy metrics; it drives solution credibility. Clients expect forecasts to make sense given historical data patterns. Sometimes expectations are unrealistic, and this should be addressed. For example, some clients expect that patterns of spikes and troughs in history ought to be replicated in the forecast as an accurate prediction of future demand, when those variations in demand are purely random; no causal factor could be designed to explain or predict them. But often the client’s expectations are very realistic, because we know the source of the variations – for example, on-promotion demand, repeating events or holidays, day-of-week patterns for daily demand, etc. Or perhaps, in the current engine configuration, trends are not being picked up adequately, monthly seasonality is not sensitive enough, or a recent change in the level of demand is not being detected quickly enough (or the opposite of these examples!). All such problems can potentially be addressed by engine tuning.

Many years ago, one Demantra client told me “maybe we wasted a million dollars,” because of the grossly inadequate statistical forecasts they were seeing. This was before they experienced the positive impact of engine tuning on their Demantra forecasts! They were running with an out-of-the-box engine configuration, yet they had drivers of demand around Back to School, Black Friday, Christmas, and New Year that were not being modeled. Ironically, it is precisely because the Oracle engine is so flexible that the default settings are never optimal for any particular business.


What is Engine Tuning?

Engine tuning addresses configuration of all the levers we mentioned above, given:

Your historical demand profiles:

  • i.e. seasonality, trends (short and long term), intermittency, stability or lack of it, life-cycle length, holidays that impact demand, other causality (e.g. promotions, end-of-quarter)
  • Historical disruptions like product availability shortages, COVID-19 impact

Taking into consideration your knowledge of your own markets:

  • Is there cannibalization between new and old products, or between various locations (e.g. ship-froms)?
  • Will historical trends continue?

And your own forecasting preferences:

  • Conservativeness: should trend/level react slowly or quickly to changes in demand?
  • At what base level should forecasts usually be generated?
  • When should combinations be considered “dead” (due to no recent demand), so that no further statistical forecast is generated?
  • How aggressively should we try to detect outliers, if at all?

One of the key features of the Oracle engine that helps produce more stable forecasts is the blending of output from the various selected engine models (stability matters because forecasts that change rapidly from cycle to cycle add confusion to supply planning and decrease forecast credibility). This is differentiated from the more common software approach of choosing a single “best fit” model per combination; a simple illustration of the difference follows below.
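
To make the distinction concrete, here is a minimal sketch in Python. It is purely illustrative: the inverse-error weighting shown is a hypothetical scheme, not a representation of Oracle's actual blending logic.

    # Illustrative sketch only -- not Oracle's actual blending logic.
    # Contrasts picking one "best fit" model with blending all candidates,
    # weighted (hypothetically) by the inverse of each model's in-sample error.
    import numpy as np

    def best_fit(forecasts, errors):
        """Return the forecast from the single model with the lowest in-sample error."""
        winner = min(errors, key=errors.get)
        return forecasts[winner]

    def blended(forecasts, errors):
        """Weight each model's forecast by 1/error so no single model dominates."""
        weights = {m: 1.0 / max(e, 1e-9) for m, e in errors.items()}
        total = sum(weights.values())
        return sum(weights[m] / total * f for m, f in forecasts.items())

    # Three candidate models forecasting the next four periods for one combination
    forecasts = {
        "regression": np.array([120.0, 118.0, 125.0, 130.0]),
        "holt":       np.array([110.0, 112.0, 114.0, 116.0]),
        "croston":    np.array([100.0, 100.0, 100.0, 100.0]),
    }
    errors = {"regression": 0.12, "holt": 0.15, "croston": 0.30}  # in-sample MAPEs

    print(best_fit(forecasts, errors))  # flips entirely whenever the "winner" changes
    print(blended(forecasts, errors))   # shifts gradually as model errors shift

Because a blend moves gradually as model errors change, the resulting forecast is less prone to whipsawing from one cycle to the next than a winner-takes-all selection.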

Part of the typical tuning effort is geared towards choosing engine models appropriate to the length and time-granularity of the history, and towards setting the parameters that govern model validation rules.

The end game is to improve forecast accuracy in the system without requiring excessive user intervention.

Measuring Improvement

Forecast runs need to be evaluated based on various forecast accuracy metrics.

Kalypso’s general approach during an engine tuning exercise is to set the “last historical date” backwards so that some history is “held out” from the engine, and thus a portion of the statistical forecast will overlap with real history. This last date, the gap (if any), and the period of overlap between forecast and history are agreed with the client based on considerations including current practices, variability of demand, time granularity of the system/plan, and typical lead times of supply.
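
As a rough sketch of the hold-out idea (assuming monthly buckets and pandas; the cut-off date, gap, and overlap window below are illustrative, not client-specific):

    # Minimal hold-out sketch: move the "last historical date" back so the most
    # recent actuals are hidden from the engine and can be used to score the
    # statistical forecast. Dates and window lengths are examples only.
    import numpy as np
    import pandas as pd

    # Two years of monthly actuals for one product x location combination
    idx = pd.period_range("2022-01", "2023-12", freq="M")
    history = pd.Series(np.random.default_rng(0).integers(80, 140, len(idx)), index=idx)

    last_historical_date = pd.Period("2023-06", freq="M")  # pulled back from the true end of history
    gap = 0        # optional gap between the cut-off and the evaluation window
    overlap = 6    # forecast periods that will be compared against real history

    engine_input = history[history.index <= last_historical_date]  # what the engine is allowed to see
    held_out = history[history.index > last_historical_date].iloc[gap:gap + overlap]  # actuals for scoring
    # ...generate the statistical forecast from engine_input, then score it against held_out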

The first step is to establish a “baseline forecast” for evaluating improvements. This is usually a forecast using the existing engine configuration, but it could also mean directing the engine to create a naïve forecast of some description.

Then, for each forecast iteration (typically after a change in engine configuration settings, or after entering new causal factor data), the forecast and history data for the selected time period are loaded into our Excel template (usually at the aggregation level of the bottom of the forecast tree), where measures of forecast error are calculated for the total data set: Total Percent Error, “Capped Weighted MAPE” (MAPE = Mean Absolute Percent Error), and “Bias”. Various filters can be used to cull out unhelpful records. We also decide whether history should be pure actuals or should include overrides. There are many complexities to consider, so we do not treat the metrics as “absolutely true”; it is the change in the metrics from run to run that is important. However, we do want to glean from the results a feeling for how much randomness is inherent in the demand, and thus the limits of statistical forecasting accuracy. A simplified sketch of these error calculations follows below.
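
For readers who want to see the arithmetic, here is a minimal sketch of the metrics named above. The exact formulas in our Excel template (in particular the capping rule) are not reproduced here; the version below assumes one common form, weighting per-record absolute percent errors by actual volume and capping each at 100%.

    # Hedged sketch of the error metrics; the precise template formulas may differ.
    import numpy as np

    def forecast_metrics(actual, forecast, cap=1.0):
        actual = np.asarray(actual, dtype=float)
        forecast = np.asarray(forecast, dtype=float)
        total_actual = actual.sum()

        # Total Percent Error: how far total forecast volume is from total actual volume
        total_pct_error = abs(forecast.sum() - total_actual) / total_actual

        # Capped Weighted MAPE: per-record absolute percent error, capped, weighted by actuals
        with np.errstate(divide="ignore", invalid="ignore"):
            ape = np.where(actual > 0, np.abs(forecast - actual) / actual, np.nan)
        ape = np.minimum(np.nan_to_num(ape, nan=cap), cap)
        weighted_mape = (ape * actual).sum() / total_actual

        # Bias: signed error as a fraction of total actuals (positive = over-forecasting)
        bias = (forecast.sum() - total_actual) / total_actual

        return {"total_pct_error": total_pct_error,
                "capped_weighted_mape": weighted_mape,
                "bias": bias}

    # Example: held-out actuals vs. the statistical forecast for four periods
    print(forecast_metrics(actual=[100, 120, 80, 150], forecast=[110, 100, 90, 170]))

As noted above, the absolute values matter less than how they move from one tuning iteration to the next.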

Similar analysis can be performed post-tuning, with no portion of the recent history being “held out,” using a lagged forecast series/measure versus actuals. This is an important practice, since it will indicate whether forecast error is drifting higher over time, which should trigger further investigation and tuning.

Measuring Override Effectiveness

For clients who like to regularly override statistical forecasts, we recommend comparing the forecast accuracy of the statistical forecasts with that of the overridden forecasts. We want to know whether the overrides are adding value – that is, improving forecast accuracy and/or reducing bias… all in the name of continuous improvement!

Measuring Against Other Forecasts

If a supplementary forecast is available (for example, a forecast generated by another software package or department), it can be useful to examine its accuracy against the same criteria.

In a recent tuning exercise, we compared the forecasts generated by a third-party sales and promotion management system to the tuned ODMC statistical forecast. Some of the data from this system was being used as a causal factor in ODMC, so it was important to understand how it was generated, and the client wanted to know whether the forecasts it produced warranted preference over the ODMC forecast for use in the Oracle Supply Planning Cloud application. The third-party system generated a total forecast error of 30.5%, whereas the ODMC error was much lower at 15.5% (for like products, at the item x banner level, for a single test week).

It turned out to be an easy decision.

Wrap Up

Kalypso has a careful, consultative approach to engine tuning. We have conducted around 40 tuning exercises in the last seven years, with every single engagement showing empirical improvements. Our engine tuning engagements typically drive weighted forecast error improvements of 10-20% relative to the existing error percent (for example, reducing a 30% weighted error to somewhere between 24% and 27%), with some results much better than this. Secondarily, where current bias is poor to extremely poor, we often see vast improvement in the bias numbers, because the problem frequently turns out to be systemic, driven by just one or two settings.

Engine tuning is a standard part of Kalypso’s implementation methodology for new Oracle demand management projects. We recommend tuning before Go-Live, if possible, between SIT and UAT. It is unproductive to tune while there are known issues with historical data. But engine tuning often uncovers previously unknown history issues or at least raises awareness of the impact of anomalies. Of course, for any new project, initial results and impressions count in the business. A well-understood, well-tuned statistical forecast creates a good impression and fuels user acceptance. But if that “before go-live" ideal is missed, then it ought to be conducted as soon as possible.

Markets change over time, and different influences on demand become apparent. For this reason, and especially if statistical forecast accuracy begins to wane, repeat tuning exercises ought to be conducted – say, every two years. This helps keep the demand planning system fresh, and the consultative exercise will help any new demand planners become more productive and confident with their Oracle system (and less likely to regress back to Excel!).

We value input from the businesses we engage with, and we work hard to transfer knowledge to users to make the Oracle engine a little less of a black box. Clients often gain new insights into their own data as it is forensically analyzed to help produce the best forecast. Engaging in a Statistical Engine Tuning exercise with Kalypso will make a significant impact on the effectiveness and credibility of your Oracle Demand Management application, and on your commercial bottom line.

Andrew Calder
Technical Manager