## Wednesday, January 6, 2016

### OBIEE 12c Advanced Analytic part 3: Forecast

This is the third part of OBIEE 12c Advanced Analytic:

(for 3-7 you need Enabling R and the relevant Analytics functions on OBIEE 12c )

The Forecast function is hiding with the Time Series Calculations (Unlike the Analytics). It makes sense since it includes time-series component.

The formal definition is:The FORECAST function creates a time-series model of the specified measure over the series using either Exponential Smoothing or ARMIA. This function outputs a forecast for the set of periods as specified by numPeriods.

I will use an example similar to what comes with the Online documentation and then will check the formal syntax and examine it further.

In SampleSales based analysis with Year, Qtr, Revenue columns I will add a Forecast column.

This is it's simplified code FORECAST("Revenue", ("Year" timeYear, "Qtr" timeQuarter),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)
This is the real function: FORECAST("Base Facts"."Revenue", ("Time"."Per Name Year" timeYear, "Time"."Per Name Qtr" timeQuarter),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)

The result is very interesting, we can see both seasonality and trend in the forecast:

I wanted a combined line of forecast and Revenue so I used something like: IFNULL("Revenue", "FORECAST")

This way the line looks nicer:

Now it's time to see the syntax and understand what we have (I made some modifications to the formal documentation).

#### Syntax

`FORECAST(numeric_expr, ([series]), output_column_name, options, `
```[runtime_binded_options])])
```

Where:
`numeric_expr` indicates the measure to forecast. For example, revenue data to forecast.

`series` indicates the time grain at which the forecast model is built. This is a list of one or more time dimension columns. If you omit `series`, then the time grain is determined from the query. You can omit the `series` only in cases you have only one Date/Time column in analysis. Even then you have the limitation described in appendix bellow. The `series` must fit the date columns in the Analysis. There is a predefined set of  column aliases, each should be named as seen in the example. (In my analysis they are 2 date columns, as a result, I can't treat them as optional otherwise I will get [nQSError: 42066] The parameter of TimeDimSeqNum function is not a time level key).
Example of Forecast with omitted  `series` in analysis with Qtr column only: FORECAST("Base Facts"."Revenue", (),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)
In all other cases, not using aliases might result in the problem described in the Appendix bellow.Here is a list of alias values and a table of their valid formats:
• timeYear
• timeQuarter
• timeMonth
• timeWeek
• timeDay
``` timeYear timeQuarter timeMonth timeWeek timeDay 2010 2010 Q1 2011 / 02 2011 / 33 "27/12/2012" 10 Q1 2010 2011/02 2011/33 "12/27/2012" ‘2010’ 2010/Q1 02/ 2011 33 / 2011 "2012/12/27" Q1/2010 02/2011 33/2011 "2012/27/12" 2010 / Q1 2011 - 02 2011 – 33 "27-12-2012" Q1 / 2010 2011-02 2011-33 "12-27-2012" 2010-Q1 02 – 2011 33 - 2011 "2012-12-27" Q1-2010 02-2011 33-2011 "2012-27-12" 2010 - Q1 2011 – Dec 2011 Week 04 "27.12.2012" Q1 – 2010 2011-Dec 04 Week 2011 "12.27.2012" Q1 Dec - 2011 04 Week, 2011 "2012.12.27" 1Q Dec-2011 2011, Week 04 "2012.27.12" 1 2011 Dec 2011 Wk 04 "21-JAN-09" ORACLE DEFAULT DATE FORMAT DD-MON-RR '2010 Q1' Dec 2011 2011,Wk 04 Dec 2011, Wk 04 01 04 Wk 2011 04 Wk, 2011 04 Wk,2011 Week 04 04 Week 04 Wk Wk 04 01  ```

`output_column_name` indicates the output column. The valid values are 'forecast', 'low', 'high', and 'predictionInterval'. Why do the "low" and "high" exist? Forecasting is a delicate business. So knowing upper and lower limits of the prediction at the given confidence level might be important.

`options` indicates a string list of name/value pairs separated by a semi-colon (;). The value can include %1 ... %N, which can be specified in `runtime_binded_options`.
`runtime_binded_options` indicates a comma separated list of runtime-binded columns and options.

Now let's review our Function with the new knowldge:
FORECAST("Revenue", ("Year" timeYear, "Qtr" timeQuarter),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)

`numeric_expr` is "Revenue"
`series` is ("Year" timeYear, "Qtr" timeQuarter) note that adding a month column to the analysis and the series means changing to the following ("Year" timeYear, "Qtr" timeQuarter, "Month" timeMonth) each date column must be referenced with the proper name. Having non-consistency between the date columns in the analysis and in the series would result in the error: State: HY000. Code: 59144. [nQSError: 59144] Grain difference found....
There is no problem adding any other (not time) dimension to the analysis. For example

`output_column_name` in our case is 'forecast'

`options` is the string 'modelType=arima;numPeriods=%1;predictionInterval=70;', 3 where 3, at the end, is the `runtime_binded_option`, in this case indicating we want only 3 periods in the forecast. This is equivalent to 'modelType=arima;numPeriods=3;predictionInterval=70;'

modelType can be arima or ets. (ARIMA stands for 'Autoregressive integrated moving average', while ETS means 'Error,Trend,Seasonal'. In case you think Oracle gives you too many options, have a look here.)
Naturally, using different prediction algorithm return slightly different results:

Bellow is the same forecast function, once with Model Type Arima and once with ETS:

numPeriods is the number (Integer) of periods of forecast to predict.

Removing the numPeriods value (please note the ';' sign should remain) will return values for all possible future values in the database:
FORECAST("Base Facts"."Revenue", ("Time"."Per Name Year" timeYear, "Time"."Per Name Qtr" timeQuarter),'forecast','modelType=arima;;predictionInterval=70;')

predictionInterval is the confidence for the prediction. Values are 0 to 100 (not 100 itself, it returns nQSError: 43119 Confidence limit out of range), where higher values specify higher confidence.

Few more options:

 Option Name Description Values useBoxCox If TRUE, then use Box-Cox transformation. TRUE FALSE lambdaValue The Box-Cox transformation parameter. Ignore if NULL or when useBoxCox is FALSE. Otherwise the data is transformed before the model is estimated. modelParamIC The information criterion (IC) to be used in the model selection. "ic_auto" "ic_aicc" "ic_bic" "ic_auto" (this is the default)

Some extra options are relevant only for the ETS model:

 Option Name Description Values trendDamp If TRUE, then use damped trend. If NULL, then try both damped and non-damped trend and choose the one that is optimal. errorType additive("A") multiplicative("M") automatically selected ("Z") trendType none("N") additive("A") multiplicative("M") automatically selected("Z") seasonType none("N") additive("A") multiplicative("M") automatically selected("Z")

This post is getting too long.
To finish it, we might gain additional insight by looking at the internal documentation of the file obiee.TimeSeriesForecast.R, that comes with the installation (slightly shorter version), to discover some secrets like frequency.

# Function : obiee.Forecast.freeTimeFormat
#            STL Forecasting Model which applies ETS or Arima
#           Forecasts obtained by STL decomposition and then apply different models (ETS, Arima, ..) to seasonality adjusted data
#
# Arguments :
#
# dat                              dataframe          dataframe with date and target
#                                                     timeYear     : Can be YYYY or YY
#                                                     timeQuarter  : Can be YYYY Q%, Q% YYYY, YYYY / Q%, YY Q%, Q%
#                                                     timeMonth    : Can be
#                                                     timeWeek
#                                                     timeDay
#                                                     target
# numPeriods                numeric            number of periods to forecast
# predictionLevel          numeric            confidence level in percentage for prediction, default is 95.
# modelType                 string             "ets","arima"
# frequency                   string             "freq_hourly", "freq_daily",    "freq_weekdays", "freq_weekly",
#                                                     "freq_monthly", "freq_quarterly", "freq_yearly"
#                                                      Note: the frequency argument could be useful to derive the time date format,
#                                                          but it is not used in creating the time series object.
#                                                          for time series, we use obiee.Forecast.findFrequency() function to figure out the
#                                                          time series frequency.
# useBoxCox                        logical        if TRUE, use Box-Cox transformation. Otherwise, Box-Cox transformation is not used.default FALSE
# lambdaValue       numeric or NULL    Box-Cox transformation parameter. Ignored if NULL or when useBoxCox is FALSE.  Otherwise, data
#                                                     transformed before model is estimated.default NULL
#                                                     Note: we are not sure what to do when useBoxCox is true and lambdaValue = NULL
#                                                          currently it is
# trendDamp         logical          a parameter for ETS model. if TRUE, use damped trend.  if NULL, try both damped and non-damped trend
#                                                     and choose optimal. default NULL
# errorType                                 "A"/"M"/"Z"        a parameter for ETS model. additive("A"), multiplicative("M"),
#                                                     or automatically selected("Z"). default "Z"
# trendType                                "N"/"A"/"M"/"Z"    a parameter for ETS model. none("N"), additive("A"), multiplicative("M"),
#                                                     or automatically selected("Z"). default "Z"
# seasonType                             "N"/"A"/"M"/"Z"    a parameter for ETS model. none("N"), additive("A"), multiplicative("M"),
#                                                      or automatically selected("Z"). default "Z"
# modelParamIC    string             Information criterion (ic) to be used in model selection.
#                                                     Values are: "ic_auto", "ic_aicc", "ic_bic", default "ic_auto"
#
#
# Output Data Frame :
#
# Day or Month or other time unit:      Sequence of future time periods to forecast; the column name depends on the date format
# forecast                                            Point forecasts as a time series
# low                                                  Lower limits for prediction intervals at a given confidence level
# high                                                Upper limits for prediction intervals at a given confidence level
# confidenceLevel                             Confidence Level
#
# Assumptions:
# 1. assume only one column in the dataframe is target and it is non factor
# 2. assume the date column(s) are factors in dataframe
#######################################################################################################

#### Appendix

In the Sample Virtual Box the forecast Function sample page has 2 prompt (presentation variables):
numperiods - the number of periods to  predict
genTimeSeq -Generate future months even if they don't exist in warehouse (Yes / No)
The function is: forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month" @{(genTimeSeq=='Yes')? 'timeMonth':''}), 'forecast',  'numPeriods=@{numperiods}{5}' )

the interesting part here is the condition on the Presentation Variable: @{(genTimeSeq=='Yes')? 'timeMonth':''} If it equals 'Yes', return 'timeMonth' and then the forecast will automatically generate the missing month (2013/05 & 2013/06 in the example bellow). If not return '' and the both forecast month missing in the database will be aggregated to null line (lower picture).

Lets remove the Variable for clarity.

In the date dimension they are only 2013/01 - 2013/04.

This is  forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month" timeMonth), 'forecast',  'numPeriods=6 )

This is the same without timeMonth Alias:  forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month"), 'forecast',  'numPeriods=6)