Wednesday, January 6, 2016

OBIEE 12c Advanced Analytic part 3: Forecast


The formal definition is:The FORECAST function creates a time-series model of the specified measure over the series using either Exponential Smoothing or ARMIA. This function outputs a forecast for the set of periods as specified by numPeriods.

I will use an example similar to what comes with the Online documentation and then will check the formal syntax and examine it further.

 In SampleSales based analysis with Year, Qtr, Revenue columns I will add a Forecast column. 

This is it's simplified code FORECAST("Revenue", ("Year" timeYear, "Qtr" timeQuarter),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)
This is the real function: FORECAST("Base Facts"."Revenue", ("Time"."Per Name Year" timeYear, "Time"."Per Name Qtr" timeQuarter),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)

The result is very interesting, we can see both seasonality and trend in the forecast:

I wanted a combined line of forecast and Revenue so I used something like: IFNULL("Revenue", "FORECAST")

This way the line looks nicer:


Now it's time to see the syntax and understand what we have (I made some modifications to the formal documentation).


Syntax

FORECAST(numeric_expr, ([series]), output_column_name, options
[runtime_binded_options])])

Where:
numeric_expr indicates the measure to forecast. For example, revenue data to forecast.

series indicates the time grain at which the forecast model is built. This is a list of one or more time dimension columns. If you omit series, then the time grain is determined from the query. You can omit the series only in cases you have only one Date/Time column in analysis. Even then you have the limitation described in appendix bellow. The series must fit the date columns in the Analysis. There is a predefined set of  column aliases, each should be named as seen in the example. (In my analysis they are 2 date columns, as a result, I can't treat them as optional otherwise I will get [nQSError: 42066] The parameter of TimeDimSeqNum function is not a time level key).
Example of Forecast with omitted  series in analysis with Qtr column only: FORECAST("Base Facts"."Revenue", (),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)
In all other cases, not using aliases might result in the problem described in the Appendix bellow.Here is a list of alias values and a table of their valid formats:
  • timeYear    
  • timeQuarter  
  • timeMonth   
  • timeWeek
  • timeDay
timeYear
timeQuarter
timeMonth
timeWeek
timeDay
2010
2010 Q1
2011 / 02
2011 / 33
"27/12/2012"
10
Q1 2010
2011/02
2011/33
"12/27/2012"
‘2010’
2010/Q1
02/ 2011
33 / 2011
"2012/12/27"

Q1/2010
02/2011
33/2011
"2012/27/12"

2010 / Q1
2011 - 02
2011 – 33
"27-12-2012"

Q1 / 2010
2011-02
2011-33
"12-27-2012"

2010-Q1
02 – 2011
33 - 2011
"2012-12-27"

Q1-2010
02-2011
33-2011
"2012-27-12"

2010 - Q1
2011 – Dec
2011 Week 04
"27.12.2012"

Q1 – 2010
2011-Dec
04 Week 2011
"12.27.2012"

Q1
Dec - 2011
04 Week, 2011
"2012.12.27"

1Q
Dec-2011
2011, Week 04
"2012.27.12"

1
2011 Dec
2011 Wk 04
"21-JAN-09" ORACLE DEFAULT DATE FORMAT DD-MON-RR

'2010 Q1'
Dec 2011
2011,Wk 04



Dec
2011, Wk 04



01
04 Wk 2011




04 Wk, 2011




04 Wk,2011




Week 04




04 Week




04 Wk




Wk 04




01

 


output_column_name indicates the output column. The valid values are 'forecast', 'low', 'high', and 'predictionInterval'. Why do the "low" and "high" exist? Forecasting is a delicate business. So knowing upper and lower limits of the prediction at the given confidence level might be important.

 
options indicates a string list of name/value pairs separated by a semi-colon (;). The value can include %1 ... %N, which can be specified in runtime_binded_options.
runtime_binded_options indicates a comma separated list of runtime-binded columns and options.



Now let's review our Function with the new knowldge:
FORECAST("Revenue", ("Year" timeYear, "Qtr" timeQuarter),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)

numeric_expr is "Revenue"
series is ("Year" timeYear, "Qtr" timeQuarter) note that adding a month column to the analysis and the series means changing to the following ("Year" timeYear, "Qtr" timeQuarter, "Month" timeMonth) each date column must be referenced with the proper name. Having non-consistency between the date columns in the analysis and in the series would result in the error: State: HY000. Code: 59144. [nQSError: 59144] Grain difference found....
There is no problem adding any other (not time) dimension to the analysis. For example


output_column_name in our case is 'forecast'

options is the string 'modelType=arima;numPeriods=%1;predictionInterval=70;', 3 where 3, at the end, is the runtime_binded_option, in this case indicating we want only 3 periods in the forecast. This is equivalent to 'modelType=arima;numPeriods=3;predictionInterval=70;'

modelType can be arima or ets. (ARIMA stands for 'Autoregressive integrated moving average', while ETS means 'Error,Trend,Seasonal'. In case you think Oracle gives you too many options, have a look here.)
Naturally, using different prediction algorithm return slightly different results:

Bellow is the same forecast function, once with Model Type Arima and once with ETS:



numPeriods is the number (Integer) of periods of forecast to predict.

Removing the numPeriods value (please note the ';' sign should remain) will return values for all possible future values in the database:
FORECAST("Base Facts"."Revenue", ("Time"."Per Name Year" timeYear, "Time"."Per Name Qtr" timeQuarter),'forecast','modelType=arima;;predictionInterval=70;')


predictionInterval is the confidence for the prediction. Values are 0 to 100 (not 100 itself, it returns nQSError: 43119 Confidence limit out of range), where higher values specify higher confidence.

Few more options:



Option Name
Description
Values
useBoxCox
If TRUE, then use Box-Cox transformation.
TRUE
FALSE
lambdaValue
The Box-Cox transformation parameter.
Ignore if NULL or when useBoxCox is FALSE.
Otherwise the data is transformed before the model is estimated.

modelParamIC
The information criterion (IC) to be used in the model selection.
"ic_auto"
"ic_aicc"
"ic_bic"
"ic_auto" (this is the default)


Some extra options are relevant only for the ETS model:

Option Name
Description
Values
trendDamp
If TRUE, then use damped trend. If NULL, then try both damped and non-damped trend and choose the one that is optimal.

errorType

additive("A")
multiplicative("M")
automatically selected ("Z")
trendType

none("N")
additive("A")
multiplicative("M")
automatically selected("Z")
seasonType

none("N")
additive("A")
multiplicative("M")
automatically selected("Z")


This post is getting too long.
To finish it, we might gain additional insight by looking at the internal documentation of the file obiee.TimeSeriesForecast.R, that comes with the installation (slightly shorter version), to discover some secrets like frequency.

# Function : obiee.Forecast.freeTimeFormat
#            STL Forecasting Model which applies ETS or Arima
#           Forecasts obtained by STL decomposition and then apply different models (ETS, Arima, ..) to seasonality adjusted data
#
# Arguments :
#
# dat                              dataframe          dataframe with date and target
#                                                     timeYear     : Can be YYYY or YY
#                                                     timeQuarter  : Can be YYYY Q%, Q% YYYY, YYYY / Q%, YY Q%, Q%
#                                                     timeMonth    : Can be
#                                                     timeWeek
#                                                     timeDay
#                                                     target
# numPeriods                numeric            number of periods to forecast
# predictionLevel          numeric            confidence level in percentage for prediction, default is 95.
# modelType                 string             "ets","arima"
# frequency                   string             "freq_hourly", "freq_daily",    "freq_weekdays", "freq_weekly",
#                                                     "freq_monthly", "freq_quarterly", "freq_yearly"
#                                                      Note: the frequency argument could be useful to derive the time date format,
#                                                          but it is not used in creating the time series object.
#                                                          for time series, we use obiee.Forecast.findFrequency() function to figure out the
#                                                          time series frequency.
# useBoxCox                        logical        if TRUE, use Box-Cox transformation. Otherwise, Box-Cox transformation is not used.default FALSE
# lambdaValue       numeric or NULL    Box-Cox transformation parameter. Ignored if NULL or when useBoxCox is FALSE.  Otherwise, data
#                                                     transformed before model is estimated.default NULL
#                                                     Note: we are not sure what to do when useBoxCox is true and lambdaValue = NULL
#                                                          currently it is
# trendDamp         logical          a parameter for ETS model. if TRUE, use damped trend.  if NULL, try both damped and non-damped trend
#                                                     and choose optimal. default NULL
# errorType                                 "A"/"M"/"Z"        a parameter for ETS model. additive("A"), multiplicative("M"),
#                                                     or automatically selected("Z"). default "Z"
# trendType                                "N"/"A"/"M"/"Z"    a parameter for ETS model. none("N"), additive("A"), multiplicative("M"),
#                                                     or automatically selected("Z"). default "Z"
# seasonType                             "N"/"A"/"M"/"Z"    a parameter for ETS model. none("N"), additive("A"), multiplicative("M"),
#                                                      or automatically selected("Z"). default "Z"
# modelParamIC    string             Information criterion (ic) to be used in model selection.
#                                                     Values are: "ic_auto", "ic_aicc", "ic_bic", default "ic_auto"
#
#
# Output Data Frame :
#
# Day or Month or other time unit:      Sequence of future time periods to forecast; the column name depends on the date format
# forecast                                            Point forecasts as a time series
# low                                                  Lower limits for prediction intervals at a given confidence level
# high                                                Upper limits for prediction intervals at a given confidence level
# confidenceLevel                             Confidence Level
#
# Assumptions:
# 1. assume only one column in the dataframe is target and it is non factor
# 2. assume the date column(s) are factors in dataframe
#######################################################################################################



Appendix


In the Sample Virtual Box the forecast Function sample page has 2 prompt (presentation variables):
numperiods - the number of periods to  predict
genTimeSeq -Generate future months even if they don't exist in warehouse (Yes / No)
The function is: forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month" @{(genTimeSeq=='Yes')? 'timeMonth':''}), 'forecast',  'numPeriods=@{numperiods}{5}' )

the interesting part here is the condition on the Presentation Variable: @{(genTimeSeq=='Yes')? 'timeMonth':''} If it equals 'Yes', return 'timeMonth' and then the forecast will automatically generate the missing month (2013/05 & 2013/06 in the example bellow). If not return '' and the both forecast month missing in the database will be aggregated to null line (lower picture).

Lets remove the Variable for clarity.

In the date dimension they are only 2013/01 - 2013/04.

This is  forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month" timeMonth), 'forecast',  'numPeriods=6 )  


This is the same without timeMonth Alias:  forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month"), 'forecast',  'numPeriods=6)  




4 comments:

  1. I admired those who has able to create a blog as wonderful as this! You are truly a hard working person. Keep up the good work and keep on posting.

    www.imarksweb.org

    ReplyDelete
  2. can someone please explain me the use of prediction interval variable is doing ..and what oes it mean to the business user..?

    ReplyDelete
  3. Awesome, Great work.
    I have a question: Where to use IFNULL("Revenue", "FORECAST")?
    Thanks

    ReplyDelete
    Replies
    1. Ahmed, instead of the Revenue, or as a new column you will show in the graph. The main point is to have one line. Since the forecast still exists on the graph, it will have a different color.

      Delete