This is the third part of OBIEE 12c Advanced Analytic:
The Forecast function is hiding with the Time Series Calculations (Unlike the Analytics). It makes sense since it includes time-series component.
- OBIEE 12c Advanced Analytics Functions part 1. Introduction & Trendline
- OBIEE 12c Advanced Analytic part 2: BIN and WIDTH_BUCKET
- OBIEE 12c Advanced Analytic part 3: Forecast (this one)
- OBIEE 12c Advanced Analytic part 4: Cluster
- OBIEE 12c Advanced Analytic part 5: Outlier
- OBIEE 12c Advanced Analytic part 6: Regression
- OBIEE 12c Advanced Analytic part 7: EVALUATE_SCRIPT
The Forecast function is hiding with the Time Series Calculations (Unlike the Analytics). It makes sense since it includes time-series component.
The formal definition is:The FORECAST function creates a time-series model of the specified
measure over the series using either Exponential Smoothing or ARMIA.
This function outputs a forecast for the set of periods as specified by
numPeriods.
I will use an example similar to what comes with the Online documentation and then will check the formal syntax and examine it further.
In SampleSales based analysis with Year, Qtr, Revenue columns I will add a Forecast column.
This is it's simplified code FORECAST("Revenue", ("Year" timeYear, "Qtr" timeQuarter),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3).
This is the real function: FORECAST("Base Facts"."Revenue", ("Time"."Per Name Year" timeYear, "Time"."Per Name Qtr" timeQuarter),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)
The result is very interesting, we can see both seasonality and trend in the forecast:
I wanted a combined line of forecast and Revenue so I used something like: IFNULL("Revenue", "FORECAST")
This way the line looks nicer:
Now it's time to see the syntax and understand what we have (I made some modifications to the formal documentation).
Where:
If you omit You can omit the
Example of Forecast with omitted
In all other cases, not using aliases might result in the problem described in the Appendix bellow.Here is a list of alias values and a table of their valid formats:
Now let's review our Function with the new knowldge:
FORECAST("Revenue", ("Year" timeYear, "Qtr" timeQuarter),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)
Syntax
FORECAST(numeric_expr, ([series]), output_column_name, options,
[runtime_binded_options])])
Where:
numeric_expr
indicates the measure to forecast. For example, revenue data to forecast.series
indicates the time grain at which the forecast model is built. This is a
list of one or more time dimension columns.series
, then the time grain is determined from the query.series
only in cases you have only one Date/Time column in analysis. Even then you have the limitation described in appendix bellow. The series
must fit the date columns in the Analysis. There is a predefined set of column aliases, each should be named as seen in the example. (In my analysis they are 2 date columns, as a result, I can't treat them as optional otherwise I will get [nQSError: 42066] The parameter of TimeDimSeqNum function is not a time level key).Example of Forecast with omitted
series
in analysis with Qtr column only: FORECAST("Base Facts"."Revenue", (),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)In all other cases, not using aliases might result in the problem described in the Appendix bellow.Here is a list of alias values and a table of their valid formats:
- timeYear
- timeQuarter
- timeMonth
- timeWeek
- timeDay
timeYear
timeQuarter
timeMonth
timeWeek
timeDay
2010
2010 Q1
2011 / 02
2011 / 33
"27/12/2012"
10
Q1 2010
2011/02
2011/33
"12/27/2012"
‘2010’
2010/Q1
02/ 2011
33 / 2011
"2012/12/27"
Q1/2010
02/2011
33/2011
"2012/27/12"
2010 / Q1
2011 - 02
2011 – 33
"27-12-2012"
Q1 / 2010
2011-02
2011-33
"12-27-2012"
2010-Q1
02 – 2011
33 - 2011
"2012-12-27"
Q1-2010
02-2011
33-2011
"2012-27-12"
2010 - Q1
2011 – Dec
2011 Week 04
"27.12.2012"
Q1 – 2010
2011-Dec
04 Week 2011
"12.27.2012"
Q1
Dec - 2011
04 Week, 2011
"2012.12.27"
1Q
Dec-2011
2011, Week 04
"2012.27.12"
1
2011 Dec
2011 Wk 04
"21-JAN-09" ORACLE DEFAULT DATE FORMAT DD-MON-RR
'2010 Q1'
Dec 2011
2011,Wk 04
Dec
2011, Wk 04
01
04 Wk 2011
04 Wk, 2011
04 Wk,2011
Week 04
04 Week
04 Wk
Wk 04
01
output_column_name
indicates the output column. The valid values are 'forecast', 'low', 'high', and 'predictionInterval'. Why do the "low" and "high" exist? Forecasting is a delicate business. So knowing upper and lower limits of the prediction at the given confidence level might be important.options
indicates a string list of name/value pairs separated by a semi-colon
(;). The value can include %1 ... %N, which can be specified in runtime_binded_options
.runtime_binded_options
indicates a comma separated list of runtime-binded columns and options.Now let's review our Function with the new knowldge:
FORECAST("Revenue", ("Year" timeYear, "Qtr" timeQuarter),'forecast','modelType=arima;numPeriods=%1;predictionInterval=70;', 3)
numeric_expr
is "Revenue"series
is ("Year" timeYear, "Qtr" timeQuarter) note that adding a month column to the analysis and the series means changing to the following ("Year" timeYear, "Qtr" timeQuarter, "Month" timeMonth) each date column must be referenced with the proper name. Having non-consistency between the date columns in the analysis and in the series would result in the error: State: HY000. Code: 59144. [nQSError: 59144] Grain difference found....
There is no problem adding any other (not time) dimension to the analysis. For example
modelType can be arima or ets. (ARIMA stands for 'Autoregressive integrated moving average', while ETS means 'Error,Trend,Seasonal'. In case you think Oracle gives you too many options, have a look here.)
Naturally, using different prediction algorithm return slightly different results:
Bellow is the same forecast function, once with Model Type Arima and once with ETS:
numPeriods is the number (Integer) of periods of forecast to predict.
Removing the numPeriods value (please note the ';' sign should remain) will return values for all possible future values in the database:
FORECAST("Base Facts"."Revenue", ("Time"."Per Name Year" timeYear, "Time"."Per Name Qtr" timeQuarter),'forecast','modelType=arima;;predictionInterval=70;')
predictionInterval is the confidence for the prediction. Values are 0 to 100 (not 100 itself, it returns nQSError: 43119 Confidence limit out of range), where higher values specify higher confidence.
Few more options:
Some extra options are relevant only for the ETS model:
This post is getting too long.
To finish it, we might gain additional insight by looking at the internal documentation of the file obiee.TimeSeriesForecast.R, that comes with the installation (slightly shorter version), to discover some secrets like frequency.
# Function : obiee.Forecast.freeTimeFormat
# STL Forecasting Model which applies ETS or Arima
# Forecasts obtained by STL decomposition and then apply different models (ETS, Arima, ..) to seasonality adjusted data
#
# Arguments :
#
# dat dataframe dataframe with date and target
# timeYear : Can be YYYY or YY
# timeQuarter : Can be YYYY Q%, Q% YYYY, YYYY / Q%, YY Q%, Q%
# timeMonth : Can be
# timeWeek
# timeDay
# target
# numPeriods numeric number of periods to forecast
# predictionLevel numeric confidence level in percentage for prediction, default is 95.
# modelType string "ets","arima"
# frequency string "freq_hourly", "freq_daily", "freq_weekdays", "freq_weekly",
# "freq_monthly", "freq_quarterly", "freq_yearly"
# Note: the frequency argument could be useful to derive the time date format,
# but it is not used in creating the time series object.
# for time series, we use obiee.Forecast.findFrequency() function to figure out the
# time series frequency.
# useBoxCox logical if TRUE, use Box-Cox transformation. Otherwise, Box-Cox transformation is not used.default FALSE
# lambdaValue numeric or NULL Box-Cox transformation parameter. Ignored if NULL or when useBoxCox is FALSE. Otherwise, data
# transformed before model is estimated.default NULL
# Note: we are not sure what to do when useBoxCox is true and lambdaValue = NULL
# currently it is
# trendDamp logical a parameter for ETS model. if TRUE, use damped trend. if NULL, try both damped and non-damped trend
# and choose optimal. default NULL
# errorType "A"/"M"/"Z" a parameter for ETS model. additive("A"), multiplicative("M"),
# or automatically selected("Z"). default "Z"
# trendType "N"/"A"/"M"/"Z" a parameter for ETS model. none("N"), additive("A"), multiplicative("M"),
# or automatically selected("Z"). default "Z"
# seasonType "N"/"A"/"M"/"Z" a parameter for ETS model. none("N"), additive("A"), multiplicative("M"),
# or automatically selected("Z"). default "Z"
# modelParamIC string Information criterion (ic) to be used in model selection.
# Values are: "ic_auto", "ic_aicc", "ic_bic", default "ic_auto"
#
#
# Output Data Frame :
#
# Day or Month or other time unit: Sequence of future time periods to forecast; the column name depends on the date format
# forecast Point forecasts as a time series
# low Lower limits for prediction intervals at a given confidence level
# high Upper limits for prediction intervals at a given confidence level
# confidenceLevel Confidence Level
#
# Assumptions:
# 1. assume only one column in the dataframe is target and it is non factor
# 2. assume the date column(s) are factors in dataframe
#######################################################################################################
In the Sample Virtual Box the forecast Function sample page has 2 prompt (presentation variables):
numperiods - the number of periods to predict
output_column_name
in our case is 'forecast'options
is the string 'modelType=arima;numPeriods=%1;predictionInterval=70;', 3 where 3, at the end, is the runtime_binded_option
, in this case indicating we want only 3 periods in the forecast. This is equivalent to 'modelType=arima;numPeriods=3;predictionInterval=70;' modelType can be arima or ets. (ARIMA stands for 'Autoregressive integrated moving average', while ETS means 'Error,Trend,Seasonal'. In case you think Oracle gives you too many options, have a look here.)
Naturally, using different prediction algorithm return slightly different results:
Bellow is the same forecast function, once with Model Type Arima and once with ETS:
numPeriods is the number (Integer) of periods of forecast to predict.
Removing the numPeriods value (please note the ';' sign should remain) will return values for all possible future values in the database:
FORECAST("Base Facts"."Revenue", ("Time"."Per Name Year" timeYear, "Time"."Per Name Qtr" timeQuarter),'forecast','modelType=arima;;predictionInterval=70;')
predictionInterval is the confidence for the prediction. Values are 0 to 100 (not 100 itself, it returns nQSError: 43119 Confidence limit out of range), where higher values specify higher confidence.
Few more options:
Option Name
|
Description
|
Values
|
useBoxCox
|
If TRUE, then use Box-Cox
transformation.
|
TRUE
FALSE
|
lambdaValue
|
The Box-Cox transformation
parameter.
Ignore if NULL or when useBoxCox
is FALSE.
Otherwise the data is transformed
before the model is estimated.
|
|
modelParamIC
|
The information criterion (IC) to
be used in the model selection.
|
"ic_auto"
"ic_aicc"
"ic_bic"
"ic_auto" (this is the
default)
|
Some extra options are relevant only for the ETS model:
Option Name
|
Description
|
Values
|
trendDamp
|
If TRUE, then use damped trend. If
NULL, then try both damped and non-damped trend and choose the one that is
optimal.
|
|
errorType
|
additive("A")
multiplicative("M")
automatically selected
("Z")
|
|
trendType
|
none("N")
additive("A")
multiplicative("M")
automatically
selected("Z")
|
|
seasonType
|
none("N")
additive("A")
multiplicative("M")
automatically
selected("Z")
|
This post is getting too long.
To finish it, we might gain additional insight by looking at the internal documentation of the file obiee.TimeSeriesForecast.R, that comes with the installation (slightly shorter version), to discover some secrets like frequency.
# Function : obiee.Forecast.freeTimeFormat
# STL Forecasting Model which applies ETS or Arima
# Forecasts obtained by STL decomposition and then apply different models (ETS, Arima, ..) to seasonality adjusted data
#
# Arguments :
#
# dat dataframe dataframe with date and target
# timeYear : Can be YYYY or YY
# timeQuarter : Can be YYYY Q%, Q% YYYY, YYYY / Q%, YY Q%, Q%
# timeMonth : Can be
# timeWeek
# timeDay
# target
# numPeriods numeric number of periods to forecast
# predictionLevel numeric confidence level in percentage for prediction, default is 95.
# modelType string "ets","arima"
# frequency string "freq_hourly", "freq_daily", "freq_weekdays", "freq_weekly",
# "freq_monthly", "freq_quarterly", "freq_yearly"
# Note: the frequency argument could be useful to derive the time date format,
# but it is not used in creating the time series object.
# for time series, we use obiee.Forecast.findFrequency() function to figure out the
# time series frequency.
# useBoxCox logical if TRUE, use Box-Cox transformation. Otherwise, Box-Cox transformation is not used.default FALSE
# lambdaValue numeric or NULL Box-Cox transformation parameter. Ignored if NULL or when useBoxCox is FALSE. Otherwise, data
# transformed before model is estimated.default NULL
# Note: we are not sure what to do when useBoxCox is true and lambdaValue = NULL
# currently it is
# trendDamp logical a parameter for ETS model. if TRUE, use damped trend. if NULL, try both damped and non-damped trend
# and choose optimal. default NULL
# errorType "A"/"M"/"Z" a parameter for ETS model. additive("A"), multiplicative("M"),
# or automatically selected("Z"). default "Z"
# trendType "N"/"A"/"M"/"Z" a parameter for ETS model. none("N"), additive("A"), multiplicative("M"),
# or automatically selected("Z"). default "Z"
# seasonType "N"/"A"/"M"/"Z" a parameter for ETS model. none("N"), additive("A"), multiplicative("M"),
# or automatically selected("Z"). default "Z"
# modelParamIC string Information criterion (ic) to be used in model selection.
# Values are: "ic_auto", "ic_aicc", "ic_bic", default "ic_auto"
#
#
# Output Data Frame :
#
# Day or Month or other time unit: Sequence of future time periods to forecast; the column name depends on the date format
# forecast Point forecasts as a time series
# low Lower limits for prediction intervals at a given confidence level
# high Upper limits for prediction intervals at a given confidence level
# confidenceLevel Confidence Level
#
# Assumptions:
# 1. assume only one column in the dataframe is target and it is non factor
# 2. assume the date column(s) are factors in dataframe
#######################################################################################################
Appendix
In the Sample Virtual Box the forecast Function sample page has 2 prompt (presentation variables):
numperiods - the number of periods to predict
genTimeSeq -Generate future months even if they don't exist in warehouse (Yes / No)
The function is: forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month" @{(genTimeSeq=='Yes')? 'timeMonth':''}), 'forecast', 'numPeriods=@{numperiods}{5}' )
the interesting part here is the condition on the Presentation Variable: @{(genTimeSeq=='Yes')? 'timeMonth':''} If it equals 'Yes', return 'timeMonth' and then the forecast will automatically generate the missing month (2013/05 & 2013/06 in the example bellow). If not return '' and the both forecast month missing in the database will be aggregated to null line (lower picture).
Lets remove the Variable for clarity.
In the date dimension they are only 2013/01 - 2013/04.
This is forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month" timeMonth), 'forecast', 'numPeriods=6 )
This is the same without timeMonth Alias: forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month"), 'forecast', 'numPeriods=6)
the interesting part here is the condition on the Presentation Variable: @{(genTimeSeq=='Yes')? 'timeMonth':''} If it equals 'Yes', return 'timeMonth' and then the forecast will automatically generate the missing month (2013/05 & 2013/06 in the example bellow). If not return '' and the both forecast month missing in the database will be aggregated to null line (lower picture).
Lets remove the Variable for clarity.
In the date dimension they are only 2013/01 - 2013/04.
This is forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month" timeMonth), 'forecast', 'numPeriods=6 )
This is the same without timeMonth Alias: forecast("Base Facts"."1- Revenue", ("Time"."T02 Per Name Month"), 'forecast', 'numPeriods=6)
can someone please explain me the use of prediction interval variable is doing ..and what oes it mean to the business user..?
ReplyDeleteAwesome, Great work.
ReplyDeleteI have a question: Where to use IFNULL("Revenue", "FORECAST")?
Thanks
Ahmed, instead of the Revenue, or as a new column you will show in the graph. The main point is to have one line. Since the forecast still exists on the graph, it will have a different color.
DeleteHi, Is it possible to pass multiple numPeriod values dynamically based on the rows present in the report? For example I have two rows for which I need to pass two different numPeriod values, is this achievable?
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteCannot render any of the external data since there is no join pathway to (tmpQueryScope://nQS_1291078800/outputs). (HY000)
ReplyDeleteHii
how can i solve this error if any body know how to solve this so kindly guideline me
Cannot render any of the external data since there is no join pathway to (tmpQueryScope://nQS_2910320069/outputs). (HY000)
ReplyDeleteI want to get prediction to the period that I already have data for it, how can I do it? This is a way to understand or check which forecast function is better, with witch params.
ReplyDelete