Thursday, January 14, 2016

OBIEE 12c Advanced Analytic part 6: Regr

This is the sixth part of OBIEE 12c Advanced Analytic:

  1. OBIEE 12c Advanced Analytics Functions part 1. Introduction & Trendline
  2. OBIEE 12c Advanced Analytic part 2: BIN and WIDTH_BUCKET
  3. OBIEE 12c Advanced Analytic part 3: Forecast
  4. OBIEE 12c Advanced Analytic part 4: Cluster
  5. OBIEE 12c Advanced Analytic part 5: Outlier
  6. OBIEE 12c Advanced Analytic part 6: Regression (this one)
  7. OBIEE 12c Advanced Analytic part 7: EVALUATE_SCRIPT
 (for 3-7 you need Enabling R and the relevant Analytics functions on OBIEE 12c )



Linear Regression is a basic option of Advanced Analytics. I once wrote about achieving the regression line in OBIEE using the Oracle DB capabilities at "OBIEE and Linear Regression with Oracle DB".

Oracle documentation describes the REGR function: The REGR function fits a linear model and returns the fitted values or model. You can use this function to fit a linear curve on two measures.


Lets start with the simple example from the online documentation:
Analysis with columns Product Type, Brand Revenue and Discount Amount from Sample Sales. With a regression column. The simplified version is: REGR("Revenue", ("Discount Amount"), ("Product Type", "Brand"), 'fitted', '')
 
The full formula:  REGR("Base Facts"."Revenue", ("Base Facts"."Discount Amount"), ("Products"."Product Type", "Products"."Brand"), 'fitted', '')

So we have a "fitted" regression of Revenue vs. Discount Amount, where each point is combination of Product Type and Brand.

Lets see the syntax  documentation and do few more things:

Syntax

REGR(y_axis_measure_expr, (x_axis_expr), 
(category_expr1, ..., category_exprN), output_column_name,
options, [runtime_binded_options])
Where:
y_axis_measure_expr indicates the measure for which the regression model is to be computed.
x_axis_measure_expr indicates the measure used to determine the regression model for the y_axis_measure_expr.
category_expr1, ..., category_exprN indicates the dimension/dimension attributes used to determine the category for which the regression model for the y_axis_measure_expr is computed. One or up to five dimensions/dimension attributes can be provided as category columns.
output_column_name indicates the output column name for regression. Valid values are 'fitted', 'intercept', or 'modelDescription'.

output_column_name options explanation:
fitted - returns the points on regression line (y=ax+b) 
intercept - the intercept point with the zero on x axis (b from y=ax+b)
modelDescription - the Model in JSON format.

options indicates a string list of name/value pairs separated by a semi-colon (;). The value can include %1 ... %N, which can be specified using runtime_binded_options.
runtime_binded_options indicates an optional comma-separated (,) list of runtime binded columns or literal expressions.



From my personal testing, All the options in the documentation (algorithm, targetNames, showModelDescription) seems irrelevant at the moment. Mostly because they have only 1 relevant default value.


It seems that adding the intercept or modelDescription output to the analysis hides the measures. 



In case you are interested, the REGR("Base Facts"."Revenue", ("Base Facts"."Discount Amount"), ("Products"."Product Type", "Products"."Brand"), 'modelDescription', '') function returned the following JSON:


"{
 "call": {
 "": "lm",
"formula": "formula",
"data": "dat"
},
"terms": [
 "~",
"target",
"arg1"
],
"residuals": {
 "1":  41031,
"2":  88411,
"3": -1.3425e+05,
"4":  81339,
"5": -76497,
"6": -8371.3,
"7":  15573,
"8": -17995,
"9": -20080,
"10": -61277,
"11":  92117
},
"coefficients": [ {
 "Estimate":  10364,
"Std. Error":  54600,
"t value": 0.18981,
"Pr(>|t|)": 0.85367
},
{
 "Estimate":  29.39,
"Std. Error": 3.2705,
"t value": 8.9865,
"Pr(>|t|)": 8.6434e-06
} ],
"aliased": {
 "(Intercept)": false,
"arg1": false
},
"sigma":  76935,
"df": [ 2, 9, 2 ],
"r.squared": 0.89973,
"adj.r.squared": 0.88859,
"fstatistic": {
 "value": 80.757,
"numdf":      1,
"dendf":      9
},
"cov.unscaled": [ {
 "(Intercept)": 0.50366,
"arg1": -2.7311e-05
},
{
 "(Intercept)": -2.7311e-05,
"arg1": 1.8071e-09
} ]
}"



Usually we can add extra columns that are not in the REGR function to the analysis.
For example, similar to previous function, where I replaced Brand with Product and added the Department (from another dimension) to analysis.
REGR("Base Facts"."Revenue", ("Base Facts"."Discount Amount"), ("Products"."Product", "Products"."Product Type"), 'fitted', '')

The result is fine:





No comments:

Post a Comment