Friday, November 11, 2016

OBIEE - Selecting date range by selecting a column

Customers often work with specific date ranges: "last 3 days", "last week", "this month", "last 7 working days"... While it's easy to create a filter that does this and reuse it, they prefer an easier method.
Adding those option in the tool by default is problematic, since users always invent new range needs. We need something the local IT can manage without too much work for each new idea.
 
I covered how to do it with prompts in the post: "OBIEE Prompts - Last Year, Last Month and Custom Dates", but the request is to give easy access to date range selection in Answers (and VA) as well.
Sorry, it's going to be a long post.

My basic assumption is: you have a date dimension based on dates table. I will cover the following options:
  • Reusing a filter.
  • Adding date range columns to date table.
  • Create a date_range table and joining it to date dimension.   
  • Having multiple data sources on the same physical table with various "where" conditions in the content as a date dimension and specific column.

In all examples I will use 3 ranges "last 3 days", "last month", "this year". Assuming today is 11-Nov-2016:
last 3 days is 10-Nov-2016, 9-Nov-2016, 8-Nov-2016
last month is all dates from 11-Oct-2016 till 10-Nov-2016
this year are all the dates from 1-Jan-2016 till 11-Nov-2016. 

I'll describe the pros and cons of each method with the following criteria:
  • Ease of use for customer.
  • Requires RPD change for each additional date range.
  • Requires daily ETL.
  • Supports easy reverse selection and combinations (such as: last month and not last 3 days).
  • Works with VA as well or Answers only.
  • Has performance impact.
I use very simple model with measure value 1 in each day of fact.

Reusing Filter

I can create a filter from within an analysis or from menu New / Filter.

 For last 3 days I create a filter "DATE_D"."DATEVALUE" BETWEEN TIMESTAMPADD(SQL_TSI_DAY, -3,CURRENT_DATE) AND TIMESTAMPADD(SQL_TSI_DAY, -1,CURRENT_DATE)
I can save the filter for reuse

 or save the entire analysis.
To use the filter, the user has to select it from the filters list of the Catalog (left panel):





Some users find it easier to use the analysis itself as filter, they believe they have better control this way:
 In a similar manner I can create the other date range filter / Analysis:
Last Month: "DATE_D"."DATEVALUE" BETWEEN TIMESTAMPADD(SQL_TSI_MONTH, -1,CURRENT_DATE) AND TIMESTAMPADD(SQL_TSI_DAY, -1,CURRENT_DATE)
 This Year: "DATE_D"."DATEVALUE" BETWEEN TIMESTAMPADD(SQL_TSI_DAY, (1-DAYOFYEAR(CURRENT_DATE)) ,CURRENT_DATE) AND CURRENT_DATE
Only the analysis based filter let us create reverse selection:






Summary:

The best option for IT and OK but not so great for Users.
  • Ease of use for User is OK but not great.
  • Requires RPD change for each additional date range?  NO.
  • Requires daily ETL? NO.
  • Supports easy reverse selection and combinations (such as: last month and not last 3 days)? Yes, with filter based on analysis only.
  • Works with VA? NO.
  • Has performance impact? NO.



Adding date range columns to date table

This option requires change in the dates table. I'll add 3 columns (last_3_days, last_month, this_year). For each column I have to update daily the Boolean value (in my case Yes/No).


This is the Subject area:


The user has very simple task; Filtering Yes/No on the column:





The reverse option is easy to create:

It works with VA as well:


Summary:

The worst option for IT and great for Users.
  • Ease of use for User? YES.
  • Requires RPD change for each additional date range?  YES.
  • Requires daily ETL? YES.
  • Supports easy reverse selection and combinations (such as: last month and not last 3 days)? YES.
  • Works with VA? YES.
  • Has performance impact? NO.





Create a date_range table with line for each range and joining it to date dimension

The simple option here is to create a table with 3 columns: Range_name, From_date, Till_date and join them to the Date dimension. This sort of join has bad impact on performance. The table also requires ETL for updating the date_range table. We can have a work around for the ETL by creating a dynamic view (Oracle SQL code):
create or REPLACE view date_range_line as
select 'Last 3 days' as Range, trunc(sysdate-3) as from_date, trunc(sysdate-1) as till_date from dual
Union ALL
select 'Last month' as Range, trunc(add_months(sysdate,-1)) as from_date, trunc(sysdate-1) as till_date from dual
Union ALL
select 'This Year' as Range, TRUNC(TO_DATE(sysdate),'YEAR') as from_date, trunc(sysdate) as till_date from dual
This is easy to update for each change or new requirement, in the future. 
So I created it as a View, imported into repository and joined it to date table. 

"ORCL".""."SH"."DATE_RANGE_LINE"."FROM_DATE" <=  "ORCL".""."SH"."DATE_D"."DATEVALUE"   AND   "ORCL".""."SH"."DATE_RANGE_LINE"."TILL_DATE" >= "ORCL".""."SH"."DATE_D"."DATEVALUE"




Now it's working:
Unfortunately, the reverse option returns wrong results (the last 3 days were not removed):


(Actually it can be done using minus option that is under the union option in the criteria, but that's a bit advanced).

For performance we see the select statement from the log:
WITH 
SAWITH0 AS (select sum(T909.MES) as c1
from 
     DATE_RANGE_LINE T1138,
     DATE_D T895,
     TEST_FACT T909
where  ( T895.DATEKEY = T909.DATEKEY and T1138.RANGE = 'Last 3 days' 
and T895.DATEVALUE between T1138.FROM_DATE and T1138.TILL_DATE ) )
select D1.c1 as c1, D1.c2 as c2 from ( select distinct 0 as c1,
     D1.c1 as c2
from 
     SAWITH0 D1 ) D1 where rownum <= 65001
As expected we have 3 tables joined instead of 2.

Summary:

Good option for everyone, if you don't mind the performance problem and the wrong results of reverse or multiple range selection.
  • Ease of use for User? YES.
  • Requires RPD change for each additional date range?  NO.
  • Requires daily ETL? NO (for the view option).
  • Supports easy reverse selection and combinations (such as: last month and not last 3 days)? NO.
  • Works with VA? YES.
  • Has performance impact? YES.But can be solved with driving table as described bellow, in the update.

For this option I also considered creating a date_range table with a line for each date and range combination and joining it to date dimension as driving table. 
This is ETL intensive option where for example for last 3 days I have 3 rows of data, the range column is "last 3 days" and the date column has each of the 3 days that fit the range. For last month, up to 31 rows... We can see that dates such as '10-Nov-2016' appear both in 'last 3 days' and 'last Month' :

This way the join with date table is simple.



UPDATE 

To solve the performance problem we can use a feature named "driving table" in OBIEE. The idea of driving table described here and here. This way OBIEE runs a separate query for the date_range table and places the result into the date table filter. Since the date ranges are usually small (if you have a year, then you should change MAX_PARAMETERS_PER_DRIVE_JOIN parameter). Unfortunately since driving table makes it own optimization, it didn't act in my tests. 
OBIEE prefers to use driving table only with "cross-database joins ".
If you can't beat them, join them.To overcame the optimizer I duplicated my physical data source (now it's cross-database join):

And used the date_range table from the second source (ORCL#1). Set this date_range as driving table in the join at the business model:
Now the performance is fine!

This is from the log:

select T1287.RANGE as c1,
     T1287.DATES as c2
from 
     DATE_RANGE T1287
where  ( T1287.RANGE = 'last 3 days' ) 
order by c2

]]

select  /*+ no_merge */  sum(T909.MES) as c1,
     T909.DATEKEY as c2,
     T895.DATEVALUE as c3
from 
     TEST_FACT T909,
     DATE_D T895
where  ( T895.DATEKEY = T909.DATEKEY and (T895.DATEVALUE in 
(:PARAM1, :PARAM2, :PARAM3, :PARAM4, :PARAM5, :PARAM6, :PARAM7, :PARAM8, :PARAM9, 
:PARAM10, :PARAM11, :PARAM12, :PARAM13, :PARAM14, :PARAM15, :PARAM16, :PARAM17, 
...
:PARAM100)) ) 
group by T895.DATEVALUE, T909.DATEKEY 
group by T895.DATEVALUE, T909.DATEKEY
order by c3




Having multiple data sources on the same physical table with various "where" conditions in the content as a date dimension and specific column

This is a nice option for performance and demands no ETL.
For each range I duplicate the date data source and add a column that is specific for this data source. Selecting that column or filtering on it, enables the filter.

The example:

In the Business Model, in the date dimension table I duplicate the data source (it is still the same physical date_d table):


Rename the duplicate data surce to "last 3 days" and use a where condition, very similar to the one in the first example of "Reusing a filter" in Content Tab:

 "ORCL".""."SH"."DATE_D"."DATEVALUE"  BETWEEN TIMESTAMPADD(SQL_TSI_DAY, -3,CURRENT_DATE) AND TIMESTAMPADD(SQL_TSI_DAY, -1,CURRENT_DATE)

Now I add a new Logical data column to the Date, rename it

 
Set the column source to be from physical mapping from the "last 3 days" data source. It is critical that no other data source will be mapped to this column.


I enter the data source and Map this column:

We don't need a Physical Table, I just write 'Last 3 Days' (or whatever I want) in the Expression. We do need to mark "Show unmapped columns".

The order of data sources is important. We want the default date to be without constraints and place that data source first (by default OBIEE starts with the first data source and goes top down). If you want extra caution, you can work with data source priority group as well.  
Oracle help: "To assign priority group numbers, rank your logical table sources in numeric order, with 0 being the highest-priority source. You can assign the same number to multiple sources. For example, you can have two logical table sources in priority group 0, two logical table sources in priority group 1, and so on. Often, only two priority groups are necessary (0 and 1)."

Here in a similar way I add "Last Month" data source add it's condition in content tab and add the "Last Month column" for that data source:


I map the column to physical source in "Last Month" data source only. In a similar way you can see "Last 3 days" has no mapping in the "Last Month" data source.

After I complete the set this is what I have:
 
When working with Analysis, not using any of the range column returns the usual data:

In our case all the dates:

Adding the "Last 3 Days" column, activates the filtering. Same would happen if it's added hidden or used as filter.

Attempt of reverse option wouldn't work.
In this case when I attempt to add the filter "Last 3 Days is not equal / is not in Last 3 Days" it returns empty set as expected (once I use the "last 3 days" data set, the "Last 3 Days" column value is always 'Last 3 Days').


Summary:

  • Ease of use for User? YES.
  • Requires RPD change for each additional date range?  YES, but it's rather easy.
  • Requires daily ETL? NO.
  • Supports easy reverse selection and combinations (such as: last month and not last 3 days)? NO.
  • Works with VA? YES.
  • Has performance impact? NO.



Final Summary:

The easiest option for IT is the first, "Reusing a filter". It's also the worst for the User.
The best for the user seems to be the "Adding date range columns to date table" option, it's also the worst for IT.
 You can consider the other options as well.