Dec-2015 remark: Please note this is an old post. For better results with Hadoop, Impala, Hive... please use OBIEE 11.1.1.9 or later.
A colleague of mine, Tovit, has done a POC with OBIEE 11.1.1.7 on top of Hadoop (a few petabytes), using Hive. While she was working, I took the role of "fly on the wall" and did some documentation.
There are 3 good sources for relevant information at the moment:
1. From support.oracle.com, the note OBIEE 11g: Using Oracle Hadoop ODBC Driver with BI Administration Tool [ID 1520733.1].
2. The Oracle documentation, here.
Since I mostly write this blog for myself, I wanted this post to be slightly more detailed than that blog.
The guideline is to use OBIEE on Linux if you want to work with Hadoop. Why? A good reason is "because the people who developed the product tell you so". I can add one more: we tried it on a Windows box as well, and the Linux OBIEE works much better.
The support note "describes how to use the Oracle Apache Hadoop Hive ODBC Driver with the Oracle BI Administration tool for the purpose of creating Oracle BI metadata from Hadoop Data sources" and attaches 32-bit and 64-bit versions of the driver. So download the ODBC drivers for Hive from this note and install them on the PC where you run the Administration Tool.
Remember to run the install.bat as a Windows Administrator (otherwise it has no access to the registry).
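For example, from a Command Prompt opened with "Run as administrator" (the extraction folder below is just a hypothetical example):

cd C:\Temp\HiveODBC
install.bat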
Run Data Sources (ODBC) from Control Panel / Administrative Tools on Windows. Under System DSN, add a new data source of type Apache Hadoop Hive.
Fill in the relevant details; Port and Database refer to the Hive definitions:
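As an illustration, a minimal set of values could look like this (the host name is hypothetical, the field labels may vary slightly, and the port and database should match your HiveServer setup; ours used port 10001 and the default database):

Host Name: hadoop01.mycompany.com
Port Number: 10001
Database: default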
We didn't change the defaults for the other tabs.
Now, from the OBIEE Administration Tool client, you can import the data structure using the Hadoop Hive ODBC definition (File -> Import Metadata -> ODBC 3.5).
After the import, change the Database definition to Apache Hadoop.
Leave the connection pool as ODBC 2.0.
We had one problem at this point: for some reason, a column that was used for partitioning was not imported, so we had to add it manually.
One more piece of advice: all the columns should be nullable.
At this point you should be able to see the data (right-click, View Data). Next is standard OBIEE repository work.
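By the way, if View Data returns nothing, it helps to first rule out the Hive side; a quick sanity check from the Hive CLI on the cluster (my_table is a hypothetical table name, use one of your imported tables):

hive -e "SELECT * FROM my_table LIMIT 5;"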
Now we move to the OBIEE Linux server.
In 11.1.1.7, the DataDirect Hive ODBC drivers are already installed.
You have to modify the odbc.ini file, this one:
oracleHome/instances/instance1/bifoundation/OracleBIApplication/coreapplication/setup/odbc.ini
As always, the ODBC data source name on the server should match the ODBC name defined on the Administration Tool machine.
We added the following (in the [ODBC Data Sources] section only the OBIEELNX line is new; the three Oracle BI Server lines already existed). Replace XXXXXX, YYYYYY, ZZZZZZ with your values:
[ODBC Data Sources]
AnalyticsWeb=Oracle BI Server
Cluster=Oracle BI Server
SSL_Sample=Oracle BI Server
OBIEELNX=Oracle 7.1 Apache Hive Wire Protocol
[OBIEELNX]
QEWSD=2456476
Driver=/opt/oracle/obiee/Oracle_BI1/common/ODBC/Merant/7.0.1/lib/ARhive27.so
Description=DataDirect 7.1 Apache Hive Wire Protocol
ArraySize=16384
Database=default
DefaultLongDataBuffLen=1024
EnableDescribeParam=0
HostName=XXXXXXXXX
LoginTimeout=30
LogonID=ZZZZZZZZZZ
MaxVarcharSize=2147483647
Password=YYYYYYYYYY
PortNumber=10001
RemoveColumnQualifiers=0
StringDescribeType=12
TransactionMode=0
UseCurrentSchema=0
See another example in the documentation here.
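Before moving on, a quick way to confirm that the OBIEE server can actually reach HiveServer is a plain TCP check against the HostName and PortNumber values from the [OBIEELNX] section above:

telnet XXXXXXXXX 10001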
Locate the opmn.xml file
oracleHome/instances/instance1/config/OPMN/opmn/opmn.xml
In the file, locate the coreapplication_obis1 tag, and within that tag, locate the LD_LIBRARY_PATH variable. Ensure that the LD_LIBRARY_PATH variable includes the path to the DataDirect 7.0.1 driver, namely ORACLE_HOME/common/ODBC/Merant/7.0.1/lib.
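For example, the entry may look roughly like the following sketch (the exact value varies per installation; the append="true" attribute keeps the paths that are already there):

<variable id="LD_LIBRARY_PATH"
value="ORACLE_HOME/common/ODBC/Merant/7.0.1/lib" append="true"/>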
In the same opmn.xml file, either above or below the LD_LIBRARY_PATH variable, create the HADOOP_DLL variable to point to the DataDirect 7.0.1 driver.
For example:
<variable id="HADOOP_DLL"
value="ORACLE_HOME/common/ODBC/Merant/7.0.1/lib/ARhive27.so"/>
Save and close the file.
Restart OPMN.
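A minimal sketch, using opmnctl from the instance bin directory:

cd oracleHome/instances/instance1/bin
./opmnctl stopall
./opmnctl startall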
Create a soft link libARicu27.so pointing to libARicu26.so.
For example:
cd /usr/Oracle_BI1/common/ODBC/Merant/7.0.1/lib/
ln -s libARicu26.so libARicu27.so
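To verify the link (run in the same directory; it should show libARicu27.so pointing to libARicu26.so):

ls -l libARicu27.so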
In the Oracle documentation there is a somewhat misleading part at the moment, starting with "Acquiring Windows Driver to Enable Data Import Using Administration Tool" and continuing to the end of that topic (the limitations part). Since we had already done it with the drivers provided in the support note, we didn't need to follow it as described in the document.
Now you are ready to go.
When the queries run, you can clearly see the steps in the Session Manager: Executing, Preparing, Fetching, Fetched, Preparing.
We couldn't cancel a query during the Executing stage.
It seems Oracle has done a good job here, and the results seem encouraging.