Monday, July 8, 2013

OBIEE 11.1.1.7 - Working with Hadoop / Hive (Real-World Case)

Dec-2015 remark: Please note this is an old post. For better results with Hadoop, Impala, Hive, etc., please use OBIEE 11.1.1.9 or later.


A colleague of mine, Tovit, did a POC with OBIEE 11.1.1.7 on top of Hadoop (a few petabytes), using Hive. While she was working, I took the role of "fly on the wall" and did some documentation.
There are a few good sources of relevant information at the moment:
1. From Support.oracle.com, the note OBIEE 11g: Using Oracle Hadoop ODBC Driver with BI Administration Tool [ID 1520733.1].
2. Oracle Documentation here.
Since I mostly write this blog for myself, I wanted this post to be slightly more detailed than those sources.

The guideline is to use OBIEE on Linux if you want to work with Hadoop. Why? A good reason is "because the people who developed the product tell you so". I can add one more: we tried it on a Windows box as well, and OBIEE on Linux works much better.

The support note "describes how to use the Oracle Apache Hadoop Hive ODBC Driver with the Oracle BI Administration tool for the purpose of creating Oracle BI metadata from Hadoop Data sources" and attaches 32-bit and 64-bit versions of the driver. So download the ODBC drivers for Hive from this note and install them on your Administrator PC.
Remember to run the install.bat as Windows Administrator (otherwise you won't have access to the registry).
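For example, from an elevated Command Prompt ("Run as administrator"), in the folder where you extracted the drivers from the note (the path below is just an illustration):

cd C:\Temp\HiveODBC
install.bat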

Run Data Sources (ODBC) from Control Panel / Administrative Tools on Windows. Under System DSN, add a new data source of type Apache Hadoop Hive.
Fill in the relevant data; Port and Database refer to the Hive definitions:
We didn't change the defaults on the other tabs.
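For reference, a sketch of the values on the main tab (the values are illustrative; what matters is that the Data Source Name matches the name you will later use in the server's odbc.ini):

Data Source Name:  OBIEELNX
Description:       Hive on the Hadoop cluster
Host:              the machine running the Hive server
Port:              10000  (the default Hive server port; in our case it was 10001)
Database:          default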

Now, from the OBIEE Administration Tool client, you can import the data structure using the Hadoop Hive ODBC definition (File -> Import Metadata -> ODBC 3.5).

After the import, change the Database definition to Apache Hadoop.


Leave the connection pool call interface as ODBC 2.0.

We had one problem at this point: for some reason, a column that was used for partitioning was not imported, so we had to add it manually.
One more piece of advice: all the columns should be nullable.

At this point you should be able to see the data (right click, View Data). Next is standard OBIEE repository work.

Now we move to the OBIEE Linux server.
In 11.1.1.7, the Hive ODBC drivers by DataDirect are already installed.
You have to modify the odbc.ini file, this one:
oracleHome/instances/instance1/bifoundation/OracleBIApplication/coreapplication/setup/odbc.ini

As always, the ODBC name on the server should match the ODBC name used in the Administration Tool.

We added the following: the OBIEELNX line under [ODBC Data Sources] and the entire [OBIEELNX] section (the rest of the file was already there). Replace XXXXXX (host name), YYYYYY (password) and ZZZZZZ (login ID) with your values:

[ODBC Data Sources]
AnalyticsWeb=Oracle BI Server
Cluster=Oracle BI Server
SSL_Sample=Oracle BI Server
OBIEELNX=Oracle 7.1 Apache Hive Wire Protocol

[OBIEELNX]
QEWSD=2456476
Driver=/opt/oracle/obiee/Oracle_BI1/common/ODBC/Merant/7.0.1/lib/ARhive27.so
Description=DataDirect 7.1 Apache Hive Wire Protocol
ArraySize=16384
Database=default
DefaultLongDataBuffLen=1024
EnableDescribeParam=0
HostName=XXXXXXXXX
LoginTimeout=30
LogonID=ZZZZZZZZZZ
MaxVarcharSize=2147483647
Password=YYYYYYYYYY
PortNumber=10001
RemoveColumnQualifiers=0
StringDescribeType=12
TransactionMode=0
UseCurrentSchema=0
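Before moving on, it is worth a quick sanity check that the Hive server is actually reachable from the OBIEE server on the configured host and port (the HostName and PortNumber values above). Assuming netcat is installed:

nc -vz XXXXXXXXX 10001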

See another example in the documentation here.

Locate the opmn.xml file 
oracleHome/instances/instance1/config/OPMN/opmn/opmn.xml

In the file, locate the coreapplication_obis1 tag, and within that tag, locate the LD_LIBRARY_PATH variable. Ensure that the LD_LIBRARY_PATH variable includes the path to the DataDirect 7.0.1 driver, namely ORACLE_HOME/common/ODBC/Merant/7.0.1/lib.
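For example, the entry might look something like this (your existing value will contain additional library paths; as far as I know, append="true" tells OPMN to append to the existing value rather than overwrite it):

<variable id="LD_LIBRARY_PATH"
value="ORACLE_HOME/common/ODBC/Merant/7.0.1/lib" append="true"/>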

In the same opmn.xml file, either above or below the LD_LIBRARY_PATH variable, create the HADOOP_DLL variable to point to the DataDirect 7.0.1 driver.

For example:
<variable id="HADOOP_DLL"
value="ORACLE_HOME/common/ODBC/Merant/7.0.1/lib/ARhive27.so"/>


Save and close the file.

Restart OPMN.
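For example, from the instance bin directory:

cd oracleHome/instances/instance1/bin
./opmnctl stopall
./opmnctl startall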

Create a soft link libARicu27.so pointing to libARicu26.so.
For example:
cd /usr/Oracle_BI1/common/ODBC/Merant/7.0.1/lib/
ln -s libARicu26.so libARicu27.so
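You can verify the link with:

ls -l libARicu27.so

It should show libARicu27.so -> libARicu26.so.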
 


There is a slightly misleading part in the Oracle documentation at the moment, starting with Acquiring Windows Driver to Enable Data Import Using Administration Tool and continuing to the end of that subject (the limitations part). Since we had already done it with the drivers provided by the support note, we didn't need to do it as described in the document.



Now you are ready to go.


When the queries run you can clearly see the steps in the Session Manager:
Executing, Preparing, Fetching, Fetched, Preparing.
 
We couldn't cancel a query during the Executing stage.

It seems Oracle has done a good job here and the results seem encouraging.
 
