Exporting local file to hive table in ODI

In this post we will discuss loading a local file into a Hive table using the ODI built-in KMs. Before you start, I am assuming you have fair knowledge of HDFS, Hive, Linux and MapReduce. You don't need to be an expert, but having a basic idea will make you more comfortable. In this post I am using the Oracle Big Data Lite VM 4.1, which comes with most of the required software and Hadoop ecosystem components pre-installed. Click on this link to download it if you have not done so already.

You also need VirtualBox to run this VM. Check here to find the software for your OS.

Once you are logged into the VM, first create a file with some contents.

[oracle@bigdatalite ~]$ mkdir /tmp/src_files
[oracle@bigdatalite ~]$ chmod -R 775 /tmp/src_files

[oracle@bigdatalite src_files]$ pwd
[oracle@bigdatalite src_files]$ cat emp
[oracle@bigdatalite src_files]$
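The post does not show the actual contents of the emp file, so here is an illustrative pipe-delimited sample (the file name comes from the post, but the values are made up; the columns mirror the empinfo table, and the first line is a header, which becomes relevant later on):

```shell
# Create an illustrative pipe-delimited source file.
# Layout assumed: id|name|salary|location, with a header line as the first record.
mkdir -p /tmp/src_files
cat > /tmp/src_files/emp <<'EOF'
ID|NAME|SALARY|LOCATION
101|John|5000|Bangalore
102|Asha|6500|Chennai
103|Ravi|7200|Hyderabad
EOF

# Quick check: 4 lines = 1 header + 3 data records.
wc -l /tmp/src_files/emp
```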

Once this is done, we have to create a table in Hive. Go to the hive prompt and write a script similar to the structure below. You can see the DDL of my table.

[oracle@bigdatalite ~]$ hive
15/06/14 16:47:13 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar!/hive-log4j.properties
hive> show create table empinfo;
CREATE TABLE `empinfo`(
`id` int,
`name` string,
`salary` string,
`location` string)
Time taken: 0.902 seconds, Fetched: 22 row(s)
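The SHOW CREATE TABLE output above is truncated (22 rows were fetched, but only the column list is shown). For reference, a complete DDL for a table of this shape could look like the sketch below; the delimiter, line terminator and file format are assumptions based on the row-format override used later in this post:

```sql
-- Sketch only: the column list is from the post, the storage clauses are assumed.
CREATE TABLE empinfo (
  id       INT,
  name     STRING,
  salary   STRING,
  location STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```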

Since we have a file as the source, follow the same procedure we used earlier to process delimited files: create the data server, physical schema, logical schema, and a model to reverse-engineer the file.



Initially you might see the hive2 JDBC driver used for Hive data servers, but I had to go with weblogic.jdbc.hive.HiveDriver as I have applied the recent patch released for the ODI 12c Advanced Big Data Option. If you don't use the latter, you will see an error like the one below for all existing Hive-based mappings.

ODI-1228: Task Create target table -IKM File to Hive (LOAD DATA)- fails on the target connection Hive.
Caused By: java.sql.SQLException: Method not supported

If you have not applied the patch, then go with the hive2 JDBC driver.


Once the Hive physical and logical schemas are created, you can follow the standard reverse-engineering process to reverse all Hive tables.



Drag the file data store, which will be our source, and the Hive data store, which will be the target, into the new mapping.



Select LKM File to Hive LOAD DATA Direct. Note that with the Direct KM we don't need an IKM. To know more about this, refer to the ODI 12c Big Data book. Also, since our file is on the local file system and not in HDFS, the FILE_IS_LOCAL option must be set to TRUE, as highlighted below.
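Under the covers these KMs generate a Hive LOAD DATA statement, and the FILE_IS_LOCAL option controls whether the LOCAL keyword is emitted. A sketch of the two variants (paths are illustrative):

```sql
-- FILE_IS_LOCAL = TRUE: the source is on the local file system,
-- so the file is copied into the table's HDFS location.
LOAD DATA LOCAL INPATH '/tmp/src_files/emp' OVERWRITE INTO TABLE empinfo;

-- FILE_IS_LOCAL = FALSE: the source is already in HDFS,
-- so the file is moved (not copied) into the table's location.
LOAD DATA INPATH '/user/oracle/src_files/emp' OVERWRITE INTO TABLE empinfo;
```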



Don't forget to validate the mapping by clicking on the green tick. This will show you whether there are any issues with your mapping.


As you can see, all the records got loaded into our Hive table. No need to write HiveQL and no need to be a master of MapReduce. Isn't it easy? But wait a second! Don't you see the header record as part of the data? Though we have set the header count to one in the file data store, it is still being picked up.



No worries, the product team has given us an option to deal with this. In this scenario we have to override the row format and add a table property to skip the first record.


Go to the OVERRIDE_ROW_FORMAT option and add the text below into it.

row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as textfile
tblproperties("skip.header.line.count"="1")
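For a table that already exists, the same first-record skipping can be applied directly at the hive prompt; skip.header.line.count is the standard Hive table property for this (available from Hive 0.13, which this VM ships with):

```sql
-- Tell Hive to skip the first line of each data file when reading.
ALTER TABLE empinfo SET TBLPROPERTIES ('skip.header.line.count'='1');
```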


Now let's drop the target table at the hive prompt and execute the mapping again.

hive> drop table empinfo;
Time taken: 0.178 seconds

There you go. The header record is no longer in the target data store. 🙂


We can also use LKM SQL Multi-Connect together with IKM File to Hive (LOAD DATA) to perform the same load as above.



Performance evaluation for 6,120,000 records in standalone mode:




KM Combination                                         Start Time         Stop Time          Total Time (sec)
LKM SQL Multi-Connect + IKM File to Hive (LOAD DATA)   6/21/2015 15:09    6/21/2015 15:10    56
LKM File to Hive LOAD DATA Direct                      6/21/2015 15:43    6/21/2015 15:43    3
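The speedup figure is simple arithmetic on the timings above:

```shell
# 56 seconds vs 3 seconds: roughly an 18.7x speedup.
awk 'BEGIN { printf "%.1fx\n", 56 / 3 }'
```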


As you can see, the Direct KM is roughly 18 times faster than the LKM and IKM combination.


That's it for today. In the coming days I will also post on how to load from RDBMS to HDFS, RDBMS to HBase, Hive to RDBMS and a lot more. Also don't forget to check the fantastic post by Jerome on the recent ODI Big Data patch release.

About Bhabani (86 Articles)
Bhabani has 12-plus years of experience in data warehousing and analytics projects spanning multiple domains such as Travel, Banking and Financial, and Betting and Gaming. His solution areas focus on designing data warehouses and integrating them with cloud platforms like AWS and GCP. He has also been an Elite-level contributor on the OTN forum for more than 9 years. He loves to experiment and build POCs with different integration tools and services. Some of his favorite skills are Redshift, BigQuery, Python, Apache Airflow, Kafka, HDFS, MapReduce, Hive, HBase, Sqoop, Drill and Impala.

1 Comment

  1. Hello, I'm doing exactly the same steps but I'm getting the following error. Could you please help me out?

    ODI-1590: The execution of the script failed.
    Caused By: org.apache.bsf.BSFException: exception from Groovy: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
    _: 4: unable to resolve class org.apache.hadoop.fs.Path
    @ line 4, column 1.
    import org.apache.hadoop.fs.Path

    _: 5: unable to resolve class org.apache.hadoop.conf.Configuration
    @ line 5, column 1.
    import org.apache.hadoop.conf.Configuration

    _: 3: unable to resolve class org.apache.hadoop.fs.FileSystem
    @ line 3, column 1.
    import org.apache.hadoop.fs.FileSystem

    3 errors

    at org.codehaus.groovy.bsf.GroovyEngine.exec(GroovyEngine.java:111)
    at com.sunopsis.dwg.codeinterpretor.SnpScriptingInterpretor.execInBSFEngine(SnpScriptingInterpretor.java:353)
    at com.sunopsis.dwg.codeinterpretor.SnpScriptingInterpretor.exec(SnpScriptingInterpretor.java:216)
    at oracle.odi.runtime.agent.execution.interpreter.SessionTaskScriptingInterpretor.scripting(SessionTaskScriptingInterpretor.java:171)
    at oracle.odi.runtime.agent.execution.SessionTask.scripting(SessionTask.java:105)
    at oracle.odi.runtime.agent.execution.cmd.ScriptingExecutor.execute(ScriptingExecutor.java:49)
    at oracle.odi.runtime.agent.execution.cmd.ScriptingExecutor.execute(ScriptingExecutor.java:21)
    at oracle.odi.runtime.agent.execution.TaskExecutionHandler.handleTask(TaskExecutionHandler.java:52)
    at oracle.odi.runtime.agent.execution.SessionTask.processTask(SessionTask.java:206)
    at oracle.odi.runtime.agent.execution.SessionTask.doExecuteTask(SessionTask.java:117)
    at oracle.odi.runtime.agent.execution.AbstractSessionTask.execute(AbstractSessionTask.java:886)
    at oracle.odi.runtime.agent.execution.SessionExecutor$SerialTrain.runTasks(SessionExecutor.java:2227)
    at oracle.odi.runtime.agent.execution.SessionExecutor.executeSession(SessionExecutor.java:607)
    at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:718)
    at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:611)
    at oracle.odi.core.persistence.dwgobject.DwgObjectTemplate.execute(DwgObjectTemplate.java:203)
    at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor.doProcessStartAgentTask(TaskExecutorAgentRequestProcessor.java:800)
    at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor.access$1400(StartSessRequestProcessor.java:74)
    at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor$StartSessTask.doExecute(StartSessRequestProcessor.java:702)
    at oracle.odi.runtime.agent.processor.task.AgentTask.execute(AgentTask.java:180)
    at oracle.odi.runtime.agent.support.DefaultAgentTaskExecutor$2.run(DefaultAgentTaskExecutor.java:108)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at oracle.odi.runtime.agent.execution.job.OdiJob.call(OdiJob.java:73)
    at oracle.odi.runtime.agent.execution.job.OdiJob.call(OdiJob.java:73)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    Caused by: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
    _: 4: unable to resolve class org.apache.hadoop.fs.Path
    @ line 4, column 1.
    import org.apache.hadoop.fs.Path

    _: 5: unable to resolve class org.apache.hadoop.conf.Configuration
    @ line 5, column 1.
    import org.apache.hadoop.conf.Configuration

    _: 3: unable to resolve class org.apache.hadoop.fs.FileSystem
    @ line 3, column 1.
    import org.apache.hadoop.fs.FileSystem

    3 errors

    at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:309)
    at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:938)
    at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:585)
    at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:534)
    at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:286)
    at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:259)
    at groovy.lang.GroovyShell.parseClass(GroovyShell.java:674)
    at groovy.lang.GroovyShell.parse(GroovyShell.java:686)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:568)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:608)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:589)
    at org.codehaus.groovy.bsf.GroovyEngine.exec(GroovyEngine.java:109)
    … 27 more
