Working with MapR-FS and Hive using Talend

Talend Open Studio for Data Integration Version: 6.3.1
Java: 1.8
OS: Windows 8

This article does not cover the installation and setup of Talend Open Studio. The assumption is that Talend is already installed and working correctly. For details on how to install and configure Talend Open Studio, see this post.

In our previous post we discussed MapR cluster configuration in Talend and verified that we can connect to the name node and resource manager without any issues. Today we will use the same connection to connect to MapR-FS and do some file operations, and then we will create a Hive table and load some data from one table into another.

Right-click on the cluster and create a new HDFS connection and a new Hive connection as shown below.

This is what my metadata looks like.

Now we will create a job and use the connections we created above. Let's use the tHDFSPut component and fill in the required parameters. This will copy a file from the local system to HDFS. If you don't know the HDFS directory structure, execute hadoop fs -ls / on the cluster:

[mapr@maprdemo ~]$ hadoop fs -ls /
Found 10 items
drwxr-xr-x - mapr mapr 0 2017-04-21 10:44 /apps
drwxr-xr-x - mapr mapr 6 2017-04-21 18:55 /data
drwxr-xr-x - root root 4 2017-04-21 18:55 /drillbetademo
drwxr-xr-x - mapr mapr 0 2017-04-21 10:43 /hbase
drwxr-xr-x - mapr mapr 1 2017-09-02 08:42 /home
drwxr-xr-x - mapr mapr 0 2017-04-21 10:47 /opt
drwxr-xr-x - root root 3 2017-04-21 18:57 /tables
drwxrwxrwx - mapr mapr 0 2017-09-02 08:22 /tmp
drwxr-xr-x - mapr mapr 9 2017-09-02 10:59 /user
drwxr-xr-x - mapr mapr 1 2017-04-21 10:44 /var

In the above screenshot I am copying the local file D:/Thrash/employee.txt into an HDFS directory and renaming it to emp_hdfs.txt.
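For comparison, the same copy can be done from the cluster's command line. This is only a sketch of an equivalent hadoop fs -put call, not what tHDFSPut runs internally. The local path /tmp/employee.txt is a hypothetical location on the cluster node, and the DRY_RUN switch and the run helper are illustrative additions so the commands can be previewed without a cluster.

```shell
#!/bin/sh
# Sketch: command-line equivalent of the tHDFSPut step.
# Assumptions: LOCAL_FILE is a hypothetical path on the cluster node;
# HDFS_DIR and HDFS_NAME follow the target used in this post.
LOCAL_FILE="${LOCAL_FILE:-/tmp/employee.txt}"
HDFS_DIR="${HDFS_DIR:-/user/bhabani}"
HDFS_NAME="${HDFS_NAME:-emp_hdfs.txt}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy the local file into HDFS under its new name (-f overwrites an existing file).
put_file() {
    run hadoop fs -put -f "$LOCAL_FILE" "$HDFS_DIR/$HDFS_NAME"
}

# List the target directory to confirm the file arrived.
list_target() {
    run hadoop fs -ls "$HDFS_DIR"
}

put_file
list_target
```

With the default DRY_RUN=1 the script only prints the two hadoop commands. Set DRY_RUN=0 on a node where the hadoop client is installed to actually run them.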

Now let's execute it. Uh-oh, it failed.

[ERROR]: org.apache.hadoop.util.Shell Failed to locate the winutils binary in the hadoop binary path

Remember that in the previous post I mentioned we would configure something during execution? That is what we will do now. Go to the job's Run properties and click on Advanced settings. Give the path to the winutils location.
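As background, on Windows Hadoop looks for bin\winutils.exe under the folder named by the hadoop.home.dir Java system property, or by the HADOOP_HOME environment variable if the property is not set. One common way to supply that folder is a JVM argument of the form -Dhadoop.home.dir=...; whether that is exactly what the setting above looks like on your installation is an assumption. The sketch below checks a folder layout and prints the matching argument. It assumes a POSIX shell on Windows (for example Git Bash), and C:/hadoop is a hypothetical folder.

```shell
#!/bin/sh
# Sketch: check a winutils folder and print the matching JVM argument.
# Assumption: HADOOP_HOME_DIR is a hypothetical folder; Hadoop expects
# winutils.exe inside its bin subfolder.
HADOOP_HOME_DIR="${HADOOP_HOME_DIR:-C:/hadoop}"

# Succeeds only if <dir>/bin/winutils.exe exists.
has_winutils() {
    [ -f "$1/bin/winutils.exe" ]
}

# Print the JVM argument that points Hadoop at the folder.
jvm_argument() {
    echo "-Dhadoop.home.dir=$1"
}

if has_winutils "$HADOOP_HOME_DIR"; then
    jvm_argument "$HADOOP_HOME_DIR"
else
    echo "winutils.exe not found under $HADOOP_HOME_DIR/bin" >&2
fi
```

Note that the folder passed here is the parent of bin, not the bin folder itself.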

Now we are good to go. After execution you can see the file emp_hdfs.txt in HDFS.

[mapr@maprdemo ~]$ hadoop fs -ls /user/bhabani
Found 3 items
drwxr-xr-x - mapr mapr 0 2017-09-02 13:33 /user/bhabani/destination
-rwxr-xr-x 1 root root 17256960 2017-09-02 23:07 /user/bhabani/emp_hdfs.txt
-rwxrwxrwx 1 mapr mapr 53 2017-09-01 23:28 /user/bhabani/testfile.txt
[mapr@maprdemo ~]$

Now let's do something with Hive. We will use tHiveRow to create a table and load some data into it. Drag tHiveRow onto the job twice and connect the two components. Change the property type to Repository and select the Hive connection we created before. It will fill in the connection parameters automatically.

Now let's add some queries to the components. I already have a table called orders_subset with some rows in it. We will load this data into the newly created table.
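The exact queries are in the job itself. As a rough idea of what the two tHiveRow components could contain, here is a minimal sketch. The target table name orders_copy is hypothetical, and CREATE TABLE ... LIKE is just one way to get a table with the same columns as orders_subset; the real job may define the columns explicitly.

```shell
#!/bin/sh
# Sketch: the kind of statements the two tHiveRow components could run.
# Assumptions: orders_copy is a hypothetical target table name; the real
# queries in the job may differ.
SOURCE_TABLE="${SOURCE_TABLE:-orders_subset}"
TARGET_TABLE="${TARGET_TABLE:-orders_copy}"

# First tHiveRow: create an empty table with the same columns as the source.
create_sql() {
    echo "CREATE TABLE IF NOT EXISTS $TARGET_TABLE LIKE $SOURCE_TABLE"
}

# Second tHiveRow: copy every row from the source into the new table.
load_sql() {
    echo "INSERT INTO TABLE $TARGET_TABLE SELECT * FROM $SOURCE_TABLE"
}

create_sql
load_sql

# To try the statements outside Talend, pass them to the Hive CLI, e.g.:
#   hive -e "$(create_sql); $(load_sql)"
```

The script only prints the two statements, so they can be pasted into the components or into the Hive CLI as needed.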

Now execute the job and verify that the data was loaded.
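One simple check is to compare the row counts of the two tables from the cluster's command line. The sketch below assumes the new table started out empty; orders_copy is a hypothetical name for the new table, and the DRY_RUN switch is an illustrative addition that prints the hive commands instead of running them.

```shell
#!/bin/sh
# Sketch: compare row counts of the source and the newly loaded table.
# Assumptions: orders_copy is a hypothetical name for the new table, and the
# hive CLI is available on the node when DRY_RUN=0.
SOURCE_TABLE="${SOURCE_TABLE:-orders_subset}"
TARGET_TABLE="${TARGET_TABLE:-orders_copy}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the hive commands

# Build the counting query for one table.
count_sql() {
    echo "SELECT COUNT(*) FROM $1"
}

# Print the hive command (dry run) or return the row count of a table.
row_count() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "hive -S -e \"$(count_sql "$1")\""
    else
        hive -S -e "$(count_sql "$1")"
    fi
}

if [ "$DRY_RUN" = "1" ]; then
    row_count "$SOURCE_TABLE"
    row_count "$TARGET_TABLE"
elif [ "$(row_count "$SOURCE_TABLE")" = "$(row_count "$TARGET_TABLE")" ]; then
    echo "row counts match"
else
    echo "row counts differ"
fi
```

Matching counts do not prove the rows are identical, but they are a quick sanity check after the job finishes.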

That's all for today. In my next post we will load data into tables that use Parquet as the storage format. As always, if you face any issues, do let me know. I will be happy to assist you.


About the author

Bhabani is currently working as a Sr. Development Engineer at Harman International. He has good expertise in Oracle, Oracle Data Integrator, Pervasive Data Integrator, MSBI, Talend and Java. He has also been contributing to the ODI-OTN forum for the last 5 years. He is from India. If you want to reach him, please visit the contact us page. If you have any doubts or concerns about the above article, please put your question here. The Dw Team will try to respond to it as soon as possible. Also, don't forget to provide your comments, suggestions and feedback for further improvement. Thanks for your time.
