Working with MapR-FS and Hive using Talend:
Talend Open Studio for Data Integration Version: 6.3.1
OS: Windows 8
This article does not cover the installation and setup of Talend Open Studio. It assumes Talend is already installed and working correctly. For details on how to install and configure Talend Open Studio, see this post.
In our previous post we discussed MapR cluster configuration in Talend and verified that we can connect to the name node and the resource manager without any issues. Today we will use the same connection to reach MapR-FS and perform a file operation, then create a Hive table and load data into it from another table.
Right-click on the cluster and create a new HDFS connection and a new Hive connection as shown below.
This is what my metadata looks like.
Now we will create a job and use the connections we created above. Let's use the tHDFSPut component and fill in the required parameters. This will copy a file from the local system to HDFS. If you don't know the HDFS directory structure, execute hadoop fs -ls /
[mapr@maprdemo ~]$ hadoop fs -ls /
Found 10 items
drwxr-xr-x   - mapr mapr          0 2017-04-21 10:44 /apps
drwxr-xr-x   - mapr mapr          6 2017-04-21 18:55 /data
drwxr-xr-x   - root root          4 2017-04-21 18:55 /drill-beta-demo
drwxr-xr-x   - mapr mapr          0 2017-04-21 10:43 /hbase
drwxr-xr-x   - mapr mapr          1 2017-09-02 08:42 /home
drwxr-xr-x   - mapr mapr          0 2017-04-21 10:47 /opt
drwxr-xr-x   - root root          3 2017-04-21 18:57 /tables
drwxrwxrwx   - mapr mapr          0 2017-09-02 08:22 /tmp
drwxr-xr-x   - mapr mapr          9 2017-09-02 10:59 /user
drwxr-xr-x   - mapr mapr          1 2017-04-21 10:44 /var
In the screenshot above I am copying the local file D:/Thrash/employee.txt into an HDFS directory and renaming it to emp_hdfs.txt.
Now let's execute it. Uh-oh, it failed.
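For comparison, the same copy-and-rename can be sketched from the command line. This is only a rough equivalent of what tHDFSPut does, not the code Talend generates. It assumes the file has been made available on a machine with the hadoop client (for example the sandbox node), so the local path employee.txt is a placeholder; the target path /user/bhabani/emp_hdfs.txt is the one that appears in the listing later in this post.

```shell
# Placeholder local path (assumption): the article's file lives at
# D:/Thrash/employee.txt on the Windows machine running Talend.
LOCAL_FILE="employee.txt"

# Target directory and new file name, taken from the listing in this post.
HDFS_DIR="/user/bhabani"
HDFS_NAME="emp_hdfs.txt"
HDFS_TARGET="${HDFS_DIR}/${HDFS_NAME}"

if command -v hadoop >/dev/null 2>&1; then
  # Create the target directory if it is missing, then copy and rename.
  hadoop fs -mkdir -p "$HDFS_DIR"
  # -f overwrites an existing file with the same name.
  hadoop fs -put -f "$LOCAL_FILE" "$HDFS_TARGET"
  hadoop fs -ls "$HDFS_DIR"
else
  echo "hadoop client not found; would run: hadoop fs -put -f $LOCAL_FILE $HDFS_TARGET"
fi
```

The -f flag makes the copy repeatable. Drop it if you would rather have the command fail when the file already exists.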
[ERROR]: org.apache.hadoop.util.Shell - Failed to locate the winutils binary in the hadoop binary path
Remember that in the previous post I mentioned we would configure something at execution time? That is what we will do now. Go to the job's Run properties and click on Advanced settings. Give the path to the winutils location.
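On Windows, the Hadoop client looks for bin\winutils.exe under the directory named by the hadoop.home.dir Java system property (or the HADOOP_HOME environment variable). One common way to supply it is a JVM argument in the job's advanced run settings. The sketch below assumes a hypothetical install directory C:/hadoop and a Unix-style shell on Windows such as Git Bash. It checks the expected layout and prints the argument to paste in.

```shell
# Hypothetical location (assumption): adjust to wherever winutils.exe was unpacked.
HADOOP_HOME_DIR="C:/hadoop"

# Hadoop expects the binary at <hadoop.home.dir>/bin/winutils.exe.
WINUTILS="${HADOOP_HOME_DIR}/bin/winutils.exe"

# JVM argument that points the Hadoop client at that directory.
JVM_ARG="-Dhadoop.home.dir=${HADOOP_HOME_DIR}"

if [ -f "$WINUTILS" ]; then
  echo "Found $WINUTILS"
else
  echo "Missing $WINUTILS (the directory must contain bin/winutils.exe)"
fi

echo "JVM argument for the job: $JVM_ARG"
```

Note that the property points at the parent directory, not at the bin folder or the executable itself.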
Now we are good to go. After execution you can see the file emp_hdfs.txt in HDFS.
[mapr@maprdemo ~]$ hadoop fs -ls /user/bhabani
Found 3 items
drwxr-xr-x   - mapr mapr          0 2017-09-02 13:33 /user/bhabani/destination
-rwxr-xr-x   1 root root   17256960 2017-09-02 23:07 /user/bhabani/emp_hdfs.txt
-rwxrwxrwx   1 mapr mapr         53 2017-09-01 23:28 /user/bhabani/testfile.txt
[mapr@maprdemo ~]$
Now let's do something with Hive. We will use tHiveRow to create a table and load some data into it. Drag two tHiveRow components into the job and connect them. Change the property type to Repository and select the Hive connection we created earlier. This fills in the connection parameters automatically.
Now let's add some queries to the components. I already have a table called orders_subset that contains some rows. We will load this data into the newly created table.
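The actual queries are only visible in the screenshots, so here is a minimal sketch of what the two tHiveRow components could run, one statement each. The target table name orders_copy is hypothetical, and CREATE TABLE ... LIKE is my own choice so that the new table inherits the schema of orders_subset without listing its columns. The script writes the statements to a file and runs them with the hive CLI if it is available.

```shell
# File holding the two statements (one per tHiveRow component in the job).
HQL_FILE="${TMPDIR:-/tmp}/load_orders_copy.hql"

# orders_subset comes from the article; orders_copy is a hypothetical name.
cat > "$HQL_FILE" <<'EOF'
-- First tHiveRow: create an empty table with the same schema as the source.
CREATE TABLE IF NOT EXISTS orders_copy LIKE orders_subset;

-- Second tHiveRow: copy every row from the source into the new table.
INSERT INTO TABLE orders_copy SELECT * FROM orders_subset;
EOF

if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
else
  echo "hive CLI not found; statements written to $HQL_FILE"
fi
```

INSERT INTO appends on every run. Use INSERT OVERWRITE TABLE instead if the job should replace the contents each time.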
Now execute the job and verify that the data has been loaded.
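One simple way to verify the load is to compare row counts between the source and the target. The sketch below assumes the hive CLI is on the path and uses orders_copy as a hypothetical name for the newly created table; substitute the name you actually used. A matching count only holds on a first load, because repeated appends would inflate the target.

```shell
# Print the row count of the Hive table named in $1 (-S suppresses log output).
count_rows() {
  hive -S -e "SELECT COUNT(*) FROM $1;"
}

# Report whether two counts agree.
compare_counts() {
  if [ "$1" = "$2" ]; then
    echo "match ($1 rows)"
  else
    echo "mismatch: source=$1 target=$2"
  fi
}

if command -v hive >/dev/null 2>&1; then
  # orders_subset is the article's source table; orders_copy is hypothetical.
  compare_counts "$(count_rows orders_subset)" "$(count_rows orders_copy)"
else
  echo "hive CLI not found; skipping verification"
fi
```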
That's all for today. In my next post we will load data into tables that use Parquet as the storage format. As always, if you face any issues, let me know. I will be happy to assist you.