Connecting to a MapR Cluster from Talend
Talend Open Studio for Data Integration Version: 6.3.1
OS: Windows 8
This article does not cover the installation and setup of Talend Open Studio. The assumption is that Talend is already installed and working correctly. For details on how to install and configure Talend Open Studio, see this post.
In this post we will connect to a MapR cluster from a Windows client. To demonstrate this, I downloaded the MapR sandbox and TOSBD 6.3. Since these virtual machines are hungry for RAM, I installed the sandbox and Talend on two different laptops; if you have enough RAM available, you can set up both on one machine. I am not going to cover how to set up the appliance in VirtualBox, as it is easy to set up and configure. One point worth mentioning, however, is the network settings. Since both of my laptops were connected through a wireless router, I chose the Bridged Adapter so that the two machines could communicate with each other in both directions. The only problem is that every time you restart the sandbox its IP address changes, and it is irritating to keep changing the Talend connection parameters. So I configured a static IP inside CentOS as follows.
[root@maprdemo network-scripts]# pwd
/etc/sysconfig/network-scripts
[root@maprdemo network-scripts]# cat ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="static"
DHCP_HOSTNAME="maprdemo.local"
IPV6INIT="yes"
NM_CONTROLLED="no"
ONBOOT="yes"
TYPE="Ethernet"
UUID="d9fb916d-32cd-47e2-9c50-6f2e3efd3c3d"
IPADDR=192.168.2.119
NETMASK=255.255.255.0
GATEWAY=192.168.2.1
[root@maprdemo network-scripts]# cat /etc/resolv.conf
nameserver 18.104.22.168
Once done, restart the network service so that the new IP takes effect.
service network restart
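A typo in the static settings can leave the sandbox unreachable, so it may help to sanity-check them before restarting the network. The following is only a minimal sketch: the helper names parse_ifcfg and gateway_on_same_subnet are my own, and the sample text simply repeats the values from the file above.

```python
import ipaddress


def parse_ifcfg(text):
    """Parse KEY=value lines from ifcfg-style text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may or may not be wrapped in double quotes.
        settings[key.strip()] = value.strip().strip('"')
    return settings


def gateway_on_same_subnet(settings):
    """Return True if GATEWAY lies inside the network given by IPADDR/NETMASK."""
    iface = ipaddress.ip_interface(
        f"{settings['IPADDR']}/{settings['NETMASK']}"
    )
    return ipaddress.ip_address(settings["GATEWAY"]) in iface.network


# Sample values copied from the ifcfg-eth0 file shown in this post.
SAMPLE = """\
BOOTPROTO="static"
IPADDR=192.168.2.119
NETMASK=255.255.255.0
GATEWAY=192.168.2.1
"""

if __name__ == "__main__":
    cfg = parse_ifcfg(SAMPLE)
    print(cfg["IPADDR"], "gateway reachable on subnet:", gateway_on_same_subnet(cfg))
```

If the check reports False, the gateway and the address are on different networks and the sandbox will most likely not be able to reach the router.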
When you start your MapR cluster, make sure all of its services are up and running. The most important one is CLDB: if it is down, there is no point in reading further until you have diagnosed and fixed it. If you see a message like "mapr services failed to start" during start-up, log in to the system and manually stop the warden service and then ZooKeeper. Here are the steps to bring the services back up.
sudo /sbin/service mapr-warden stop
sudo /sbin/service mapr-zookeeper stop
sudo /sbin/service mapr-zookeeper start
sudo /sbin/service mapr-warden start
jps
As we can see below, the status is green and all services are running fine.
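Before moving on to the client setup, it can be worth confirming from the Windows machine that the CLDB port is actually reachable over the bridged network. Here is a rough sketch of such a probe; the function name port_is_open is my own, and the only assumption is that the CLDB listens on port 7222, as in the cluster configuration used in this post.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

Calling port_is_open("192.168.2.119", 7222) from the client should return True once the services are up. A False result points to a network or service problem rather than a Talend one.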
Now, to connect to the MapR cluster from Windows, we need to download the client packages. Go to this link and download mapr-client-22.214.171.124122GA-1.amd64.zip. Extract it to C:\opt\mapr
Once this is done, we have to make a few modifications to the conf files. Open mapr-clusters.conf from C:\opt\mapr\conf and update the cluster and CLDB node details as below.
demo.mapr.com 192.168.2.119:7222 maprdemo
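To make the structure of that entry explicit, here is a small sketch that splits a mapr-clusters.conf line into its parts. It reflects my understanding of the format (cluster name first, optional key=value settings, then CLDB nodes as host or host:port), so treat it as an assumption rather than a reference parser. The function name parse_cluster_line and the fallback to port 7222 for a node written without a port are also my own choices.

```python
DEFAULT_CLDB_PORT = 7222  # assumed default; it is the port used in this post


def parse_cluster_line(line):
    """Split one cluster entry into (cluster_name, options, cldb_nodes)."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty cluster entry")
    name, options, nodes = tokens[0], {}, []
    for token in tokens[1:]:
        if "=" in token:
            # Optional settings written as key=value.
            key, _, value = token.partition("=")
            options[key] = value
        else:
            # A CLDB node written as host or host:port.
            host, _, port = token.partition(":")
            nodes.append((host, int(port) if port else DEFAULT_CLDB_PORT))
    return name, options, nodes


if __name__ == "__main__":
    print(parse_cluster_line("demo.mapr.com 192.168.2.119:7222 maprdemo"))
```

Seen this way, the entry names the cluster demo.mapr.com and lists the sandbox both by IP address and by hostname.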
Next we need to download winutils.exe. Put this file in C:\winutil\bin
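At this point two files should be in place on the Windows client. A quick way to confirm that, sketched below with a hypothetical helper of my own called missing_client_files, is to list whichever of them is still absent. The folder layout is taken from the paths used in this post.

```python
from pathlib import Path


def missing_client_files(mapr_home, winutil_home):
    """Return the expected client files that do not exist yet."""
    expected = [
        Path(mapr_home) / "conf" / "mapr-clusters.conf",
        Path(winutil_home) / "bin" / "winutils.exe",
    ]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Paths as used in this post; adjust them if you extracted elsewhere.
    for path in missing_client_files(r"C:\opt\mapr", r"C:\winutil"):
        print("missing:", path)
```

An empty result means both files were found where this post expects them.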
Next we need to configure the hosts file. Go to C:\Windows\System32\drivers\etc and update the hosts file with the details below. Don't forget to change the IP if yours is different.
192.168.2.119 demo.mapr.com maprdemo
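If you want to double-check the entry without opening Talend, the hosts file can be scanned for the sandbox name. This is just an illustrative sketch: hosts_lookup is a helper name I made up, and it only reads text in the usual hosts format of an IP address followed by one or more names.

```python
def hosts_lookup(hosts_text, name):
    """Return the IP mapped to name in hosts-file text, or None if absent."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *aliases = line.split()
        if name.lower() in (alias.lower() for alias in aliases):
            return ip
    return None


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n192.168.2.119 demo.mapr.com maprdemo\n"
    print(hosts_lookup(sample, "maprdemo"))
```

To run it against the real file, read C:\Windows\System32\drivers\etc\hosts into a string and pass it in place of the sample text.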
If you do not update the hosts file, you will see the error below in Talend.
[ERROR]: com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils - Can not find non-local IP based on provided hostname: maprdemo
[ERROR]: com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils - Can not find IP for host: maprdemo
java.net.UnknownHostException: maprdemo
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(Unknown Source)
    at java.net.InetAddress.getAddressesFromNameService(Unknown Source)
hostname maprdemo, for cluster: demo.mapr.com
2017-09-01 20:53:38,1877 Some error on socket 1232
2017-09-01 20:53:39,2503 Some error on socket 1232
2017-09-01 20:53:40,3131 Some error on socket 1252
2017-09-01 20:53:40,3131 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1611 Thread: 12588 MoveToNextCldb: No CLDB entries, cannot run, sleeping 5 seconds!
2017-09-01 20:53:45,3271 ERROR Client fs/client/fileclient/cc/client.cc:1104 Thread: 12588 Failed to initialize client for cluster demo.mapr.com, error Cannot send after transport endpoint shutdown(108)
OK. We are left with one more thing: the environment variable. Set it up as per the screenshot below.
That's it. Now start Talend Studio. There are a few settings that we need to change here as well; some of them we will set during execution. In the menu bar go to Window > Preferences > Talend and make sure the Java compiler is pointing to the JDK path, not the JRE.
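One simple way to tell a JDK folder from a JRE folder is to look for the Java compiler, javac, which a plain JRE does not ship. The sketch below is my own helper, looks_like_jdk, and rests on that assumption alone. It does not read anything from Talend's preferences, so you pass it the path you intend to configure.

```python
from pathlib import Path


def looks_like_jdk(java_home):
    """Return True if java_home/bin contains javac, which a plain JRE lacks."""
    bin_dir = Path(java_home) / "bin"
    # javac.exe on Windows, javac on Linux and macOS.
    return any((bin_dir / name).is_file() for name in ("javac.exe", "javac"))
```

If looks_like_jdk returns False for the path you entered, the preference is probably pointing at a JRE.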
Now create a Hadoop cluster under Metadata and update the parameters as below. Once done, click Check Services.
Congratulations! You are now ready to get your hands dirty with the MapR cluster.
Some important links on MapR:
Job History Console
In my next post we will try to load data into a simple Hive table and into a Hive table with Parquet storage. If you face any issues while going through these steps, do let me know; I will be happy to assist you. One honest opinion from my side: the MapR community is not as strong as the one we used to have in the Oracle forums, so it is a bit painful to debug issues.
See you in my next post.