Connecting to a MapR Cluster Using Talend on a Windows Box

Connecting to a MapR Cluster from Talend:

Talend Open Studio for Data Integration Version: 6.3.1
Java: 1.8
OS: Windows 8

This article does not cover the installation and setup of Talend Open Studio. It assumes Talend is already installed and working correctly. For details on how to install and configure Talend Open Studio, see this post.

In this post we will establish a connection to a MapR cluster from a Windows client. To demonstrate this, I downloaded the MapR sandbox and TOSBD 6.3. Since these virtual machines are hungry for RAM, I installed the sandbox and Talend on two different laptops. However, if you have sufficient RAM available, you can set up both on one machine. I am not going to cover how to set up the appliance in VirtualBox, as it is very easy to set up and configure.

One point I would like to mention is the network settings. Since both of my laptops were connected via a wireless router, I chose the Bridged Adapter so that both machines can communicate with each other in both directions. The only problem is that every time you restart the sandbox, the IP changes, and it is very irritating to keep changing your Talend connection parameters. So I configured a static IP inside CentOS as follows.

[root@maprdemo network-scripts]# pwd
[root@maprdemo network-scripts]# cat ifcfg-eth0
[root@maprdemo network-scripts]# cat /etc/resolv.conf

Once done, restart the network service to pick up the new IP.

service network restart

When you start your MapR cluster, make sure all of your services are up and running. The most important one is CLDB; if it is down, there is no point reading further until you diagnose and fix it. If you see a message like "mapr services failed to start" during start-up, log in to the system, manually stop the warden service and then ZooKeeper, and start them again in reverse order. Here are the steps to bring the services up.

sudo /sbin/service mapr-warden stop
sudo /sbin/service mapr-zookeeper stop
sudo /sbin/service mapr-zookeeper start
sudo /sbin/service mapr-warden start

As we can see below, the status is green and all services are running fine.

Now, to connect to the MapR cluster from Windows, we need to download the client package. Go to this link, download it, and extract it to C:\opt\mapr.

Once this is done, we have to make a few modifications to the conf files. Open mapr-clusters.conf from C:\opt\mapr\conf and update the cluster and CLDB node details. In this setup the CLDB node is maprdemo.
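
To sanity-check the entry before starting Talend, here is a small sketch that parses one line of mapr-clusters.conf. It assumes the commonly used layout of a cluster name, optional key=value options, and one or more CLDB nodes as host[:port]. The cluster name demo.mapr.com in the usage example is only a placeholder, so use whatever name your sandbox reports. The function name parse_cluster_line is hypothetical and not part of any MapR tool.

```python
DEFAULT_CLDB_PORT = 7222  # default CLDB port, used when a node has no explicit port


def parse_cluster_line(line):
    """Split one mapr-clusters.conf line into (cluster name, options, CLDB nodes).

    Assumed layout: <cluster-name> [key=value ...] <host[:port]> [<host[:port]> ...]
    """
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError("expected a cluster name followed by at least one CLDB node")

    name = tokens[0]
    options = {}
    nodes = []
    for token in tokens[1:]:
        if "=" in token:
            # Option such as secure=false
            key, value = token.split("=", 1)
            options[key] = value
        else:
            # CLDB node, with or without an explicit port
            host, _, port = token.partition(":")
            nodes.append((host, int(port) if port else DEFAULT_CLDB_PORT))

    if not nodes:
        raise ValueError("no CLDB node found on the line")
    return name, options, nodes
```

For example, parse_cluster_line("demo.mapr.com secure=false maprdemo:7222") returns the cluster name, the options {"secure": "false"}, and the node list [("maprdemo", 7222)]. If the host name that comes back is not the one you put in your hosts file, the client will not find the CLDB.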

Next we need to download winutils.exe. Put this file in C:\winutil\bin.

Next we need to configure the hosts file. Go to C:\Windows\System32\drivers\etc and update the hosts file so that the sandbox IP maps to the host name maprdemo. Don't forget to update the IP if yours is different.

If you do not update this, you will see the error below in Talend.

[ERROR]: com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils Can not find nonlocal IP based on provided hostname: maprdemo
[ERROR]: com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils Can not find IP for host: maprdemo maprdemo
at Method)
at$2.lookupAllHostAddr(Unknown Source)
at Source)
hostname maprdemo, for cluster:
20170901 20:53:38,1877 Some error on socket 1232
20170901 20:53:39,2503 Some error on socket 1232
20170901 20:53:40,3131 Some error on socket 1252
20170901 20:53:40,3131 ERROR Cidcache fs/client/fileclient/cc/ Thread: 12588 MoveToNextCldb: No CLDB entries, cannot run, sleeping 5 seconds!
20170901 20:53:45,3271 ERROR Client fs/client/fileclient/cc/ Thread: 12588 Failed to initialize client for cluster, error Cannot send after transport endpoint shutdown(108)
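
Both errors above come down to two questions: does the host name resolve, and is the CLDB port reachable? The sketch below checks each one from the Windows client using only the Python standard library. The function names resolve_host and port_open are hypothetical helpers, and 7222 is assumed to be the CLDB port (the default); adjust it if your cluster uses a different one.

```python
import socket


def resolve_host(hostname):
    """Return the IPv4 address for hostname, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised for unknown hosts
        return None


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


def check_cldb(hostname, port=7222):
    """Print a short report for the host name and CLDB port (7222 is an assumed default)."""
    ip = resolve_host(hostname)
    if ip is None:
        print(hostname, "does not resolve; check the hosts file entry")
        return False
    reachable = port_open(ip, port)
    print(hostname, "->", ip, "| port", port, "reachable:", reachable)
    return reachable
```

Calling check_cldb("maprdemo") before opening Talend tells you whether the problem is the hosts file (no resolution) or the sandbox itself (resolution works but the port is closed, for example because CLDB is down).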

OK. One more thing is left, and that is the environment variables. Set them up as per the screenshot below.
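
Once the variables are set, a quick check like the sketch below can confirm they are visible and point to real folders. The variable names in the usage note (MAPR_HOME and HADOOP_HOME) are assumptions based on the folders used earlier in this post, not a transcription of the screenshot, so replace them with the names you actually configured. The function name missing_paths is hypothetical.

```python
import os


def missing_paths(env, required):
    """Return {name: reason} for each required variable that is unset or points to a missing path."""
    problems = {}
    for name in required:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.exists(value):
            problems[name] = "path does not exist: " + value
    return problems
```

For example, missing_paths(os.environ, ["MAPR_HOME", "HADOOP_HOME"]) returns an empty dictionary when both assumed variables are set to existing folders. Run it from a new command prompt, since already open windows do not see newly added variables.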

That's it. Now start Talend Studio. There are a few settings that we need to change here as well; some of them we will change during execution. In the menu bar go to Window > Preferences > Talend and make sure the Java compiler is pointing to the JDK path, not the JRE.

Now create a Hadoop cluster under Metadata and update the parameters as below. Once done, click Check Services.

Congratulations! You are now ready to get your hands dirty with the MapR cluster.

Some important links on MapR:

Cluster console

Job History Console

MCS Console

Hue Console

In my next post we will try to load data into a simple Hive table and into a Hive table with Parquet storage. If you face any issues while going through the steps, do let me know. I will be happy to assist you. One honest opinion from my side is that the MapR community is not as strong as the Oracle forums used to be, so it is a bit painful to debug issues.

See you in my next post.


About the author

Bhabani - Bhabani currently works as a Sr Development Engineer at Harman International. He has good expertise in Oracle, Oracle Data Integrator, Pervasive Data Integrator, MSBI, Talend and Java. He has also been contributing to the ODI-OTN forum for the last 5 years. He is from India. If you want to reach him, please visit the contact us page. If you have any doubts or concerns about the above article, please put your question here. The Dw Team will try to respond as soon as possible. Also, don't forget to provide your comments, suggestions, or feedback for further improvement. Thanks for your time.
