Connecting to a MapR Cluster Using Talend on a Windows Box

Connecting to a MapR cluster from Talend:

Talend Open Studio for Data Integration Version: 6.3.1
Java: 1.8
OS: Windows 8

This article does not cover the installation and setup of Talend Open Studio. The assumption is that Talend is already installed and working correctly. For details on how to install and configure Talend Open Studio, see this post.

In this post we will establish a connection to a MapR cluster from a Windows client. To demonstrate this I downloaded the MapR sandbox and TOSBD 6.3. Since these virtual machines are hungry for RAM, I installed the sandbox and Talend on two different laptops. However, if you have sufficient RAM available you can set up both on one machine. I am not going to cover how to set up the appliance in VirtualBox, as it is very easy to set up and configure.

One point I would like to mention is the network settings. Since both of my laptops were connected via a wireless router, I chose the Bridged Adapter so that both machines can communicate with each other in both directions. The only problem is that every time you restart the sandbox the IP changes, and it is very irritating to keep changing your Talend connection parameters. So I configured a static IP inside CentOS as follows.

[root@maprdemo network-scripts]# pwd
[root@maprdemo network-scripts]# cat ifcfg-eth0
[root@maprdemo network-scripts]# cat /etc/resolv.conf
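The output of these commands is not reproduced here, so as a rough sketch only, this is the kind of content a static-IP ifcfg-eth0 usually carries on CentOS 6. The helper name render_ifcfg and all addresses below are hypothetical; substitute the values from your own network.

```python
def render_ifcfg(device, ip, netmask, gateway):
    """Return the text of a static-IP ifcfg file for a CentOS 6 interface."""
    lines = [
        f"DEVICE={device}",
        "BOOTPROTO=none",   # no DHCP, so the address below stays fixed
        "ONBOOT=yes",       # bring the interface up at boot
        f"IPADDR={ip}",
        f"NETMASK={netmask}",
        f"GATEWAY={gateway}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical addresses on a typical home router subnet.
print(render_ifcfg("eth0", "192.168.1.50", "255.255.255.0", "192.168.1.1"), end="")
```

The printed text would go into /etc/sysconfig/network-scripts/ifcfg-eth0 on the sandbox.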

Once done, restart the network service so the new IP takes effect.

service network restart

When you start your MapR cluster, make sure all of your services are up and running. The most important one is CLDB: if it is down, there is no point in reading further until you diagnose and fix it. If during start-up you see a message like "mapr services failed to start", log in to the system, manually stop the warden service and then ZooKeeper, and start them again in the reverse order. Here are the steps to bring the services up.

sudo /sbin/service mapr-warden stop
sudo /sbin/service mapr-zookeeper stop
sudo /sbin/service mapr-zookeeper start
sudo /sbin/service mapr-warden start

As we can see below, the status is green and all services are running fine.

Now, to connect to the MapR cluster from Windows we need to download the client package. Go to this link, download it, and extract it to C:\opt\mapr.

Once this is done, we have to make a few modifications in the conf files. Open mapr-clusters.conf in C:\opt\mapr\conf and update the cluster and CLDB node details as below. In my setup the CLDB node is maprdemo.
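For reference, an entry in mapr-clusters.conf is a single line: the cluster name, an optional secure flag, and one or more CLDB host:port pairs. Here is a minimal sketch that builds such a line. The cluster name demo.mapr.com and secure=false are my assumptions for a sandbox, 7222 is the default CLDB port, and clusters_conf_line is just an illustrative helper.

```python
def clusters_conf_line(cluster_name, cldb_hosts, secure=False, port=7222):
    """Build one mapr-clusters.conf entry: name, secure flag, CLDB nodes."""
    nodes = " ".join(f"{host}:{port}" for host in cldb_hosts)
    flag = "true" if secure else "false"
    return f"{cluster_name} secure={flag} {nodes}"


# Assumed sandbox values; check the cluster name on your own sandbox.
print(clusters_conf_line("demo.mapr.com", ["maprdemo"]))
# prints: demo.mapr.com secure=false maprdemo:7222
```

You can compare the printed line with the mapr-clusters.conf on the sandbox itself (under /opt/mapr/conf) to confirm the cluster name.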

Next we need to download winutils.exe. Put this file in C:\winutil\bin.

Next we need to configure the hosts file. Go to C:\Windows\System32\drivers\etc and update the hosts file with the details below. Don't forget to change the IP if yours is different. The entry maps the sandbox IP to the hostname maprdemo.
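If you want to double-check the entry before starting Talend, a small script can look the hostname up in the hosts file text. This is only a sketch: hosts_lookup is an illustrative helper and the IP in the sample is hypothetical.

```python
def hosts_lookup(hosts_text, hostname):
    """Return the IP mapped to hostname in hosts-file text, or None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and padding
        if not line:
            continue
        ip, *names = line.split()
        if hostname.lower() in (name.lower() for name in names):
            return ip
    return None


# Hypothetical hosts content; use your sandbox's static IP instead.
sample = """
127.0.0.1      localhost
192.168.1.50   maprdemo   # MapR sandbox
"""
print(hosts_lookup(sample, "maprdemo"))
# prints: 192.168.1.50
```

To check the real file, pass in the text read from C:\Windows\System32\drivers\etc\hosts instead of the sample string.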

If you do not update the hosts file, you will see the error below in Talend.

[ERROR]: com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils - Can not find non-local IP based on provided hostname: maprdemo
[ERROR]: com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils - Can not find IP for host: maprdemo maprdemo
at Method)
at$2.lookupAllHostAddr(Unknown Source)
at Source)
hostname maprdemo, for cluster:
2017-09-01 20:53:38,1877 Some error on socket 1232
2017-09-01 20:53:39,2503 Some error on socket 1232
2017-09-01 20:53:40,3131 Some error on socket 1252
2017-09-01 20:53:40,3131 ERROR Cidcache fs/client/fileclient/cc/ Thread: 12588 MoveToNextCldb: No CLDB entries, cannot run, sleeping 5 seconds!
2017-09-01 20:53:45,3271 ERROR Client fs/client/fileclient/cc/ Thread: 12588 Failed to initialize client for cluster, error Cannot send after transport endpoint shutdown(108)
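When you hit errors like these, it can help to confirm from the Windows box that the CLDB host resolves and accepts connections before digging into Talend itself. Below is a minimal sketch using a plain TCP connect. The port 7222 is the default CLDB port and is an assumption about your cluster; can_connect is an illustrative helper.

```python
import socket


def can_connect(host, port=7222, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers unresolved hostnames, refused connections and timeouts.
        return False
```

Calling can_connect("maprdemo") should return True once the hosts entry is in place and CLDB is up. A False result only tells you that name resolution or the network path failed, not which of the two.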

OK. We are left with one more thing, and that is the environment variables. Set them up as per the screenshot below.
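Since a screenshot is easy to misread, here is a small sketch that checks the variables from a script. The variable names are my assumption of the usual pair for this kind of setup: MAPR_HOME pointing at the extracted client and HADOOP_HOME pointing at the folder whose bin directory holds winutils.exe. Adjust the names and paths to whatever your own setup uses.

```python
import os

# Assumed names and values, based on the paths used earlier in this post.
EXPECTED = {
    "MAPR_HOME": r"C:\opt\mapr",
    "HADOOP_HOME": r"C:\winutil",
}


def env_problems(env, expected=EXPECTED):
    """Return {name: current_value} for variables that are unset or different."""
    problems = {}
    for name, want in expected.items():
        have = env.get(name)
        # Ignore a trailing slash and letter case, as Windows paths do.
        if have is None or have.rstrip("\\/").lower() != want.lower():
            problems[name] = have
    return problems


# An empty dict means both variables match the assumed values.
print(env_problems(dict(os.environ)))
```

Remember that Talend Studio only picks up new environment variables after it is restarted.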

That's it. Now start Talend Studio. There are a few settings that we need to make here as well; some of them we will make during execution. In the menu bar go to Window > Preferences > Talend and make sure the Java compiler is pointing to the JDK path, not the JRE.
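A quick way to tell a JDK folder from a JRE one is to look for the javac compiler in its bin directory, since a JRE does not ship it. Here is a minimal sketch of that check; looks_like_jdk is an illustrative helper, not something Talend provides.

```python
from pathlib import Path


def looks_like_jdk(java_home):
    """Return True if java_home/bin contains the javac compiler."""
    bin_dir = Path(java_home) / "bin"
    # javac.exe on Windows, plain javac elsewhere.
    return any((bin_dir / name).is_file() for name in ("javac.exe", "javac"))
```

For example, looks_like_jdk(r"C:\Program Files\Java\jdk1.8.0_144") should return True for a JDK install; that folder name is hypothetical and depends on the Java update you have installed.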

Now create a Hadoop cluster under Metadata and update the parameters as below. Once done, click Check Services.

Congratulations! You are now ready to get your hands dirty with the MapR cluster.

Some important links for MapR:

Cluster console

Job History Console

MCS Console

Hue Console

In my next post we will try to load data into a simple Hive table and into a Hive table with Parquet storage. If you face any issues while going through the steps, do let me know; I will be happy to assist. One honest opinion from my side: the MapR community is not as strong as what we used to have in the Oracle forums, so debugging issues is a bit painful.

See you in my next post.

About Bhabani 86 Articles
Bhabani has 12-plus years of experience in data warehousing and analytics projects spanning multiple domains such as travel, banking and finance, and the betting and gaming industries. His solution work focuses on designing data warehouses and integrating them with cloud platforms such as AWS or GCP. He has also been an Elite-level contributor on the OTN forum for more than 9 years. He loves to experiment and build POCs with different integration tools and services. Some of his favorite skills are Redshift, BigQuery, Python, Apache Airflow, Kafka, HDFS, MapReduce, Hive, HBase, Sqoop, Drill, and Impala.
