Getting started with MapR Sandbox and Practicing Basic Commands

MapR Sandbox for Hadoop 5.2.1
OS: Windows 8

This article will walk through the following steps.

1. Download the MapR Sandbox and Virtual Box
2. Import appliance and start the VM
3. Verify the MapR Services
4. Overview of MCS
5. Overview of HUE editor
6. Create Hive, HBase, and MapR-DB tables and perform DML operations

What is this VM: The MapR Sandbox for Hadoop is a fully-functional single-node cluster that gently introduces business analysts, current and aspiring Hadoop developers, and administrators (database, system, and Hadoop) to the big data promises of Hadoop and its ecosystem.

To download the sandbox, click on this link. You also need VMware Player or VirtualBox to run the appliance. Click on this link to download the binaries or source code for your operating system.

Prerequisites: at least 20 GB of free hard disk space, at least 4 physical cores, and 8 GB of RAM. Performance improves with more RAM and free hard disk space.

Now import the appliance into VirtualBox and click Start. During startup, monitor the MapR services and make sure they come up successfully. Sometimes you might see an error message like "Waiting for MapR services to come up. Error: MapR services failed to start in 2 minutes". If that happens, you probably have not allocated enough RAM to the VM. This is the reason I switched to a 16 GB laptop.
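If you prefer the command line, the VM's memory and CPU allocation can be adjusted with VBoxManage before starting it. A minimal sketch; the appliance name below is an assumption, so check yours with the first command:

```shell
# List imported VMs to find the exact appliance name
VBoxManage list vms

# While the VM is powered off, raise its RAM (in MB) and CPU count
# (the VM name here is an assumption -- use the name printed above)
VBoxManage modifyvm "MapR-Sandbox-For-Hadoop-5.2.1" --memory 8192 --cpus 4

# Start it headless; the MapR services can take a few minutes to come up
VBoxManage startvm "MapR-Sandbox-For-Hadoop-5.2.1" --type headless
```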

You can see the memory usage on my 8GB laptop.

Okay. After a successful boot you should see the message below.

Log in with mapr as both the username and the password.

Run jps to see the Java services currently running.
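As a quick sketch of that step (the exact process list and PIDs will differ on your VM, so treat the names in the comment as typical rather than guaranteed):

```shell
# jps lists the JVMs running as the logged-in user; run it inside the VM.
# On the sandbox you would typically expect entries such as CLDB,
# ResourceManager, NodeManager, and the Warden process supervising them.
jps
```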

We can now check the root directories and other installation locations.
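A sketch of that check from inside the VM; the install location shown is the standard MapR default, but verify it on your sandbox:

```shell
# List the root of the distributed file system (MapR-FS)
hadoop fs -ls /

# MapR's default installation directory on the node
ls /opt/mapr
```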

Run the hostname command to see the hostname and IP address.
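For example (standard Linux flags; output will of course differ per VM):

```shell
# Print the fully qualified hostname
hostname -f

# Print the IP address(es) the hostname resolves to
hostname -i
```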

Now open a browser on your host machine (not inside the VM) and go to the URL below. You will immediately see the GUIs for HUE and MCS.
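The ports below are the usual MapR sandbox defaults, so treat them as assumptions and cross-check against the URLs printed on the VM's console banner. A quick reachability sketch from the host:

```shell
# HUE editor (default sandbox port -- verify against the console banner)
curl -I http://localhost:8888

# MapR Control System; served over HTTPS with a self-signed certificate,
# hence -k to skip certificate verification
curl -k -I https://localhost:8443
```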

Let's take a look at the HUE editor. You may see some error messages; let's ignore them for now. We can fix them at a later point.

Click on the query editor and select Hive. If you don't like the Hive shell, this is the one you will fall in love with.

Let's take a look at the HBase Browser. Here you can create tables and insert, update, and delete the rows inside them. Notice how you can add column families, set the number of versions to retain, and so on.

I created a table called employee with two column families named personal and professional. Each family has one attribute called emailid.
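The same employee table can be created from the HBase shell instead of the browser. A sketch, reusing the table, column-family, and attribute names above (the row key, cell values, and VERSIONS setting are illustrative assumptions):

```shell
hbase shell <<'EOF'
# Create the table with two column families, keeping up to 3 versions
create 'employee', {NAME => 'personal', VERSIONS => 3}, {NAME => 'professional'}

# Insert one emailid cell per family for an example row key 'emp1'
put 'employee', 'emp1', 'personal:emailid', 'emp1@home.example'
put 'employee', 'emp1', 'professional:emailid', 'emp1@work.example'

# Read the row back
get 'employee', 'emp1'
EOF
```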

Sounds good? Let's dive a bit deeper.

The documentation states that the MapR distribution provides a full Hadoop stack that includes the MapR File System (MapR-FS), the MapR-DB NoSQL database management system, MapR Streams, the MapR Control System (MCS) user interface, and a full family of Hadoop ecosystem projects.

This is where I got really excited. Yes, it's MapR-DB. How does it work? Is it the same as other NoSQL databases? There were a lot of questions in my mind. Thanks to the documentation. Though I didn't find detailed information about its internal structure, I did find a couple of YouTube videos where MapR co-founder M.C. Srivas explains it very well. Beyond that, I observed that there is very little information on the internet, so you have to read every piece of information from whatever sources you can find and join the dots at the end to figure out how things work.

So today let's discuss MapR-DB, HBase, and Hive. There are different options for querying MapR-DB; we will use maprcli (the MapR command-line interface), MapR DBShell, HBase, and Hive.

If you want to create a table in Hive, what should you write?

CREATE TABLE employee (no INT, name STRING);

That's it. Simple and straightforward. What if I want to create a Hive table that points to an HBase table? Here's how it goes.

CREATE TABLE hbase_table_using_hive(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "habasetable");

And what if I want to create a Hive table that points to MapR-DB?

CREATE TABLE mapr_table_using_hive(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "/user/bhabani/testtable");

So let's execute them in the HUE editor.

Now both tables should be reflected in Hive.

Let's go back to the HBase Browser and verify whether the table "habasetable" is available. Yes, it is.

What about the other one? You can see there is a table called testtable, and it is a MapR-DB table.

We can verify the same in the HBase shell as well. The first one points to an HBase table, whereas the second one points to a MapR-DB table.

Let's verify that we can query these tables from the HUE editor.

And here we will query these tables from the HBase shell.
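As a sketch, the queries look identical for both tables; only the name differs, because MapR-DB tables are addressed by their file-system path (matching the Hive DDL above):

```shell
hbase shell <<'EOF'
# Scan the native HBase table created through Hive
scan 'habasetable'

# Scan the MapR-DB table -- note it is addressed by path, not by name
scan '/user/bhabani/testtable'
EOF
```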

So we saw how to use the HUE editor and the HBase shell. Let's try MapR DBShell. Notice that I performed most of the DDL and DML operations directly on MapR-DB tables, and the values I supplied are in JSON format.
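A hedged sketch of such a DBShell session. The table path /user/bhabani/jsontable and the document contents are hypothetical; note that DBShell works on MapR-DB JSON tables, while the binary testtable above is managed through the HBase API instead:

```shell
mapr dbshell <<'EOF'
// Create a JSON table at a file-system path (hypothetical path)
create /user/bhabani/jsontable

// Insert a JSON document; _id is the row key
insert /user/bhabani/jsontable --value '{"_id":"emp1","name":"Bhabani","dept":"analytics"}'

// Read the documents back
find /user/bhabani/jsontable
EOF
```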

The last one is maprcli. Using maprcli you can also create and edit MapR-DB tables. Remember, these tools are incredibly useful for administrators who operate the Hadoop cluster, as well as for developers debugging Hadoop applications. Refer to this link to learn all of these commands.
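For instance, a sketch of the equivalent maprcli workflow for a binary MapR-DB table, reusing the path from the Hive example and the cf1 column family from the column mapping above:

```shell
# Create a binary MapR-DB table at a file-system path
maprcli table create -path /user/bhabani/testtable

# Add the column family the Hive mapping refers to
maprcli table cf create -path /user/bhabani/testtable -cfname cf1

# List column families to confirm
maprcli table listcf -path /user/bhabani/testtable

# Show the table's metadata
maprcli table info -path /user/bhabani/testtable
```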

So what we noticed here is that there are many ways to work with MapR-DB tables, but always remember that Hive and HBase have to pass through a couple of additional layers to reach MapR-FS. You can see this very clearly in the MapR-DB architecture diagram, along with how it helps improve performance.

The last section of this post is about MCS. Though I will not go into detail, I will touch upon some important points; in another post I will try to cover all the navigation and usage. So let's log in to MCS with the default credentials and click OK.

This is what the dashboard looks like.

This screenshot shows the volumes available.

One good thing is that we can manage all the services from this GUI. Just click Nodes under Cluster. You can see each node's name, health, physical IP address, and so on. When you click Manage Services, you have the option to start and stop them.
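The same view is available from the command line inside the VM; a sketch with maprcli (the node name maprdemo is an assumption; use the hostname you saw earlier):

```shell
# List cluster nodes together with the services on each (svc column)
maprcli node list -columns svc

# List services and their state on a given node
maprcli service list -node maprdemo
```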

So that's all for today. Let me know whether you liked it. Also, if you get stuck anywhere, don't hesitate to leave a comment. I will be right behind you.



About Bhabani (86 articles)
Bhabani has 12+ years of experience in data warehousing and analytics projects spanning multiple domains, including travel, banking and financial services, and betting and gaming. He focuses on designing data warehouses and integrating them with cloud platforms such as AWS and GCP. He has also been an Elite-level contributor on the OTN forum for more than 9 years. He loves experimenting and building POCs with different integration tools and services. Some of his favorite skills are Redshift, BigQuery, Python, Apache Airflow, Kafka, HDFS, MapReduce, Hive, HBase, Sqoop, Drill, and Impala.