This is documentation for MapR Version 5.0. You can also refer to MapR documentation for the latest release.

In this tutorial, you'll create a Hive table, load data from a tab-delimited text file, and run a couple of basic queries against the table.

If you are using HiveServer2, use the BeeLine CLI instead of the Hive shell used in this tutorial. For details on setting up HiveServer2 and starting BeeLine, see Connecting to HiveServer2.

Before you begin, make sure you have downloaded the sample table: select Tools > Attachments, right-click sample-table.txt, choose Save Link As... from the pop-up menu, pick a directory to save to, then click OK. If you're working on the MapR Virtual Machine, the file is loaded from the virtual machine's local file system (not the cluster storage layer), so save it in the MapR Home directory (for example, /home/mapr).

Take a look at the source data

First, take a look at the contents of the file using the terminal:

  1. Make sure you are in the Home directory where you saved sample-table.txt (type cd ~ if you are not sure).
  2. Type cat sample-table.txt to display the file's contents.

Notice that the file consists of only three lines, each of which contains a row of data fields separated by the TAB character. The data in the file represents a web log.
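
Because TAB characters are hard to tell apart from spaces in plain cat output, it can help to confirm the field layout before defining the table. The following is a minimal sketch, not part of the original tutorial: the helper name count_fields is hypothetical, and the path assumes the file was saved in the Home directory as described above.

```shell
# Print the number of tab-separated fields on each line of a file.
# (count_fields is a hypothetical helper name used only for this sketch.)
count_fields() {
  awk -F'\t' '{ print "line " NR ": " NF " fields" }' "$1"
}

# Assumed location: the Home directory where sample-table.txt was saved.
sample="$HOME/sample-table.txt"
if [ -f "$sample" ]; then
  count_fields "$sample"
fi
```

Every line should report the same number of fields; a differing count usually means a field contains spaces where a TAB was expected.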

Create a table in Hive and load the source data:

  1. Type the following command to start the Hive shell, using tab-completion to expand the <version>:

  2. At the hive> prompt, type the following command to create the table:

  3. Type the following command to load the data from sample-table.txt into the table:
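
The create and load steps above can also be run non-interactively from a script file. The sketch below is illustrative only: the table name web_log, the column names and types, and the script file name create_web_log.hql are assumptions chosen to fit a tab-delimited web log, not values confirmed by this page. The /home/mapr path is the example location mentioned earlier.

```shell
# Write the HiveQL statements to a script file (file name is hypothetical).
cat > create_web_log.hql <<'EOF'
-- Table name, column names, and types are assumptions for illustration;
-- adjust them to match the fields in sample-table.txt.
CREATE TABLE IF NOT EXISTS web_log (
  viewTime INT,
  userid   BIGINT,
  url      STRING,
  referrer STRING,
  ip       STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

-- LOCAL tells Hive to read from the local file system
-- rather than from the cluster storage layer.
LOAD DATA LOCAL INPATH '/home/mapr/sample-table.txt' INTO TABLE web_log;
EOF

# Run the script only where a hive launcher is available on the PATH.
if command -v hive >/dev/null 2>&1; then
  hive -f create_web_log.hql
fi
```

If hive is not on your PATH, substitute the full path to the hive launcher in your installation. The same two statements can be typed one at a time at the hive> prompt.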

Run basic queries against the table:

  • Try the simplest query, one that displays all the data in the table:

    This query would be inadvisable with a large table, but with the small sample table it returns very quickly.

  • Try a simple SELECT to extract only data that matches a desired string:

    This query launches a MapReduce job to filter the data.
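
As a rough sketch of the two kinds of query described above, the script below assumes a table named web_log with a url column. Both names, the '%doc' pattern, and the script file name are hypothetical, so replace them with the ones used when the table was created.

```shell
# Write both example queries to a script file (file name is hypothetical).
cat > query_web_log.hql <<'EOF'
-- Display every row in the table; fine for a three-row sample,
-- inadvisable for a large table.
SELECT * FROM web_log;

-- Return only rows whose url value ends with "doc".
-- The column name and pattern are assumptions for illustration.
SELECT * FROM web_log WHERE url LIKE '%doc';
EOF

# Run the script only where a hive launcher is available on the PATH.
if command -v hive >/dev/null 2>&1; then
  hive -f query_web_log.hql
fi
```

In LIKE patterns, % matches any sequence of characters, so '%doc' matches values ending in doc and '%doc%' would match values containing it anywhere.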
