

Hue is the open source UI that interacts with Apache Hadoop and its ecosystem components.

Before you can run Hue applications, you need to:

  1. For clusters running MRv1, establish communication between Hue and JobTrackers.
  2. Edit configuration files.
  3. Set up the Oozie sharelib and examples.
  4. Restart services.

You may also want to disable applications or change the file size restriction for the File Browser.

Info

These instructions assume that you have already installed Hue (see Install Hue).

...

If your cluster runs MRv1, each JobTracker node requires the Hue plug-in so that Hue can communicate with all JobTrackers.

To copy the Hue plug-in (which is a .jar file) to your MapReduce lib directory on all the nodes running JobTracker, enter:

Code Block
cp /opt/mapr/hue/hue-3.7.0/desktop/libs/hadoop/java-lib/hue-plugins-*.jar /opt/mapr/hadoop/hadoop-0.20.2/lib/
Note

If JobTracker is running on a different host (not localhost), use the scp command to copy the hue-plugins-*.jar file to the JobTracker host.
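
For example, the copy might look like this (<jobtracker_host> is a placeholder, and the destination path assumes the same MapR layout on the remote node):

Code Block
scp /opt/mapr/hue/hue-3.7.0/desktop/libs/hadoop/java-lib/hue-plugins-*.jar <jobtracker_host>:/opt/mapr/hadoop/hadoop-0.20.2/lib/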

In order for these changes to take effect, all JobTrackers must be restarted. Make all configuration file changes first, then restart each JobTracker as described in Restarting Services and Verifying Status.

...

Based on your requirements, modify the following configuration files:

Note

In the instructions that follow, <default_user> refers to the server_user specified in the desktop section of the hue.ini file.
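
For example, with the desktop settings shown in the examples later on this page, <default_user> resolves to mapr:

Code Block
[desktop]
# <default_user> in the snippets on this page refers to this value
  server_user=mapr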

...

After you install Hue, perform the following steps to configure Hue:

  1. Complete the general configuration steps. This includes integrating Hue with JobTracker, ResourceManager, and HttpFS.
  2. Perform the steps to integrate each additional component that you want to use with Hue.
    • Hive
    • HBase or MapR-DB
    • Impala
    • Oozie
    • Spark
    • Sqoop2

You may also want to do the following:

  • Configure Security
  • Configure DB Query
  • Configure Hue Interface Authentication

 

Info
The hue.ini file is the main configuration file for running Hue on a cluster. This file is located at /opt/mapr/hue/hue-<version>/desktop/conf/hue.ini.

...

Here are the changes you need to make to this file, organized by section:

Info
When you update the value of a property in the hue.ini, remove any hashes (##) that appear directly before the property name.  
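
For example, to activate a property that ships commented out (resourcemanager_api_url from the YARN example later on this page is used here purely for illustration), change:

Code Block
##resourcemanager_api_url=http://localhost:8088

to:

Code Block
resourcemanager_api_url=http://localhost:8088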

General Configuration for Core Desktop Features

If you run YARN applications, such as MRv2, set default_jobtracker_host to the ResourceManager host and port in the desktop section of the hue.ini.

The following are example configurations for the desktop section of the hue.ini:

Code Block
titleYARN Example
languagetext
[desktop]

# Webserver runs as this user.
  server_user=mapr
  server_group=mapr

# This should be the Hue admin and proxy user
  default_user=mapr

# This should be the hadoop cluster admin in /opt/mapr/conf/daemon.conf
  default_hdfs_superuser=mapr
 
# For YARN, set the default JobTracker host to the ResourceManager host and port.
  default_jobtracker_host=localhost:8032
Code Block
titleMRv1 Example
languagetext
[desktop]

# Webserver runs as this user.
  server_user=mapr
  server_group=mapr

# This should be the Hue admin and proxy user
  default_user=mapr

# This should be the hadoop cluster admin in /opt/mapr/conf/daemon.conf
  default_hdfs_superuser=mapr
 
# Set the default JobTracker host to maprfs to enable HA for JobTracker.
# If there is a standby JobTracker, it will be found automatically.
# In the event of failover, Hue will submit queries to the standby JobTracker.
  default_jobtracker_host=maprfs:///

Settings to Configure your Hadoop Cluster

In the hadoop section of the hue.ini file, complete the following steps:

  1. Set webhdfs_url to point to the node that runs HttpFS.
  2. If you run MRv1 jobs, configure the following:
    • jobtracker_port=<port where the JobTracker IPC listens; the default is 9001>
    • submit_to=True
    • In the ha section, configure jobtracker_host to be the host on which you are running the failover JobTracker.
  3. If you run YARN applications, such as MRv2, complete the following steps in the yarn_clusters section:
    • Specify the hostname and port number for the ResourceManager.

    • Set the submit_to value to True.

  4. For versions prior to Hue 3.7-1505: If you run YARN applications, such as MRv2, you must also complete the following steps in the yarn_clusters section:
    • Set the security_enabled value to False

    • Supply the URL for the ResourceManager API.

    • Supply the URL for the proxy API.

    • Supply the URL for the HistoryServer API.

    Note: As of Hue 3.7-1505, Hue automatically determines these values.
  5. If you run YARN applications, such as MRv2, complete the following step in the mapred_clusters section:

    •  Set the submit_to value to False.

The changes are summarized in the following example hue.ini files: 

Code Block
titleYARN Example
languagetext
[hadoop]

# Use WebHdfs/HttpFs as the communication mechanism.
# This should be the web service root URL.
# The ip_address corresponds to the node running httpfs.
webhdfs_url=http://<ip_address>:14000/webhdfs/v1
 
[[yarn_clusters]]
  [[[default]]]
     # Enter the host on which you are running the ResourceManager
       resourcemanager_host=localhost

      # The port where the ResourceManager IPC listens on
       resourcemanager_port=8032

      # Whether to submit jobs to this cluster
      submit_to=True

      # Change this if your YARN cluster is secured
      security_enabled=${security_enabled}
      
      # Security mechanism of authentication none/GSSAPI/MAPR-SECURITY
      mechanism=${mechanism}

      # URL of the ResourceManager API
      ##resourcemanager_api_url=http://localhost:8088

      # URL of the ProxyServer API
      ##proxy_api_url=http://localhost:8088
      
      # URL of the HistoryServer API
      ##history_server_api_url=http://localhost:19888
 
[[mapred_clusters]]
    [[[default]]]
     
 # Whether to submit jobs to this cluster
      submit_to=False 
Code Block
titleMRv1 example
languagetext
[hadoop]

# Use WebHdfs/HttpFs as the communication mechanism.
# This should be the web service root URL.
# The ip_address corresponds to the node running httpfs.
webhdfs_url=http://<ip_address>:14000/webhdfs/v1

jobtracker_host=<ip_address_of_active_JobTracker_node>
# The port where the JobTracker IPC listens on
jobtracker_port=9001
submit_to=True
 
 [[[ha]]]
      # Enter the host on which you are running the failover JobTracker
      jobtracker_host=localhost-ha

Settings to Configure liboozie

Code Block
languagetext
[liboozie]
 
# The URL (host IP address) where the Oozie service is running. This is required in order for
# users to submit jobs.
oozie_url=http://<ip_address>:11000/oozie

Settings to Configure Beeswax with Hive

In the beeswax section of the hue.ini file, make the following changes:

Code Block
languagetext
[beeswax]
 
# Host where Hive server Thrift daemon is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=<FQDN of Hive Server>

# Port that HiveServer2 Thrift server runs on.
  hive_server_port=10000
Info
As of Hue 3.7-1505, you can view logs for Hive (version 1.0 and above) by setting use_get_log_api to true in the [beeswax] section.
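
A minimal sketch of that setting in the hue.ini (property name as given above):

Code Block
[beeswax]
  use_get_log_api=true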

...

This section contains the following instructions for Hive 0.13: 

 

...


Configuring Hive Data and Metadata Directories

When Hue and Hive are used together, they usually share Hive's metadata and data directories. The locations of the shared directories are specified by the following properties in the hive-site.xml file:

  • hive.metastore.uris (the URI where clients contact the Hive metastore server, which manages metastore_db)
  • hive.metastore.warehouse.dir (the directory where the default database for the warehouse is located)

If you decide to create separate directories for Hue and Hive, see the directions in Using Separate Data and Metadata Directories for Hue.
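
For reference, the warehouse directory is set in hive-site.xml with a property block like the following (the path shown is Hive's usual default and is only illustrative; check your own hive-site.xml for the actual value):

Code Block
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>Directory where the default database for the warehouse is located</description>
</property>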

To configure shared Hive directories: 

Info
The following steps work for both the Derby database and the MySQL database. 
  1. Change the hive.metastore.uris property as shown:

    Code Block
    languagetext
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://localhost:9083</value>
      <description> URI where clients contact Hive metastore server </description>
    </property>
    Note

    The hive.metastore.warehouse.dir property can keep its default value and does not need to be changed.

  2. Enable Hue impersonation by setting the following property to true.

    Code Block
    languagetext
    <property>
      <name>hive.metastore.execute.setugi</name>
      <value>true</value>
      <description> Set this property to enable Hive Metastore service impersonation in unsecure mode.
       In unsecure mode, setting this property to true causes the metastore to execute DFS operations
       using the client's reported user and group permissions. Note that this property must be set on
       BOTH the client and server sides. </description>
    </property>
  3. Set the location of the sharelib.

    Code Block
    <property>
      <name>oozie.service.WorkflowAppService.system.libpath</name>
      <value>/oozie/share/lib</value>
    </property>

...

If you want to store Hue data and metadata in separate directories from Hive data and metadata, follow these steps:

  1. Copy hive-site.xml to a new location. (The original hive-site.xml file remains in the previous location for use by Hive.)
  2. Edit hue.ini and change the hive_conf_dir property so it points to the new location for hive-site.xml (see the sketch after these steps).
  3. Change the hive.metastore.warehouse.dir property in the new hive-site.xml file so it points to the directory where Hue data will be located.
  4. Change the hive.metastore.uris property so it points to the directory for Hue's metastore_db.
  5. Set the hive.metastore.execute.setugi property to true, as shown in step 2 above.
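
The following sketch shows the change from step 2; it assumes the hive_conf_dir property lives in the [beeswax] section of your hue.ini (verify against your file), and the path is a hypothetical example of the new location:

Code Block
[beeswax]
  # Hypothetical directory that contains the copied hive-site.xml
  hive_conf_dir=/opt/mapr/hue/hue-<version>/desktop/conf/hive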

...

To enable the Hive Metastore service to share the embedded Derby database, add the following property blocks, which point to the location of the Derby metastore, to the hive-site.xml file on the node running hiveserver2:

Code Block
languagetext
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/<local dir>/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

...

To point to a MySQL database, add the following property blocks to the hive-site.xml file:

Code Block
languagetext
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://<ip_address>:3306/hive_11?createDatabaseIfNotExist=true</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value><UserName></value>
  <description>Substitute the actual username</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value><Password></value>
  <description>Substitute the actual password</description>
</property>

...

For all nodes running JobTracker, provide the JobTracker Thrift port in mapred-site.xml as shown:

Code Block
languagetext
<property>
  <name>jobtracker.thrift.address</name>
  <value>0.0.0.0:9290</value>
</property>
 
<property>
  <name>mapred.jobtracker.plugins</name>
  <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
  <description>Comma-separated list of jobtracker plug-ins to be activated.</description>
</property>

...

To enable user impersonation in Hue, add the following lines to the /opt/mapr/hadoop/hadoop-<version>/conf/core-site.xml file for all nodes running JobTracker or ResourceManager:

Code Block
<property>
  <name>hadoop.proxyuser.<default_user>.hosts</name>
  <value>*</value>
</property>
 
<property>
  <name>hadoop.proxyuser.<default_user>.groups</name>
  <value>*</value>
</property>

...

To enable user impersonation in Oozie through Hue, add the following lines to the oozie-site.xml file:

Code Block
<property>
  <name>oozie.service.ProxyUserService.proxyuser.<default_user>.hosts</name>
  <value>*</value>
</property>
 
<property>
  <name>oozie.service.ProxyUserService.proxyuser.<default_user>.groups</name>
  <value>*</value>
</property>

...

Configure Hue as a proxy user for all other users and groups, which allows Hue to submit requests on behalf of any other user. To do this, add the following proxy user settings in the configuration block:

Code Block
<!-- Hue HttpFS proxy user setting -->
<configuration>
  <property>
    <name>httpfs.proxyuser.<default_user>.hosts</name>
    <value>*</value>
  </property>
 
  <property>
    <name>httpfs.proxyuser.<default_user>.groups</name>
    <value>*</value>
  </property>
</configuration>

...

To set up the Oozie sharelib and place examples in the /examples directory, follow these steps:

  1. Untar oozie-sharelib*.tar.gz and put the contents under oozie/share/lib:

    Code Block
    cd /opt/mapr/oozie/oozie-* 
    tar xvzf oozie-sharelib*.tar.gz
    hadoop fs -mkdir /oozie/share/lib
    hadoop fs -put share/* /oozie/share/lib
  2. (Optional) Put the Oozie examples in MapR-FS under /oozie/examples:

    Code Block
    cd /opt/mapr/oozie/oozie-*
    tar xvzf oozie-examples.tar.gz
    hadoop fs -mkdir /oozie/examples
    hadoop fs -put examples/* /oozie/examples
  3. Change permissions for Oozie as shown:

    Code Block
    languagetext
    hadoop fs -chmod -R 777 /oozie

    The file permissions for the contents of the /oozie directory are shown below:

     

    Code Block
    languagetext
    # hadoop fs -ls  /oozie/
    Found 5 items
    drwxrwxrwt   - mapr mapr         12 2013-11-01 12:02 /oozie/deployments
    drwxrwxrwx   - mapr mapr          3 2013-10-25 12:41 /oozie/examples
    drwxrwxrwt   - mapr mapr          1 2013-10-25 13:05 /oozie/pig
    drwxrwxrwx   - mapr mapr          1 2013-10-25 12:00 /oozie/share
    drwxrwxrwt   - mapr mapr         48 2013-10-31 22:37 /oozie/workspaces
    # hadoop fs -ls /oozie/workspaces/
    drwxrwxrwx   - mapr    mapr             1 2013-10-27 17:31 /oozie/workspaces/bundles
    drwxrwxrwx   - mapr    mapr             1 2013-10-27 17:31 /oozie/workspaces/coordinators
    drwxrwxrwx   - mapr    mapr             7 2013-10-25 13:05 /oozie/workspaces/data
    drwxrwxrwx   - mapr    mapr             1 2013-10-27 17:31 /oozie/workspaces/lib
    drwxrwxrwx   - mapr    mapr            13 2013-10-27 17:17 /oozie/workspaces/managed
    drwxrwxrwx   - mapr    mapr            10 2013-10-27 17:17 /oozie/workspaces/unmanaged
    # hadoop fs -ls /oozie/workspaces/lib/
    Found 1 items
    -rwxrwxrwx   3 mapr mapr     142551 2013-10-27 17:31 /oozie/workspaces/lib/hadoop-examples.jar

...

If you want to disable an application (such as Impala), follow these steps:

  1. In the [desktop] section of the hue.ini file, uncomment the # app_blacklist= statement and insert the name of the app you want to disable (impala in this example).

    Code Block
    languagetext
    # Comma-separated list of apps not to load at server startup.
    # Note that rdbms is the name used for dbquery.
    app_blacklist=spark,zookeeper,search,impala,sqoop,rdbms
    Info
    Do not remove search from the app_blacklist. The Hue UI will not work if the search application is enabled.
  2. Once all changes are made, restart Hue so the changes will take effect.

Note

You can re-enable a blacklisted application at any time by removing it from the app_blacklist list; restart Hue afterward for the change to take effect.
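
For example, to re-enable Impala after using the blacklist shown above, remove impala from the list (search stays blacklisted, as noted earlier):

Code Block
app_blacklist=spark,zookeeper,search,sqoop,rdbms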

...

As of Hue 3.6.0-1504, the Hue File Browser will not open files that are 1.0 GB or greater.

If you want to change the file size restriction, follow these steps:

  1. In the [[hdfs_clusters]] section of the hue.ini, edit the value of the file_size property. 

    Code Block
    # File size restriction for viewing file (float)
    # '1.0' - default 1 GB file size restriction
    # '0' - no file size restrictions
    # >0  - set file size restriction in gigabytes, ex. 0.5, 1.0, 1.2...
    file_size=<value>
  2. Restart Hue. 
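
For example, to let the File Browser open files up to 2 GB (an illustrative value):

Code Block
file_size=2.0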

...

Now that you have made changes to the configuration files, restart each of the following services for the changes to take effect:

...

To restart each JobTracker, enter the following command. You can list multiple IP addresses as a space-separated list.

Code Block
maprcli node services -jobtracker restart -nodes <ip_addresses>
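
For example, to restart JobTracker on two nodes at once (the IP addresses below are placeholders):

Code Block
maprcli node services -jobtracker restart -nodes 10.10.100.1 10.10.100.2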

 

Confirm that the plug-ins are running correctly by issuing the tail command after you restart JobTracker. Sample output is shown below:

 

Code Block
$ tail --lines=500 /opt/mapr/hadoop/hadoop*/logs/*jobtracker*.log|grep ThriftPlugin
2013-09-26 15:02:39,337 INFO org.apache.hadoop.thriftfs.ThriftPluginServer: Starting Thrift server
2013-09-26 15:02:39,419 INFO org.apache.hadoop.thriftfs.ThriftPluginServer: Thrift server listening on 0.0.0.0:9290

 

To verify that JobTracker started and can connect to the thrift plugin port, enter:

 

Code Block
languagetext
lsof -i:9290

 

The output from this command should look similar to this:

 

Code Block
languagetext
COMMAND   PID   USER   FD    TYPE   DEVICE SIZE/OFF NODE NAME
java    10308   mapr   111u  IPv4   18538352    0t0  TCP *:9290 (LISTEN)

 

You can also check JobTracker logs to verify that JobTracker started. An example log location is shown here:
Code Block
languagetext
/opt/mapr/hadoop/hadoop-0.20.2/logs/hadoop-mapr-jobtracker-*.log

...

To restart Oozie, first stop Oozie then start it:

Code Block
maprcli node services -name oozie -action stop -nodes <ip_address>
maprcli node services -name oozie -action start -nodes <ip_address>
To verify that the Oozie server started, enter:

 

Code Block
languagetext
lsof -i:11000

 

The output from this command should look similar to this:

 

Code Block
languagetext
COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
java    16644 mapr   35u  IPv6   69926776      0t0  TCP *:irisa (LISTEN)

You can also check Oozie logs to verify that Oozie started. The log is found here:

Code Block
languagetext
/opt/mapr/oozie/oozie-3.3.2/logs/oozie.log

...

The instructions for starting Hue vary depending on whether it was installed on a cluster node or a non-cluster (edge) node. Note that installing on a cluster node is the preferred method.

Starting Hue Webserver on a Cluster Node

If Hue is installed on a cluster node (the common use case and recommended practice), start the Hue webserver by entering:
Code Block
maprcli node services -name hue -action start -nodes <ip_address>

Starting Hue Webserver on an Edge Node

If Hue is installed on an edge node (not recommended), start the Hue webserver by entering:

 

Code Block
languagetext
/opt/mapr/hue/hue-3.5.0/bin/hue.sh runcpserver start

 

Verifying that the Hue Webserver Started

To verify that the Hue webserver started, enter:

 

Code Block
languagetext
lsof -i:8888

 

The output from this command should look similar to this:
Code Block
languagetext
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
python2.6 27688 mapr    3u  IPv4 69955314      0t0  TCP *:ddi-tcp-1 (LISTEN)
python2.6 27691 mapr    3u  IPv4 69955314      0t0  TCP *:ddi-tcp-1 (LISTEN)
You can also check the Hue webserver logs to verify that the Hue webserver started. If the Hue webserver was installed on a cluster node, the log is found here:
Code Block
languagetext
/opt/mapr/hue/hue-3.5.0/bin/logs/runcpserver.log

If the Hue webserver was installed on an edge node, the log is found here:

Code Block
languagetext
/opt/mapr/hue/hue-3.5.0/logs/runcpserver.log

 

...

 The Hive Metastore service is started automatically by the Warden at installation time if the mapr-hivemetastore package is installed. If you change the hive-site.xml configuration file, you must restart the service. Follow these steps:

  1. Make a list of nodes on which Hive Metastore is configured.
  2. Issue the maprcli node services stop command, specifying the nodes on which Hive Metastore is configured, separated by spaces. Example:

    Code Block
    maprcli node services -name hivemeta -action stop -nodes node001 node002 node003
  3. Issue the maprcli node services start command for the same nodes. Example:

    Code Block
    maprcli node services -name hivemeta -action start -nodes node001 node002 node003

Verifying that the Hive Metastore Server Started

To verify that the Hive Metastore server started, enter:

 

Code Block
languagetext
lsof -i:9083

 

The output should look similar to this:
Code Block
languagetext
COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
java    24440 root  134u  IPv4 70167948      0t0  TCP *:9083 (LISTEN)
You can also check the Hive logs, found here:

 

Code Block
/opt/mapr/hive/hive-*/logs/
/tmp/<USER>/hive.log

 

where <USER> is the user who started the Hive metastore service.
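
For example, if the mapr user started the Hive Metastore service, you could check the tail of its log using the path shown above:

Code Block
tail -n 100 /tmp/mapr/hive.log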

 

Next Steps

You may also want to complete the additional configurations described in the child topics of this page.

See Use Hue for an overview of Hue functionality. See Logging into Hue 3.x for login instructions.

 You must also restart Hue for these changes to take effect.