Unlike other Hadoop distributions, which only allow cluster data import or export as a batch operation, MapR lets you mount the cluster itself via NFS so that your applications can read and write data directly. MapR allows direct file modification and multiple concurrent reads and writes via POSIX semantics. With an NFS-mounted cluster, you can work with data using standard tools, applications, and scripts. For example, you could run a MapReduce job that outputs to a CSV file, then import the CSV file directly into a SQL database via NFS.

MapR exports each cluster as the directory /mapr/<cluster name> (for example, /mapr/my.cluster.com). If you create a mount point with the local path /mapr, then Hadoop FS paths and NFS paths to the cluster will be the same. This makes it easy to work on the same files via NFS and Hadoop. In a multi-cluster setting, the clusters share a single namespace, and you can see them all by mounting the top-level /mapr directory.

MapR uses version 3 of the NFS protocol. NFS version 4 bypasses the port mapper and attempts to connect to the default port only. If you are running NFS on a non-standard port, mounts from NFS version 4 clients time out. Use the -o nfsvers=3 option to specify NFS version 3.
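
As a minimal sketch of the option above, the helper below only prints the mount command it would run, so the command can be inspected before it is executed with root privileges. The function name nfs3_mount_cmd is hypothetical; the host and mount point reuse the example names from this page.

```shell
# Print (do not run) a mount command that forces NFS version 3.
# nfs3_mount_cmd is an illustrative helper, not part of MapR.
nfs3_mount_cmd() {
  host=$1          # NFS server, for example usa-node01
  mount_point=$2   # local directory, for example /mapr
  printf 'mount -o nolock,nfsvers=3 %s:/mapr %s\n' "$host" "$mount_point"
}

nfs3_mount_cmd usa-node01 /mapr
# To perform the mount: sudo sh -c "$(nfs3_mount_cmd usa-node01 /mapr)"
```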

See Setting up MapR NFS to set up NFS on a non-standard port.

Mounting the Cluster

Before you begin, make sure you know the hostname and directory of the NFS share you plan to mount.
Example:

  • usa-node01:/mapr - for mounting from the command line
  • nfs://usa-node01/mapr - for mounting from the Mac Finder

Mounting NFS to MapR-FS on a Cluster Node

To automatically mount NFS to MapR-FS on the cluster my.cluster.com at the /mapr mount point, add the following line to /opt/mapr/conf/mapr_fstab:
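
A sketch of such an entry, assuming the mapr_fstab layout is host:export, mount point, then options, and that hard,nolock are suitable options; verify both against the documentation for your MapR release. The helper name mapr_fstab_line is hypothetical, and usa-node01 is the example host used elsewhere on this page.

```shell
# Print a mapr_fstab entry. The three-field layout and the
# hard,nolock options are assumptions; check your release's docs.
mapr_fstab_line() {
  host=$1
  mount_point=$2
  printf '%s:/mapr %s hard,nolock\n' "$host" "$mount_point"
}

mapr_fstab_line usa-node01 /mapr
# Append it as root, for example:
#   mapr_fstab_line usa-node01 /mapr | sudo tee -a /opt/mapr/conf/mapr_fstab
```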

The change to /opt/mapr/conf/mapr_fstab will not take effect until warden is restarted.

Every time your system is rebooted, the mount point is automatically reestablished according to the mapr_fstab configuration file.

To manually mount NFS to MapR-FS at the /mapr mount point:

  1. Set up a mount point for an NFS share. Example:
    sudo mkdir /mapr
  2. Mount the cluster via NFS. Example:
    sudo mount -o nolock usa-node01:/mapr /mapr

When you mount manually from the command line, the mount point does not persist after a reboot.

Mounting NFS on a Linux Client

To mount automatically when your system starts up, add an NFS mount to /etc/fstab. Example:
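
A sketch of one possible entry, using the standard /etc/fstab field order (device, mount point, file system type, options, dump, fsck order). The nolock and nfsvers=3 options are taken from elsewhere on this page; the helper name fstab_line is hypothetical.

```shell
# Print an /etc/fstab entry for the cluster.
# Fields: device, mount point, fs type, options, dump, fsck order.
fstab_line() {
  host=$1
  mount_point=$2
  printf '%s:/mapr %s nfs nolock,nfsvers=3 0 0\n' "$host" "$mount_point"
}

fstab_line usa-node01 /mapr
# Append it as root, for example:
#   fstab_line usa-node01 /mapr | sudo tee -a /etc/fstab
```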

To mount NFS on a Linux client manually:

  1. Make sure the NFS client is installed. Examples: 
    • sudo yum install nfs-utils (Red Hat or CentOS)
    • sudo apt-get install nfs-common (Ubuntu)
    • sudo zypper install nfs-client (SUSE)
  2. List the NFS shares exported on the server. Example:
    showmount -e usa-node01
  3. Set up a mount point for an NFS share. Example:
    sudo mkdir /mapr
  4. Mount the cluster via NFS. Example:
    sudo mount -o nolock usa-node01:/mapr /mapr

The mount point does not persist after reboot when you mount manually from the command line.

Mounting NFS on a Mac Client

To mount the cluster manually from the command line:

  1. Open a terminal (one way is to click on Launchpad > Open terminal).
  2. At the command line, enter the following command to become the root user:
    sudo bash
  3. List the NFS shares exported on the server. Example:
    showmount -e usa-node01
  4. Set up a mount point for an NFS share. Example:
    sudo mkdir /mapr
  5. Mount the cluster via NFS. Example:
    sudo mount -o nolock usa-node01:/mapr /mapr
  6. List all mounted filesystems to verify that the cluster is mounted.
    mount
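
The verification step can also be scripted. The sketch below reads mount output on standard input and checks for the mount point; it relies only on the " on <directory> " part that the Linux and Mac output formats have in common. The function name is_mounted is hypothetical.

```shell
# Succeed if the given mount point appears in `mount` output read
# from stdin. The match is a fixed string, not a regular expression.
is_mounted() {
  grep -qF -- " on $1 "
}

# Usage:
#   mount | is_mounted /mapr && echo "cluster is mounted"
```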

Mounting NFS on a Windows Client

Setting up the Windows NFS client requires you to mount the cluster and configure the user ID (UID) and group ID (GID) correctly, as described in the sections below. In all cases, the Windows client must access NFS using a valid UID and GID from the Linux domain. Mismatched UID or GID will result in permissions problems when MapReduce jobs try to access files that were copied from Windows over an NFS share.

Because of Windows directory caching, there may appear to be no .snapshot directory in each volume's root directory. To work around the problem, force Windows to re-load the volume's root directory by updating its modification time (for example, by creating an empty file or directory in the volume's root directory).

With Windows NFS clients, use the -o nolock option on the NFS server to prevent the Linux NLM from registering with the portmapper. The native Linux NLM conflicts with the MapR NFS server.

Mounting the cluster

To mount the cluster on Windows 7 Ultimate or Windows 7 Enterprise

  1. Open Start > Control Panel > Programs.
  2. Select Turn Windows features on or off.
  3. Select Services for NFS.
  4. Click OK.
  5. Mount the cluster and map it to a drive using the Map Network Drive tool or from the command line. Example:
    mount -o nolock usa-node01:/mapr z:

To mount the cluster on other Windows versions

  1. Download and install Microsoft Windows Services for Unix (SFU). You only need to install the NFS Client and the User Name Mapping.
  2. Configure the user authentication in SFU to match the authentication used by the cluster (LDAP or operating system users). You can map local Windows users to cluster Linux users, if desired.
  3. Once SFU is installed and configured, mount the cluster and map it to a drive using the Map Network Drive tool or from the command line. Example:
    mount -o nolock usa-node01:/mapr z:

Mapping a network drive

To map a network drive with the Map Network Drive tool

  1. Open Start > My Computer.
  2. Select Tools > Map Network Drive.
  3. In the Map Network Drive window, choose an unused drive letter from the Drive drop-down list.
  4. Specify the Folder by typing the hostname and directory into the text field, or click the Browse… button to find the MapR cluster among the available network shares. The folder name must follow the Universal Naming Convention (UNC).
  5. Select Reconnect at login to reconnect automatically to the MapR cluster whenever you log into the computer.
  6. Click Finish.

Configuring UID and GID for NFS access

To access an NFS share when the system is part of an Active Directory domain

You must instruct the NFS client to access an AD server to get uidNumber and gidNumber. At a high level, the process is as follows:

  1. Ensure the AD Users schema has auxiliary class posixAccount.
  2. Populate AD uidNumber and gidNumber fields with matching uid and gid from Linux.
  3. Configure the NFS client to look up uid and gid in the AD DS store.

Refer to details here: http://technet.microsoft.com/en-us/library/hh509016(v=ws.10).aspx.

To access an NFS share from a standalone system

For a standalone Windows 7 or Vista machine (not using Active Directory), Windows always uses its configured Anonymous UID and GID for NFS access, which by default are -2. However, you can configure Windows to use specific values instead, so that NFS access is performed with that UID and GID.

The UID and GID values are set in the Windows Registry and are global on the Windows NFS client box. This solution might not work well if your Windows box has multiple users who each need access to NFS with their own permissions, but there is no obvious way to avoid this limitation.

The values are stored in the registry path HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ClientForNFS\CurrentVersion\Default. The two DWORD values are AnonymousUid and AnonymousGid. If they do not exist, you must create them.
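
One way to prepare these values is sketched below: a shell helper, run on a Linux system, that prints the Windows reg add commands for a given UID and GID. The helper name nfs_anon_reg_cmds is hypothetical, and the printed commands are meant to be run from an elevated Windows command prompt. Treat this as an illustration rather than a tested procedure for every Windows version.

```shell
# Print the Windows `reg add` commands that set the anonymous UID and GID.
# HKLM is the standard abbreviation for HKEY_LOCAL_MACHINE.
nfs_anon_reg_cmds() {
  uid=$1
  gid=$2
  key='HKLM\SOFTWARE\Microsoft\ClientForNFS\CurrentVersion\Default'
  printf 'reg add "%s" /v AnonymousUid /t REG_DWORD /d %s /f\n' "$key" "$uid"
  printf 'reg add "%s" /v AnonymousGid /t REG_DWORD /d %s /f\n' "$key" "$gid"
}

# Example: use the UID and GID of the current Linux user.
nfs_anon_reg_cmds "$(id -u)" "$(id -g)"
```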

Refer to details here: http://blogs.msdn.com/b/sfu/archive/2009/03/27/can-i-set-up-user-name-mapping-in-windows-vista.aspx.

The Windows Lock Manager and Network Error 53

When the nlockmgr service is active on a Windows machine, attempts to mount a MapR NFS share fail with the following error message:

C:\Users\administrator.Client1>mount -o nolock -u:mapr -p:mapr ClusterNode1:/mapr/ g:
Network Error - 53
Type 'NET HELPMSG 53' for more information.

To resolve this condition, first run the rpcinfo command to confirm that the nlockmgr service is active:

C:\Users\administrator.Client1>rpcinfo -p ClusterNode1

   program version protocol     port
--------------------------------------------------
    100000       4      tcp      111    portmapper
    100024       1      udp    60588    status
    100007       2      udp      817    ypbind
    100021       1      udp    47016    nlockmgr
    100021       3      udp    47016    nlockmgr
    100021       4      udp    47016    nlockmgr
    100021       1      tcp    34254    nlockmgr
    100021       3      tcp    34254    nlockmgr
    100021       4      tcp    34254    nlockmgr

Check the output for the presence of nlockmgr. To deregister nlockmgr services on the node, use the -d switch in rpcinfo on the MapR node:

[root@ClusterNode1 ~]# rpcinfo -d 100021 1
[root@ClusterNode1 ~]# rpcinfo -d 100021 2
[root@ClusterNode1 ~]# rpcinfo -d 100021 3
[root@ClusterNode1 ~]# rpcinfo -d 100021 4

Re-check rpcinfo output to verify that no nlockmgr services are registered. The NFS mount completes successfully at this point:

C:\Users\administrator.Client1>mount -o nolock -u:mapr -p:mapr ClusterNode1:/mapr/ Z:
Z: is now successfully connected to ClusterNode1:/mapr/
The command completed successfully.
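
The check and deregistration steps above can be scripted on the MapR node. The sketch below assumes the five-column rpcinfo -p layout shown earlier, where the service name is the last column; the function name nlockmgr_versions is hypothetical.

```shell
# Read `rpcinfo -p` output on stdin and print each registered
# nlockmgr version once (the service name is the fifth column).
nlockmgr_versions() {
  awk '$5 == "nlockmgr" { print $2 }' | sort -un
}

# Usage on the MapR node, as root:
#   for v in $(rpcinfo -p | nlockmgr_versions); do rpcinfo -d 100021 "$v"; done
#   rpcinfo -p | nlockmgr_versions   # prints nothing once deregistered
```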

Using Named Pipes

As of version 2.1.2, MapR supports using named pipes over NFS. This feature is enabled by default in new installations of version 2.1.2 and later. For upgrades, enable the feature manually by setting mfs.feature.devicefile.support to 1 using the maprcli config save command. Example:

maprcli config save -values '{"mfs.feature.devicefile.support":"1"}'

To disable the feature, set mfs.feature.devicefile.support to 0.
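
Once the feature is enabled, a named pipe on the mount can be used like any other FIFO. The sketch below creates a pipe in a given directory, writes one line from a background process, and reads it back. The function name named_pipe_demo and the directory in the usage comment are hypothetical; the function works in any directory that supports FIFOs.

```shell
# Create a FIFO in the given directory, pass one line through it,
# then clean up. Returns non-zero if the FIFO cannot be created.
named_pipe_demo() {
  pipe="$1/demo.pipe"
  mkfifo "$pipe" || return 1
  printf 'hello\n' > "$pipe" &   # writer blocks until a reader opens the pipe
  cat "$pipe"                    # reader prints the line, then sees end of file
  wait                           # collect the background writer
  rm -f "$pipe"
}

# Usage on an NFS-mounted cluster (hypothetical path):
#   named_pipe_demo /mapr/my.cluster.com/tmp
```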

Setting Compression and Chunk Size

Each directory in MapR storage contains a hidden file called .dfs_attributes that controls compression and chunk size. To change these attributes, change the corresponding values in the file.

Example:

Valid values:

  • Compression: lz4, lzf, zlib, or false
  • Chunk size (in bytes): a multiple of 65536 (64 K) or zero (no chunks). Example: 131072
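
The valid values above can be checked before editing the file. The sketch below validates both settings and prints the attribute lines; the Key=Value layout (Compression=..., ChunkSize=...) is an assumption about the .dfs_attributes format, and both function names are hypothetical.

```shell
# Succeed if the argument is a valid chunk size: zero, or a
# positive multiple of 65536 bytes (64 K).
valid_chunk_size() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ $(( $1 % 65536 )) -eq 0 ]
}

# Print attribute lines after validating both values.
# The Key=Value layout is assumed, not confirmed by this page.
dfs_attributes() {
  case $1 in
    lz4|lzf|zlib|false) ;;
    *) return 1 ;;
  esac
  valid_chunk_size "$2" || return 1
  printf 'Compression=%s\nChunkSize=%s\n' "$1" "$2"
}

dfs_attributes lz4 131072
```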

You can also set compression and chunksize using the hadoop mfs command.

By default, MapR does not compress files whose filename extensions indicate that they are already compressed. The default list of filename extensions is as follows:

  • bz2
  • gz
  • lzo
  • snappy
  • tgz
  • tbz2
  • zip
  • z
  • Z
  • mp3
  • jpg
  • jpeg
  • mpg
  • mpeg
  • avi
  • gif
  • png

The list of filename extensions not to compress is stored as comma-separated values in the mapr.fs.nocompression configuration parameter, and can be modified with the config save command. Example:

The list can be viewed with the config load command. Example:
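
A sketch of both commands, following the config save syntax used for named pipes earlier on this page. The helper below only prints the save command so that it can be reviewed first; the function name nocompression_save_cmd is hypothetical, and the -keys option shown for config load is an assumption to confirm against your maprcli version.

```shell
# Print the maprcli command that stores a comma-separated
# list of filename extensions that should not be compressed.
nocompression_save_cmd() {
  printf "maprcli config save -values '{\"mapr.fs.nocompression\":\"%s\"}'\n" "$1"
}

nocompression_save_cmd "bz2,gz,lzo,snappy,tgz,tbz2,zip,z,Z,mp3,jpg,jpeg,mpg,mpeg,avi,gif,png"

# To view the current list (the -keys option is assumed here):
#   maprcli config load -keys mapr.fs.nocompression
```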
