This is documentation for MapR Version 5.0. You can also refer to MapR documentation for the latest release.

Use any of the following methods to copy data from an HDFS cluster to a MapR cluster:

  • hdfs:// protocol - You can use the hadoop distcp command with the hdfs:// protocol to copy data from an HDFS cluster into a MapR cluster. Use this method if the HDFS cluster and the MapR cluster use the same RPC protocol version. For all other scenarios, use the webhdfs:// protocol or NFS gateway to copy data to a MapR cluster.
  • webhdfs:// protocol - You can use the hadoop distcp command with the webhdfs:// protocol to copy data from an HDFS cluster into a MapR cluster.
  • NFS - You can mount a MapR cluster to an HDFS cluster via NFS mount and then use the hadoop distcp command to copy data between the two clusters.

Copy Data Using the hdfs:// Protocol

Before you can copy data from an HDFS cluster to a MapR cluster using the hdfs:// protocol, you must configure the MapR cluster to access the HDFS cluster. First, complete the steps in Configuring a MapR Cluster to Access an HDFS Cluster for the security scenario that best describes your HDFS and MapR clusters. Then complete the steps under Verifying Access to an HDFS Cluster.

You also need the following information:

  • <NameNode> - the IP address or hostname of the NameNode in the HDFS cluster
  • <NameNode Port> - the port for connecting to the NameNode in the HDFS cluster
  • <HDFS path> - the path to the HDFS directory from which you plan to copy data
  • <MapR-FS path> - the path in the MapR cluster to which you plan to copy HDFS data
  • <file> - a file in the HDFS path

Copying Data 

To copy data from HDFS to MapR-FS using the hdfs:// protocol, complete the following steps:

  1. Run the following hadoop command to determine whether the MapR cluster can read the contents of a file in a specified directory on the HDFS cluster:

    Example
  2. If the MapR cluster can read the contents of the file, run the distcp command to copy the data from the HDFS cluster to the MapR cluster:

    Example

    Note the required triple slashes in 'maprfs:///...'. 
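
The two steps above can be sketched as a small shell script. This is an illustrative sketch, not the exact commands from the original page: the host name, port 8020, paths, and file name are hypothetical stand-ins for the placeholders listed earlier, and the run helper is an assumption added so the commands can be reviewed before they are executed. By default the script only prints the commands; setting RUN=1 executes them on a node in the MapR cluster.

```shell
#!/bin/sh
# Hypothetical values for illustration; substitute your own cluster details.
NAMENODE="nn1.example.com"   # <NameNode>
NAMENODE_PORT="8020"         # <NameNode Port> (assumed value)
HDFS_PATH="/user/sara"       # <HDFS path>
MAPRFS_PATH="/user/sara"     # <MapR-FS path>
HDFS_FILE="contents.xml"     # <file>

# Build the source URI on the HDFS cluster.
hdfs_uri() {
  printf 'hdfs://%s:%s%s' "$NAMENODE" "$NAMENODE_PORT" "$HDFS_PATH"
}

# Build the destination URI. An empty authority followed by an absolute
# path produces the required triple slash, for example maprfs:///user/sara
maprfs_uri() {
  printf 'maprfs://%s' "$MAPRFS_PATH"
}

# Print each command by default; set RUN=1 to execute it instead.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Step 1: read a file on the HDFS cluster from a MapR node.
# Step 2: copy the directory, but only if step 1 succeeded.
run hadoop fs -cat "$(hdfs_uri)/$HDFS_FILE" &&
  run hadoop distcp "$(hdfs_uri)" "$(maprfs_uri)"
```

Chaining the two commands with && keeps distcp from starting a MapReduce job when the MapR cluster cannot read from the HDFS cluster at all.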

Copy Data Using the webhdfs:// Protocol

Before you can copy data from an HDFS cluster to a MapR cluster using the webhdfs:// protocol, you must configure the MapR cluster to access the HDFS cluster. First, complete the steps in Configuring a MapR Cluster to Access an HDFS Cluster for the security scenario that best describes your HDFS and MapR clusters. Then complete the steps under Verifying Access to an HDFS Cluster.

The HDFS cluster must have WebHDFS enabled. Verify that the dfs.webhdfs.enabled parameter exists in the hdfs-site.xml file and that its value is set to "true".
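
One way to check this setting from a shell on the HDFS cluster is sketched below. The property in question is dfs.webhdfs.enabled; the webhdfs_enabled helper and the configuration path are hypothetical, and the check is a plain text match rather than a real XML parse, so treat it as a quick sanity check only.

```shell
#!/bin/sh
# Expected entry in hdfs-site.xml on the HDFS cluster:
#   <property>
#     <name>dfs.webhdfs.enabled</name>
#     <value>true</value>
#   </property>

# Return 0 if the file sets dfs.webhdfs.enabled to true, 1 if it does not,
# and 2 if the file cannot be read. This is a text match, not an XML parse:
# it assumes <value> directly follows <name> inside the property element.
webhdfs_enabled() {
  [ -r "$1" ] || return 2
  tr -d '[:space:]' < "$1" |
    grep -qF '<name>dfs.webhdfs.enabled</name><value>true</value>'
}

# Hypothetical location; the configuration directory varies by distribution.
SITE_FILE="${HADOOP_CONF_DIR:-/etc/hadoop/conf}/hdfs-site.xml"

if webhdfs_enabled "$SITE_FILE"; then
  echo "WebHDFS is enabled in $SITE_FILE"
else
  echo "dfs.webhdfs.enabled=true was not found in $SITE_FILE"
fi
```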

You also need the following information:

  • <NameNode> - the IP address or hostname of the NameNode in the HDFS cluster
  • <NameNode HTTP Port> - the HTTP port on the NameNode in the HDFS cluster
  • <HDFS path> - the path to the HDFS directory from which you plan to copy data
  • <MapR-FS path> - the path in the MapR cluster to which you plan to copy HDFS data

Copying Data

Run the following command from a node in the MapR cluster to copy data from HDFS to MapR-FS using webhdfs://:

Example

Note the required triple slashes in 'maprfs:///...'.
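
The copy command described above might look like the following sketch. The host name, the HTTP port 50070, and the paths are hypothetical stand-ins for the placeholders listed earlier, and the run helper is an assumption added so the command is printed for review unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical values for illustration; substitute your own cluster details.
NAMENODE="nn1.example.com"   # <NameNode>
NAMENODE_HTTP_PORT="50070"   # <NameNode HTTP Port> (assumed value)
HDFS_PATH="/user/sara"       # <HDFS path>
MAPRFS_PATH="/user/sara"     # <MapR-FS path>

# Build the WebHDFS source URI; note that it uses the HTTP port.
webhdfs_uri() {
  printf 'webhdfs://%s:%s%s' "$NAMENODE" "$NAMENODE_HTTP_PORT" "$HDFS_PATH"
}

# Build the destination URI with the required triple slash.
maprfs_uri() {
  printf 'maprfs://%s' "$MAPRFS_PATH"
}

# Print the command by default; set RUN=1 to execute it instead.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Run from a node in the MapR cluster.
run hadoop distcp "$(webhdfs_uri)" "$(maprfs_uri)"
```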

Copy Data Using NFS

If NFS is installed on the MapR cluster, you can mount the MapR cluster to the HDFS cluster and then copy files from one cluster to the other using hadoop distcp. If you do not have NFS installed and a mount point configured, see Accessing Data with NFS and Setting Up MapR NFS.

To perform a copy using distcp via NFS, you need the following information:

  • <MapR NFS Server> - the IP address or hostname of the NFS server in the MapR cluster
  • <maprfs_nfs_mount> - the NFS export mount point configured on the MapR cluster; default is /mapr
  • <hdfs_nfs_mount> - the NFS mount point configured on the HDFS cluster
  • <NameNode> - the IP address or hostname of the NameNode in the HDFS cluster
  • <NameNode Port> - the port on the NameNode in the HDFS cluster
  • <HDFS path> - the path to the HDFS directory from which you plan to copy data 
  • <MapR-FS path> - the path in the MapR cluster to which you plan to copy HDFS data

Mounting HDFS

Issue the following command to mount the MapR cluster to the HDFS NFS mount point:

Example
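
A mount command of this kind might look like the sketch below. The server name and the /hdfsmount directory are hypothetical, /mapr is the default export named above, and the run helper is an assumption added so the command is printed for review unless RUN=1 is set. Mounting typically requires root privileges and an existing mount point directory.

```shell
#!/bin/sh
# Hypothetical values for illustration; substitute your own details.
MAPR_NFS_SERVER="mapr-nfs.example.com"   # <MapR NFS Server>
MAPRFS_NFS_MOUNT="/mapr"                 # <maprfs_nfs_mount> (default export)
HDFS_NFS_MOUNT="/hdfsmount"              # <hdfs_nfs_mount>

# Build the NFS source in server:/export form.
nfs_source() {
  printf '%s:%s' "$MAPR_NFS_SERVER" "$MAPRFS_NFS_MOUNT"
}

# Print the command by default; set RUN=1 to execute it instead.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Mount the MapR NFS export at the local mount point.
run mount "$(nfs_source)" "$HDFS_NFS_MOUNT"
```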

Copying Data

  1. Issue the following command to copy data from the HDFS cluster to the MapR cluster:

    Example
  2. Issue the following command from the MapR cluster to verify that the file was copied to the MapR cluster:

    Example
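
The copy and verification steps above can be sketched as follows. All values are hypothetical stand-ins for the placeholders listed earlier. The sketch assumes that the destination is addressed as a file:// URI under the NFS mount point, and that the MapR-FS path sits directly under that mount point; depending on how the export is laid out, the path under the mount may include additional components such as the cluster name. Because the two commands run on different clusters, the script only prints them.

```shell
#!/bin/sh
# Hypothetical values for illustration; substitute your own details.
NAMENODE="nn1.example.com"   # <NameNode>
NAMENODE_PORT="8020"         # <NameNode Port> (assumed value)
HDFS_PATH="/user/sara"       # <HDFS path>
MAPRFS_PATH="/user/sara"     # <MapR-FS path>
HDFS_NFS_MOUNT="/hdfsmount"  # <hdfs_nfs_mount>

# Source URI on the HDFS cluster.
hdfs_uri() {
  printf 'hdfs://%s:%s%s' "$NAMENODE" "$NAMENODE_PORT" "$HDFS_PATH"
}

# Destination: the MapR-FS path as seen through the local NFS mount.
nfs_target_uri() {
  printf 'file://%s%s' "$HDFS_NFS_MOUNT" "$MAPRFS_PATH"
}

# Step 1: copy command, to be run where the NFS mount exists.
copy_cmd() {
  printf 'hadoop distcp %s %s\n' "$(hdfs_uri)" "$(nfs_target_uri)"
}

# Step 2: verification command, to be run on a node in the MapR cluster.
verify_cmd() {
  printf 'hadoop fs -ls %s\n' "$MAPRFS_PATH"
}

copy_cmd
verify_cmd
```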