What's New in Version 4.1
The 4.1 release of the MapR Distribution for Apache Hadoop contains the following new features. If you are upgrading to 4.1 directly from Version 3.1.x, you may want to look at the Release Notes for Versions 4.0.1 and 4.0.2 to read about the features that were added in those releases.
New Features in MapR-DB
Version 4.1 introduces the following new features in MapR-DB.
Table Replication
You can replicate changes from one MapR-DB table to another table, either in a separate cluster or within the same cluster. You can replicate entire tables, specific column families, or specific columns. Version 4.1 supports asynchronous and synchronous modes of replication, and two basic replication topologies: master-slave and multi-master. See Replicating MapR-DB Tables.
C API Support
MapR-DB now includes a version of libhbase, a library of asynchronous C APIs for creating and accessing Apache HBase tables. This version runs more efficiently and performs faster against MapR-DB tables. For more information, see Creating MapR-DB Applications.
New maprcli Commands
The following maprcli commands are available in 4.1:
maprcli table replica commands: multiple commands to manage replicas.
maprcli table upstream commands: multiple commands to manage upstream sources for replicas.
maprcli cluster gateway commands: multiple commands to manage gateways.
maprcli fid commands: commands to view detailed information and statistics for FIDs.
maprcli volume container switchmaster: switches the master container to a replica that is associated with the same container.
The following utilities are available in 4.1:
DiffTables: compares two MapR-DB tables.
DiffTablesWithCrc: for greater efficiency, uses a CRC algorithm to detect differences between two MapR-DB tables and then compares only the sets of rows where a difference was detected.
FormatResult: parses sequence files generated by the DiffTables utility or the DiffTablesWithCrc utility and converts the file content to a format that makes it easier to understand.
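The CRC-based strategy behind DiffTablesWithCrc can be sketched as follows. This is an illustrative, in-memory model, not the utility's actual implementation; the row encoding and helper names are invented for the example. A cheap per-row CRC pass flags candidate differences, and only flagged rows are then compared in full.

```python
import zlib

def row_crc(row):
    """CRC32 over a row's sorted column/value pairs (illustrative encoding)."""
    data = b"".join(f"{col}={val};".encode() for col, val in sorted(row.items()))
    return zlib.crc32(data)

def diff_tables(t1, t2):
    """Return {rowkey: (row_in_t1, row_in_t2)} for rows that differ.

    t1 and t2 model tables as {rowkey: {column: value}} dicts.
    """
    diffs = {}
    for key in set(t1) | set(t2):
        a, b = t1.get(key), t2.get(key)
        if a is None or b is None:
            diffs[key] = (a, b)             # row missing on one side
        elif row_crc(a) != row_crc(b):      # cheap CRC check first...
            if a != b:                      # ...then a full comparison
                diffs[key] = (a, b)
    return diffs

t1 = {"r1": {"cf:a": "1"}, "r2": {"cf:a": "2"}}
t2 = {"r1": {"cf:a": "1"}, "r2": {"cf:a": "9"}, "r3": {"cf:a": "3"}}
print(sorted(diff_tables(t1, t2)))  # ['r2', 'r3']
```

The payoff of the two-pass design in the real utility is that checksums are far cheaper to compute and ship between clusters than full row contents, so the expensive row-by-row comparison runs only on the ranges the CRC pass flagged.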
Improvements to the gfsck Utility
The performance of the gfsck utility with the -b option is optimized for checks against MapR-DB tables.
Installation and Upgrade Notes
Before installing or upgrading to Version 4.1, read the following sections.
Node Requirements for Table Replication
If you intend to use the table replication feature, you need to designate nodes on the destination cluster as gateway nodes and install the mapr-gateway package on these nodes. You also need to install the HBase 0.98.9 client (mapr-hbase) on your web server nodes; otherwise, table replication features will not work in the MCS.
Enabling New Features
If you are upgrading to 4.1 from Version 4.0.2, you need to enable the table replication feature (mfs.feature.db.repl.support). If you are upgrading from Version 4.0.1 or earlier, you may need to enable additional features that were added in previous releases.
MapR Client Compatibility
In general, Version 4.0.1 and 4.0.2 MapR clients will continue to work against a cluster that is upgraded to Version 4.1. However, the RM HA configuration on the client must match the configuration on the cluster. For example, zero-configuration RM HA was not supported in Version 4.0.1 so a Version 4.0.1 YARN client will not work with a Version 4.1 RM HA cluster.
MapR Interoperability Matrix
See the Interoperability Matrix pages for detailed information about MapR server, JDK, client, and ecosystem compatibility.
The Hadoop ecosystem components are hosted in a repository that is specific to Version 4.x: http://package.mapr.com/releases/ecosystem-4.x
Functional Changes
Note the following functional changes in Version 4.1.
GSON Support
MapR continues to support GSON in Version 4.1, but the GSON JAR file for Hadoop 2 has been removed from the distribution. The JAR file for Hadoop 1 is still included.
Impact: MapR software has no dependency on this GSON library. However, MRv2 applications that link to the MapR GSON library may no longer work.
Workaround: Install the appropriate version of the GSON library for use with your applications.
maprcli Commands and Utilities
Update to the CopyTable Utility
In 4.1, you can use the MapR CopyTable utility to copy a MapR-DB table with either a MapReduce job or a client process. Previously, the CopyTableTest utility copied a MapR-DB table with a client process and the CopyTable utility copied a MapR-DB table with a MapReduce job.
The following maprcli changes are available in 4.1:
maprcli table listrecent is no longer available through the maprcli. Instead, you can view the tables that you have recently accessed on the Tables view in the MCS.
The following maprcli table commands no longer have an output parameter:
table region list
table cf list
table cf colperm get
Known Issues
You may encounter the following known issues after upgrading to Version 4.1.
Metrics Database Not Yet Supported for YARN Applications
You cannot use the Metrics Database to record activity for applications that run in YARN (MRv2). The database only supports MRv1 jobs.
Resource Manager Issues
14696/15100: When automatic or manual ResourceManager failover is enabled and a job is submitted with impersonation turned ON by a user without impersonation privileges, the job submission eventually times out instead of returning an appropriate error. This behavior does not affect standard ecosystem services such as HiveServer because they are configured to run as the mapr user (with impersonation allowed). However, this problem does affect non-ecosystem applications or services that attempt to submit jobs with impersonation turned ON. MapR recommends that customers add the user in question to the impersonation list so that the job can proceed.
14907: When several jobs are submitted and the ResourceManager is using the ZKRMStateStore for failover, the cluster may experience ZooKeeper timeouts and instability. MapR recommends that customers always use the FileSystemRMStateStore to support ResourceManager HA. See Configuring the ResourceManager State Store.
15864: When the ApplicationMaster does not run on the same node as the ResourceManager, the ApplicationMaster UI does not work. In this case, you cannot view the application status or application logs until the application completes.
Workaround: Add the ResourceManager Web UI address property to the client's yarn-site.xml using the following syntax and supplying the correct hostname for the ResourceManager:
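For example, assuming the standard YARN property name yarn.resourcemanager.webapp.address and the default web UI port 8088 (confirm both against your deployment), the entry in the client's yarn-site.xml would look like this, with rm-host.example.com standing in for the actual ResourceManager hostname:

```xml
<property>
  <!-- rm-host.example.com is a placeholder; use the active ResourceManager's hostname -->
  <name>yarn.resourcemanager.webapp.address</name>
  <value>rm-host.example.com:8088</value>
</property>
```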
Note: If the ResourceManager process fails over to another node, you need to update the value of this property to reflect the web application address of the new ResourceManager process.
Installation and Configuration Issues
CentOS Version 6.3 and Earlier: MapR installations on CentOS Version 6.3 and earlier may fail because of an unresolved dependency on the redhat-lsb-core package. To work around this issue, do one of the following:
Add this repository so that the dependency can be resolved: http://mirror.centos.org/centos/6/os/x86_64/
Manually download and install the RPM:
yum localinstall redhat-lsb-core-4.0-7.el6.centos.x86_64.rpm
15201: The Quick Installer installation logs print "Configuring Hive" and "Configuring Spark" messages even when these components were not configured.
16216: When you run configure.sh with the -HS option on client nodes, mapred-site.xml is regenerated and does not retain existing user settings.
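Until this is fixed, one way to preserve custom settings is to back up the file before running configure.sh and compare afterward. The sketch below uses an illustrative local directory; on a real client, mapred-site.xml lives under the Hadoop conf directory of the MapR installation.

```shell
# Back up mapred-site.xml before re-running configure.sh -HS, since the
# regenerated file discards user edits. CONF_DIR below is illustrative.
set -e
CONF_DIR=./conf
mkdir -p "$CONF_DIR"
printf '<configuration><!-- custom settings --></configuration>\n' > "$CONF_DIR/mapred-site.xml"
cp "$CONF_DIR/mapred-site.xml" "$CONF_DIR/mapred-site.xml.bak"
# ... run configure.sh with the -HS option here ...
# Afterward, compare the regenerated file against the backup and re-apply
# any settings that were dropped:
diff "$CONF_DIR/mapred-site.xml.bak" "$CONF_DIR/mapred-site.xml" || true
```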
16155: In order to reconfigure a Mac client from secure mode to non-secure mode (or vice versa), you must follow these steps:
Manually remove the entry for the current cluster from:
16386: If you enable centralized logging on a cluster that was using YARN log aggregation in Version 4.0.1 prior to upgrading to Version 4.1, you can no longer access previously aggregated MapReduce logs from the HistoryServer UI.
Workaround: Perform the following steps to view previously aggregated MapReduce logs from the History Server UI:
Use the yarn logs command to retrieve the logs for each MapReduce application. The output of this command contains the stdout, stderr, and syslog streams, separated by specific delimiters.
Parse the output of the yarn logs command to create syslog, stdout, and stderr files, using UNIX tools such as sed or awk.
Add the syslog, stdout, and stderr files to the centralized logging directory with the following directory hierarchy:
You will need to create the application and container directories and give the user that submitted the application the proper permissions on the files and directories. For example, if usera submitted the application, usera should have the following permissions on the directories and log files:
drwxr-s--- 5 usera mapr 4096 2015-01-07 11:32 /var/mapr/local/qa-node101.qa.lab/logs/yarn/userlogs/application_<id>
drwxr-s--- - usera mapr 3 2015-01-07 11:32 /var/mapr/local/qa-node101.qa.lab/logs/yarn/userlogs/application_<id>/container_<id>
-rw-r----- 2 usera mapr 290 2015-01-07 11:32 /var/mapr/local/qa-node101.qa.lab/logs/yarn/userlogs/application_<id>/container_<id>/stderr
Note: After you complete the workaround, you will also be able to run maprcli job linklogs on these logs.
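The parsing step in this workaround can be sketched with awk. The sample below assumes the yarn logs output delimits each stream with a "LogType:" header followed by a "Log Contents:" line; verify the exact delimiter format against real output from your version before relying on this.

```shell
# Split a saved "yarn logs" dump into per-stream files (syslog, stdout, stderr).
# The delimiter format ("LogType:<name>" / "Log Contents:") is an assumption
# about the aggregated-log layout; check it against real output first.
set -e
mkdir -p parsed
cat > aggregated.txt <<'EOF'
LogType:stderr
Log Contents:
some error line
LogType:stdout
Log Contents:
hello from the task
LogType:syslog
Log Contents:
2015-01-07 11:32:00 INFO task started
EOF
awk '/^LogType:/      { sub(/^LogType:/, ""); out = "parsed/" $0; next }
     /^Log Contents:/ { next }
     out              { print > out }' aggregated.txt
ls parsed   # stderr  stdout  syslog
```

The resulting per-stream files can then be copied into the centralized logging hierarchy described above, with ownership and permissions set for the submitting user.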
Resolved Issues
The following issues are resolved in Version 4.1.
Installation and Configuration
The createsystemvolume script has been enhanced in Version 4.1 to improve its scalability on large clusters.
15359: The Metrics database no longer raises frequent alarms when data is inserted.
The hb_get_add_column C API in libMapRClient now accepts NULL for the optional column qualifier in its signature.
16213: When the ResourceManager (RM) and Webserver are installed on the same node, the MCS now shows the correct IP address in the URL for the RM.
16792: The MapR distribution now includes and supports Apache Avro 1.7.6. Previous releases supported Version 1.7.4.
16710: Jobs no longer fail when you specify a custom split size using the -Dmapred.max.split.size parameter.
16956: Jobs run in standalone mode no longer fail when using a Windows client.
16961: TaskTracker prevents the localization of a job from blocking the processing of Kill operations on the job and tasks.
16500: Hadoop commands run via the MapR Client on Windows no longer cause JVM crashes.
17003: For MapR-DB inserts, the MapR Client on Windows now reads timestamps as 64-bit integers to avoid overflows.
17048: The MapR Client on Windows now deletes rows successfully when only the row key is specified and no column family or column information is included.
15094: Internal errors that made it possible for containers to become stuck in BECOME_MASTER state have been resolved.
15184: Client Java applications no longer experience JVM crashes after accessing a table (a MapR-DB table or an HBase table), or instantiating an HTable object for the same table, more than 32767 times without closing the table.
15631: An internal error that was causing the gfsck utility to delete a tablet has been resolved.
15751: An internal issue that could cause MapR-FS to crash during a disk I/O error has been resolved.
16481: Block allocation has been improved to prevent performance impact due to CPU usage by MapR-FS.
16906: The FileClient no longer prints incorrect trace logs.
17248: Container replication no longer fails with the error: Container resync failed to send orphanlist
17519: A MapR-FS compression error no longer stalls the processing of jobs.
16939: The NFS server’s file attribute cache is now updated for each write.
Build and Package
16887: The GSON jar file for Hadoop 2 is no longer shipped with the MapR distribution. See GSON Support.
17294: A missing dependency on org.json was added to the