With manual or automatic failover, one active ResourceManager process and two standby ResourceManager processes run in the cluster. The standby ResourceManager nodes run the ResourceManager process without loading the working state. When the active ResourceManager fails, one of the standby ResourceManager nodes can load the working state from ZooKeeper and continue providing services to the cluster.
ResourceManager clients (MapR client nodes, ApplicationMaster processes, and NodeManager nodes) attempt connections to the ResourceManager nodes in a round-robin fashion until they hit an active ResourceManager node. If the active ResourceManager node is down, ResourceManager clients resume round-robin polling until an active ResourceManager node is detected.
For web requests, including REST API requests, standby ResourceManager nodes automatically redirect web requests to the active ResourceManager node.
The difference between manual and automatic failover is how the transition from standby to active occurs for the ResourceManager process.
- With manual failover, you manually invoke the transition of the ResourceManager from standby to active with the yarn rmadmin command.
- With automatic failover, the ResourceManager processes have an embedded ZooKeeper-based ActiveStandbyElector, which chooses the active ResourceManager. This ActiveStandbyElector also detects failures in the currently active ResourceManager and automatically transitions one of the standby ResourceManagers to an active state.
If you specify multiple ResourceManagers when you run configure.sh, automatic failover is configured. However, you can edit the yarn-site.xml to enable manual failover instead.
This section contains the following topics that describe how to manage manual and automatic failover:
- Automatic Failover Administration
- Configuring Automatic Failover for the ResourceManager
- Manual Failover Administration
- Checking the ResourceManager State
- Using Central Configuration with Manual and Automatic Failover
Automatic Failover Administration
The ZooKeeper-based ActiveStandbyElector on each ResourceManager node detects failures in the currently active ResourceManager and automatically transitions one of the standby ResourceManagers to an active state. Therefore, the rmadmin -transitionToStandby and -transitionToActive options are disabled.
For information about using central configuration with automatic failover, see Using Central Configuration with Manual and Automatic Failover.
Configuring Automatic Failover for the ResourceManager
To use automatic failover, specify multiple ResourceManagers when you run configure.sh on each node in the cluster.
The following configure.sh syntax specifies three ResourceManager nodes (nodeA, nodeB, and nodeC; one active and two standby) and a HistoryServer node (nodeA):
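As a minimal sketch of such a command line, the snippet below assembles a configure.sh invocation from the node names given above. The CLDB and ZooKeeper host lists and the /opt/mapr/server install path are assumptions for illustration; substitute the values for your cluster. The snippet only prints the command so that you can review it before running it on each node.

```shell
#!/bin/sh
# Assumed host lists for illustration only; replace with your own nodes.
CLDB_NODES="nodeA,nodeB,nodeC"   # assumption: CLDB hosts
ZK_NODES="nodeA,nodeB,nodeC"     # assumption: ZooKeeper hosts
RM_NODES="nodeA,nodeB,nodeC"     # three ResourceManagers (one active, two standby)
HS_NODE="nodeA"                  # HistoryServer node

# -RM lists the ResourceManager hosts; -HS names the HistoryServer host.
CONFIGURE_CMD="/opt/mapr/server/configure.sh -C $CLDB_NODES -Z $ZK_NODES -RM $RM_NODES -HS $HS_NODE"

# Print the command for review instead of executing it.
echo "$CONFIGURE_CMD"
```

Listing more than one host after -RM is what causes automatic failover to be configured.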
Manual Failover Administration
Configure manual failover for the ResourceManager if you want to manually transition the state of ResourceManagers in the cluster. In the event of a ResourceManager failure, you use rmadmin commands to check the status of each ResourceManager and then transition a standby ResourceManager to the active state.
For information about using central configuration with manual failover, see Using Central Configuration with Manual and Automatic Failover.
Configuring Manual Failover for the ResourceManager
To configure manual failover, specify multiple ResourceManagers when you run configure.sh on each node in the cluster and then edit yarn-site.xml to disable automatic failover.
- Specify multiple ResourceManagers when you run configure.sh on each cluster and client node. The following configure.sh syntax configures three ResourceManager nodes (one active and two standby):
- Disable the following automatic failover properties in the yarn-site.xml on each node with the ResourceManager role:
- Restart the ResourceManager service.
For more information, see Starting, Stopping, and Restarting Services.
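The property edit in the procedure above can be sketched as follows. This assumes that the automatic failover properties in question are the standard Apache YARN properties yarn.resourcemanager.ha.automatic-failover.enabled and yarn.resourcemanager.ha.automatic-failover.embedded; confirm the names against the yarn-site.xml on your ResourceManager nodes. The print_property helper is hypothetical and only prints the stanzas to paste inside the configuration element of yarn-site.xml.

```shell
#!/bin/sh
# print_property is a hypothetical helper (not part of MapR or Hadoop).
# It prints one <property> stanza for yarn-site.xml: $1 = name, $2 = value.
print_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Assumed property names; setting both to false disables automatic failover.
print_property yarn.resourcemanager.ha.automatic-failover.enabled false
print_property yarn.resourcemanager.ha.automatic-failover.embedded false
```

After the values are in place on each node with the ResourceManager role, restart the ResourceManager service so that the change takes effect.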
Transitioning a Standby ResourceManager to Active
The yarn rmadmin command includes options to manage high availability for the ResourceManager, including transitioning a ResourceManager node between active and standby modes. These commands take the ResourceManager serviceID as an argument and can be run on any node in the cluster. The serviceID of a ResourceManager is set in the yarn.resourcemanager.ha.rm-ids property of the yarn-site.xml file.
Transition a standby ResourceManager to the active state when the active ResourceManager process has failed or the node that runs the process is no longer accessible.
- Determine if an active ResourceManager is running in the cluster. See Checking the ResourceManager State.
- Run the following command to set the current active ResourceManager to standby:
- Run the following command to transition the standby ResourceManager to the active state:
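The two transition steps above can be sketched with the -transitionToStandby and -transitionToActive options of yarn rmadmin. The manual_failover wrapper is a hypothetical convenience function, and rm1 and rm2 in the usage note are placeholder service IDs; use the IDs listed in yarn.resourcemanager.ha.rm-ids.

```shell
#!/bin/sh
# manual_failover is a hypothetical wrapper around the two rmadmin steps.
# $1 = serviceID of the current active ResourceManager
# $2 = serviceID of the standby ResourceManager to promote
manual_failover() {
  # Step 1: set the current active ResourceManager to standby.
  yarn rmadmin -transitionToStandby "$1"
  # Step 2: transition the chosen standby ResourceManager to active.
  yarn rmadmin -transitionToActive "$2"
}
```

For example, manual_failover rm1 rm2 would demote rm1 and then promote rm2. These options are disabled when automatic failover is enabled.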
Checking the ResourceManager State
When you configure manual or automatic failover, the ResourceManager is either in active or standby state. Each ResourceManager has a serviceID that identifies the service.
To check the state of a ResourceManager, run the following command with the serviceID:
The command returns active or standby based on the state of the ResourceManager associated with the serviceID that you provide.
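As a sketch of the state check, the function below runs yarn rmadmin -getServiceState once for each serviceID it is given and prints the result next to the ID. The rm_states name is hypothetical, and the service IDs you pass should come from yarn.resourcemanager.ha.rm-ids.

```shell
#!/bin/sh
# rm_states is a hypothetical helper: it prints "<serviceID>: <state>"
# for every ResourceManager serviceID passed as an argument.
rm_states() {
  for id in "$@"; do
    printf '%s: %s\n' "$id" "$(yarn rmadmin -getServiceState "$id")"
  done
}
```

Calling rm_states with all configured service IDs gives a quick view of which ResourceManager is currently active before you attempt a manual transition.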
Using Central Configuration with Manual and Automatic Failover
When you configure manual or automatic failover for the ResourceManager, the contents of the yarn-site.xml configuration file are slightly different on each ResourceManager node because the value of the yarn.resourcemanager.ha.id property is distinct for each ResourceManager node. If your cluster is using the central configuration feature, configure central configuration overrides for each ResourceManager node:
- Keep a central copy of the file at /var/mapr/configuration/default/hadoop/hadoop-2.x/etc/hadoop/yarn-site.xml.
- Configure central configuration overrides in the following manner:
/var/mapr/configuration/nodes/<HOSTNAME FOR RM 1>/hadoop/hadoop-2.x/etc/hadoop/yarn-site.xml
/var/mapr/configuration/nodes/<HOSTNAME FOR RM 2>/hadoop/hadoop-2.x/etc/hadoop/yarn-site.xml
/var/mapr/configuration/nodes/<HOSTNAME FOR RM 3>/hadoop/hadoop-2.x/etc/hadoop/yarn-site.xml
For more information, see Central Configuration.
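To make the override layout concrete, the sketch below prints the per-node override path for each ResourceManager host, following the pattern shown above. The override_path function is hypothetical, the host names are placeholders, and hadoop-2.x stands for the Hadoop version directory on your cluster, as in the paths above.

```shell
#!/bin/sh
# override_path is a hypothetical helper that builds the central
# configuration override path of yarn-site.xml for one host ($1).
override_path() {
  echo "/var/mapr/configuration/nodes/$1/hadoop/hadoop-2.x/etc/hadoop/yarn-site.xml"
}

# Placeholder ResourceManager host names; each override file carries that
# node's own yarn.resourcemanager.ha.id value.
for host in nodeA nodeB nodeC; do
  override_path "$host"
done
```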