This is documentation for MapR Version 5.0. You can also refer to MapR documentation for the latest release.

The disk space balancer and the replication role balancer redistribute data in the MapR storage layer to ensure maximum performance and efficient use of space:

  • The disk space balancer works to ensure that the percentage of space utilized on all storage pools in the cluster is similar, so that no nodes are overloaded.
  • The replication role balancer changes the replication roles of cluster containers so that the replication process uses network bandwidth evenly.

To view balancer configuration values:

  • Pipe the output of the maprcli config load command through grep to show only the balancer parameters.
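As a rough illustration of that filtering step, the sketch below does in Python what grep does in the pipeline: it keeps only the lines that mention a balancer parameter. The function name is an assumption, and the sample text (including the unrelated setting) is hypothetical; it stands in for whatever maprcli config load prints on your cluster.

```python
def balancer_lines(config_output):
    """Return the lines of config output that mention a balancer setting."""
    return [line.strip()
            for line in config_output.splitlines()
            if "balancer" in line]


# Hypothetical sample text, not real command output.
sample = """\
cldb.balancer.disk.paused 1
cldb.balancer.role.paused 1
some.other.setting 42
"""

for line in balancer_lines(sample):
    print(line)
```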

To set balancer configuration values:

  • Use the maprcli config save command to set the appropriate values.

By default, the balancers are turned off.

  • To turn on the disk space balancer, use config save to set cldb.balancer.disk.paused to 0.
  • To turn on the replication role balancer, use config save to set cldb.balancer.role.paused to 0.
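The following is a minimal sketch of how a script might assemble a config save call that unpauses both balancers. It only builds the argument list and does not run anything. It assumes that the settings are passed as a JSON map after a -values option; the helper name config_save_args is hypothetical.

```python
import json


def config_save_args(values):
    """Build the argument list for a maprcli config save call.

    Assumption: settings are passed as a JSON map after -values,
    with every value written as a string.
    """
    payload = json.dumps({key: str(val) for key, val in values.items()},
                         separators=(",", ":"))
    return ["maprcli", "config", "save", "-values", payload]


# Unpause both balancers (0 = not paused).
enable_both = config_save_args({
    "cldb.balancer.disk.paused": 0,
    "cldb.balancer.role.paused": 0,
})
print(enable_both)
```

On a node where maprcli is installed, a list like this could be handed to subprocess.run; check the result afterwards by viewing the balancer configuration values.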

Disk Space Balancer

The disk space balancer is a tool that balances disk space usage on a cluster by moving containers between nodes.

The disk space balancer distributes containers to storage pools in other nodes that have lower utilization than the average for that cluster. The disk space balancer checks every storage pool on a regular basis and moves containers from a storage pool when that pool's utilization meets the following conditions:

  • The storage pool is over 70% full.
  • The storage pool's utilization exceeds the average utilization on the cluster by a specified threshold:
    • When the average cluster storage utilization is below 80%, the threshold is 10%.
    • When the average cluster storage utilization is below 90% but over 80%, the threshold is 3%.
    • When the average cluster storage utilization is below 94% but over 90%, the threshold is 2%.
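The conditions above can be sketched as a small decision function. This is an illustration of the rules as described here, not the balancer's actual implementation. It assumes that both conditions must hold, that "exceeds by a threshold" means a difference in percentage points, and that the boundary values (exactly 80% or 90%) fall into the higher tier. The text does not say what happens at 94% average utilization or above, so the sketch declines to move anything in that case.

```python
def excess_threshold(cluster_avg):
    """Return the allowed excess over the cluster average, in percentage points."""
    if cluster_avg < 80:
        return 10
    if cluster_avg < 90:
        return 3
    if cluster_avg < 94:
        return 2
    return None  # Not specified in the text for averages of 94% and above.


def should_move_containers(pool_util, cluster_avg, full_threshold=70):
    """Decide whether containers should move off a storage pool.

    full_threshold mirrors cldb.balancer.disk.threshold.percentage.
    """
    threshold = excess_threshold(cluster_avg)
    if threshold is None:
        return False  # Assumption: no move when the tier is undefined.
    over_full = pool_util > full_threshold
    over_average = (pool_util - cluster_avg) > threshold
    return over_full and over_average


print(should_move_containers(85, 60))  # pool well above a 60% average
print(should_move_containers(65, 40))  # pool not yet over 70% full
```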

The disk space balancer aims to ensure that the percentage of space used on all the disks in the cluster is similar.

You can view disk usage on all nodes in the Disks view by clicking Cluster > Nodes in the Navigation pane and then choosing Disks from the dropdown.

Disk Space Balancer Configuration Parameters

| Parameter | Value | Description |
| --- | --- | --- |
| cldb.balancer.disk.threshold.percentage | 70 | Threshold for moving containers out of a given storage pool, expressed as a utilization percentage. |
| cldb.balancer.disk.paused | 1 | Specifies whether the disk space balancer runs: 0 = not paused (normal operation); 1 = paused (does not perform any container moves). |
| cldb.balancer.disk.max.switches.in.nodes.percentage | 10 | Throttles the disk space balancer. When set to 10, the balancer limits the number of concurrent container moves to 10% of the total nodes in the cluster, rounded up (minimum 1). |
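To make the throttle arithmetic concrete, here is a minimal sketch of the limit that cldb.balancer.disk.max.switches.in.nodes.percentage implies: a percentage of the node count, rounded up, with a floor of 1. The function name is hypothetical, and the code only mirrors the description above rather than the CLDB's internal logic.

```python
def max_concurrent_moves(total_nodes, percentage=10):
    """Upper bound on concurrent container moves for the disk space balancer."""
    # Integer ceiling division avoids floating-point rounding surprises.
    limit = -(-total_nodes * percentage // 100)
    return max(1, limit)


for nodes in (5, 25, 100):
    print(nodes, "nodes ->", max_concurrent_moves(nodes), "concurrent moves")
```

With the default of 10, a 25-node cluster allows 3 concurrent moves (2.5 rounded up), while a 5-node cluster falls back to the minimum of 1.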

Disk Space Balancer Status

Use the maprcli dump balancerinfo command to view detailed information about the storage pools on a cluster.

If there are any active container moves at the time the command is run, maprcli dump balancerinfo returns information about the source and destination storage pools.

For more information about this command, see maprcli dump balancerinfo.

Disk Space Balancer Metrics

The maprcli dump balancermetrics command returns a cumulative count of container moves and the amount of data (in MB) moved between storage pools since the current CLDB became the master CLDB.

For more information about this command, see maprcli dump balancermetrics.

Replication Role Balancer

The replication role balancer is a tool that switches the replication roles of containers to ensure that every node has an equal share of master and replica containers (for name containers) and an equal share of master, intermediate, and tail containers (for data containers).

The replication role balancer changes the replication role of the containers in a cluster so that network bandwidth is spread evenly across all nodes during the replication process. A container's replication role determines how it is replicated to the other nodes in the cluster. For name containers (the volume's first container), replication occurs simultaneously from the master to all replica containers. For data containers, replication proceeds from the master to the intermediate container(s) until it reaches the tail containers. Replication occurs over the network between nodes, often in separate racks.
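The two replication patterns can be pictured as different sets of network transfers. The sketch below is purely illustrative: the node names and helper functions are hypothetical. It shows that a name container's master sends to every replica directly, while a data container's replicas form a chain in which each node forwards to the next, which is why the roles a node holds affect how much traffic it sends.

```python
from collections import Counter


def name_container_edges(master, replicas):
    """Master sends to every replica at once (star pattern)."""
    return [(master, replica) for replica in replicas]


def data_container_edges(chain):
    """Each node forwards to the next: master, intermediate(s), tail."""
    return list(zip(chain, chain[1:]))


def outgoing_transfers(edges):
    """Count how many transfers each node originates."""
    return Counter(src for src, _ in edges)


# Hypothetical three-node layout.
star = name_container_edges("nodeA", ["nodeB", "nodeC"])
chain = data_container_edges(["nodeA", "nodeB", "nodeC"])

print(outgoing_transfers(star))   # the master originates every transfer
print(outgoing_transfers(chain))  # master and intermediate originate one each
```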

Replication Role Balancer Configuration Parameters

| Parameter | Value | Description |
| --- | --- | --- |
| cldb.balancer.role.paused | 1 | Specifies whether the role balancer runs: 0 = not paused (normal operation); 1 = paused (does not perform any container replication role switches). |
| cldb.balancer.role.max.switches.in.nodes.percentage | 10 | Throttles the role balancer. When set to 10, the balancer limits the number of concurrent role switches to 10% of the total nodes in the cluster (minimum 2). |
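A comparable sketch for the role balancer limit follows. The floor of 2 comes from the description above; the rounding direction is not stated for this parameter, so rounding down is an assumption here, and the function name is hypothetical.

```python
def max_concurrent_role_switches(total_nodes, percentage=10):
    """Upper bound on concurrent replication role switches."""
    # Assumption: fractional results round down; the text does not say.
    limit = total_nodes * percentage // 100
    return max(2, limit)


for nodes in (10, 50, 100):
    print(nodes, "nodes ->", max_concurrent_role_switches(nodes), "concurrent switches")
```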

Replication Role Balancer Status

The maprcli dump rolebalancerinfo command returns information about the number of active replication role switches. During a replication role switch, the replication role balancer selects a master or intermediate data container and switches its replication role to that of a tail data container.

For more information about this command, see maprcli dump rolebalancerinfo.
