This is documentation for MapR version 4.0.x. You can also refer to MapR documentation for the latest or previous releases.


You can use MapR to protect your data from hardware failures, accidental overwrites, and natural disasters. MapR organizes data into volumes so that you can apply different data protection strategies to different types of data. The following scenarios describe a few common problems and how easily and effectively MapR protects your data from loss.


Scenario: Hardware Failure

Even with the most reliable hardware, growing cluster and datacenter sizes will make frequent hardware failures a real threat to business continuity. In a cluster with 10,000 disks on 1,000 nodes, it is reasonable to expect a disk failure more than once a day and a node failure every few days.

Solution: Topology and Replication Factor

MapR automatically replicates data and places the copies on different nodes to safeguard against data loss in the event of hardware failure. By default, MapR assumes that all nodes are in a single rack. You can provide MapR with information about the rack locations of all nodes by setting topology paths. MapR interprets each topology path as a separate rack, and attempts to replicate data onto different racks to provide continuity in case of a power failure affecting an entire rack. These replicas are maintained, copied, and made available seamlessly without user intervention.

To set up topology and replication:

  1. In the MapR Control System, open the MapR-FS group and click Nodes to display the Nodes view.
  2. Set up each rack with its own path. For each rack, perform the following steps:
    1. Click the checkboxes next to the nodes in the rack.
    2. Click the Change Topology button to display the Change Node Topology dialog.
    3. In the Change Node Topology dialog, type a path to represent the rack. For example, if the cluster name is cluster1 and the nodes are in rack 14, type /cluster1/rack14.
  3. When creating volumes, choose a Replication Factor of 3 or more to provide sufficient data redundancy.
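The same setup can be scripted with the maprcli command-line tool. The following is a sketch only: the server IDs, volume name, and paths are illustrative, and the exact flags should be verified against the maprcli reference for your release.

```shell
# Sketch: maprcli equivalents of the UI steps above.
# Server IDs, volume names, and paths are illustrative examples.

# Assign nodes to a rack by moving them to a topology path.
set_rack_topology() {
  # $1 = comma-separated server IDs (from `maprcli node list`)
  # $2 = topology path, for example /cluster1/rack14
  maprcli node move -serverids "$1" -topology "$2"
}

# Create a volume with a replication factor of 3 for redundancy.
create_replicated_volume() {
  # $1 = volume name, $2 = mount path, $3 = replication factor
  maprcli volume create -name "$1" -path "$2" -replication "$3"
}
```

For example, `set_rack_topology "111,222" /cluster1/rack14` followed by `create_replicated_volume critical-data /critical-data 3` mirrors steps 2 and 3 above.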

Scenario: Accidental Overwrite

Even in a cluster with data replication, important data can be overwritten or deleted accidentally. If a data set is accidentally removed, the removal itself propagates across the replicas and the data is lost. Users or applications can corrupt data, and once the corruption spreads to the replicas the damage is permanent.

Solution: Snapshots

With MapR, you can create a point-in-time snapshot of a volume, allowing recovery from a known good data set. You can create a manual snapshot to enable recovery to a specific point in time, or schedule snapshots to occur regularly to maintain a recent recovery point. If data is lost, you can restore the data using the most recent snapshot (or any snapshot you choose). Snapshots do not add a performance penalty, because they do not involve additional data copying operations; a snapshot can be created almost instantly regardless of data size.

Example: Creating a Snapshot Manually

  1. In the Navigation pane, expand the MapR-FS group and click the Volumes view.
  2. Select the checkbox beside the name of the volume, then click the New Snapshot button to display the Snapshot Name dialog.
  3. Type a name for the new snapshot in the Name... field.
  4. Click OK to create the snapshot.
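A manual snapshot can also be taken from the command line. This is a sketch; the volume and snapshot names below are illustrative.

```shell
# Sketch: creating a manual snapshot with maprcli.
# Volume and snapshot names are illustrative examples.
create_manual_snapshot() {
  # $1 = volume name, $2 = snapshot name
  maprcli volume snapshot create -volume "$1" -snapshotname "$2"
}
```

For example, `create_manual_snapshot projects before-cleanup` records a recovery point for the volume named projects.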

Example: Scheduling Snapshots

This example schedules snapshots for a volume hourly and retains them for 24 hours.

To create a schedule:

  1. In the Navigation pane, expand the MapR-FS group and click the Schedules view.
  2. Click New Schedule.
  3. In the Schedule Name field, type "Every Hour".
  4. From the first dropdown menu in the Schedule Rules section, select Hourly.
  5. In the Retain For field, specify 24 Hours.
  6. Click Save Schedule to create the schedule.
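A schedule can also be created with maprcli. The JSON field names in this sketch are an assumption about the schedule object format; verify them against the `maprcli schedule create` reference for your release before use.

```shell
# Sketch: creating an hourly schedule with maprcli.
# The JSON field names ("name", "rules", "frequency", "retain") are an
# assumption about the schedule object format; check your release's docs.
create_hourly_schedule() {
  maprcli schedule create \
    -schedule '{"name":"Every Hour","rules":[{"frequency":"hourly","retain":"24h"}]}'
}
```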

To apply the schedule to the volume:

  1. In the Navigation pane, expand the MapR-FS group and click the Volumes view.
  2. Display the Volume Properties dialog by clicking the volume name, or by selecting the checkbox beside the volume name then clicking the Properties button.
  3. In the Replication and Snapshot Scheduling section, choose "Every Hour."
  4. Click Modify Volume to apply the changes and close the dialog.
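Attaching the schedule to a volume can likewise be scripted. In this sketch the schedule is referenced by its numeric ID (which `maprcli schedule list` reports); the ID and volume name are illustrative.

```shell
# Sketch: attaching a snapshot schedule to a volume.
# $2 is the schedule's numeric ID from `maprcli schedule list`;
# the values used here are illustrative.
apply_snapshot_schedule() {
  # $1 = volume name, $2 = schedule ID
  maprcli volume modify -name "$1" -schedule "$2"
}
```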

Scenario: Disaster Recovery

A severe natural disaster can cripple an entire datacenter, leading to permanent data loss unless a disaster plan is in place.

Solution: Mirroring to Another Cluster

MapR makes it easy to protect against loss of an entire datacenter by mirroring entire volumes to a different datacenter. A mirror is a full read-only copy of a volume that can be synced on a schedule to provide point-in-time recovery for critical data. If the volumes on the original cluster contain a large amount of data, you can store them on physical media using the volume dump create command and transport them to the mirror cluster. Otherwise, you can simply create mirror volumes that point to the volumes on the original cluster and copy the data over the network.

The mirroring operation conserves bandwidth by transmitting only the deltas between the source and the mirror, and by compressing the data over the wire. In addition, MapR uses checksums and a latency-tolerant protocol to ensure success even on high-latency WANs. You can set up a cascade of mirrors to replicate data over a distance. For instance, you can mirror data from New York to London, then use lower-cost links to replicate the data from London to Paris and Rome.

To set up mirroring to another cluster:

  1. Use the volume dump create command to create a full volume dump for each volume you want to mirror.
  2. Transport the volume dump to the mirror cluster.
  3. For each volume on the original cluster, set up a corresponding volume on the mirror cluster.
    1. Restore the volume using the volume dump restore command.
    2. In the MapR Control System, click Volumes under the MapR-FS group to display the Volumes view.
    3. Click the name of the volume to display the Volume Properties dialog.
    4. Set the Volume Type to Remote Mirror Volume.
    5. Set the Source Volume Name to the source volume name.
    6. Set the Source Cluster Name to the cluster where the source volume resides.
    7. In the Replication and Mirror Scheduling section, choose a schedule to determine how often the mirror will sync.
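The mirror setup steps above can also be sketched with maprcli. The volume, source, and path names here are illustrative, and the accepted `-type` value may differ by release, so check the volume create reference for your version.

```shell
# Sketch: creating a remote mirror volume and starting a sync run.
# Volume, cluster, and path names are illustrative examples.
create_mirror_volume() {
  # $1 = mirror volume name, $2 = source as volume@cluster, $3 = mount path
  maprcli volume create -name "$1" -source "$2" -type mirror -path "$3"
}

start_mirror_sync() {
  # $1 = mirror volume name
  maprcli volume mirror start -name "$1"
}
```

For example, `create_mirror_volume projects-mirror projects@cluster1 /projects-mirror` followed by `start_mirror_sync projects-mirror` seeds the mirror over the network.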

To recover volumes from mirrors:

  1. Use the volume dump create command to create a full volume dump for each mirror volume you want to restore. Example:
    maprcli volume dump create -e statefile1 -dumpfile fulldump1 -name volume@cluster
  2. Transport the volume dump to the rebuilt cluster.
  3. For each volume on the mirror cluster, set up a corresponding volume on the rebuilt cluster.
    1. Restore the volume using the volume dump restore command. Example:
      maprcli volume dump restore -name volume@cluster -dumpfile fulldump1
    2. Copy the files to a standard (non-mirror) volume.
