HBase supports inter-cluster data replication, which can propagate data to a secondary cluster or data center that remains accessible when the primary cluster or data center is unavailable. The following are the high-level steps to enable HBase inter-cluster replication. Note that HBase also supports region replication within a cluster for read high availability (HA), which is different from inter-cluster data replication.
Set the hbase.replication property to true in the hbase-site.xml of the HBase cluster from which data needs to be replicated. This cluster is referred to as the master going forward. By default, the value of this property is “true”.
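To confirm what a given hbase-site.xml actually sets, a small helper along the following lines may be useful. It is only a sketch: the function name get_property is hypothetical, it assumes the conventional layout in which each value element directly follows its name element, and it falls back to a caller-supplied default (here “true”, per the default stated above) when the property is absent. The sample file is generated purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a property from a Hadoop-style XML
# config file, or a fallback when the property is not present.
# Assumes <value> directly follows <name> inside the same <property> element.
# The dots in the property name are treated as regex wildcards, which is
# harmless for this lookup.
get_property() {
  local file="$1" name="$2" fallback="$3" value
  value="$(tr -d '\n' < "$file" \
    | grep -o "<name>${name}</name>[[:space:]]*<value>[^<]*</value>" \
    | sed -e 's/.*<value>//' -e 's/<\/value>.*//' || true)"
  printf '%s\n' "${value:-$fallback}"
}

# Illustration only: a throwaway file containing the fragment that would
# appear in hbase-site.xml on the master cluster.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<configuration>
  <property>
    <name>hbase.replication</name>
    <value>true</value>
  </property>
</configuration>
EOF

get_property "$sample" hbase.replication true
rm -f "$sample"
```

Against a real installation, the first argument would be the path to that cluster's own hbase-site.xml, which varies by distribution.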
Create an HBase replication peer in the master HBase cluster using the ZooKeeper quorum information of the cluster to which data needs to be replicated. That destination cluster is referred to as the slave going forward.
$ hbase shell
15/09/23 10:35:52 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
0 row(s) in 0.3470 seconds
 PEER_ID          CLUSTER_KEY              STATE    TABLE_CFS
 HBASE_REPL_PEER  zk1,zk2,zk3:2181:/hbase  ENABLED
1 row(s) in 0.1520 seconds
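The transcript above shows only the shell's output, so the commands that likely produced it can be sketched as follows. The cluster key has the form quorum:client-port:znode-parent, built from the slave cluster's ZooKeeper settings. The helper names cluster_key and peer_commands are hypothetical, and the positional add_peer form shown is an assumption that matches HBase releases from the period of this transcript; newer releases expect `add_peer 'ID', CLUSTER_KEY => "..."` instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a replication cluster key of the form
#   <zookeeper quorum>:<zookeeper client port>:<znode parent>
cluster_key() {
  local quorum="$1" port="$2" znode="$3"
  printf '%s:%s:%s\n' "$quorum" "$port" "$znode"
}

# Hypothetical helper: emit the HBase shell commands that register a peer
# and then list all peers. Positional add_peer syntax is an assumption tied
# to older HBase releases.
peer_commands() {
  local peer_id="$1" key="$2"
  printf "add_peer '%s', '%s'\n" "$peer_id" "$key"
  printf 'list_peers\n'
}

# Values taken from the transcript above.
key="$(cluster_key 'zk1,zk2,zk3' 2181 /hbase)"
peer_commands HBASE_REPL_PEER "$key"

# To run against a live master cluster, pipe the commands into the shell:
#   peer_commands HBASE_REPL_PEER "$key" | hbase shell
```

Printing the commands first makes it easy to review the cluster key before anything is applied to the master cluster.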
Once the replication peer is created and enabled, replication needs to be enabled on each HBase table whose data is to be replicated from the master cluster, by setting the “REPLICATION_SCOPE” attribute on the table's column families to a non-zero value. By default this value is “0”. If the table already exists, altering it to set “REPLICATION_SCOPE” to a non-zero value requires disabling and then re-enabling the table; the following is an example of the steps, where the existing table’s name is “healthy”. Note that a table with the same definition as the table being replicated (in this case “healthy”) should be created in the slave cluster before replication is enabled on the master.