HBase uses the block cache to keep data read from disk in memory so that repeatedly referenced data can be served without disk reads. The block cache lives on the HBase JVM heap, which means any factor that adversely impacts the JVM GC process also impacts the cache and hence query performance. It is commonly known that with large heap sizes, say more than 16 GB, "stop the world" GC pauses can run long enough for the HBase region server to be marked as dead.
Given that currently deployed physical servers have large amounts of memory, such as 128 GB or 256 GB, the 16 GB JVM heap limitation results in inefficient server utilization. It also means losing the opportunity to use that memory for the HBase cache, which would considerably improve query performance.
To mitigate this drawback, HBase includes another in-memory data structure called BucketCache. Unlike the block cache, which uses the JVM heap, the bucket cache uses off-heap memory allocated through Java ByteBuffers. Bucket cache is not enabled by default, but once enabled, a large amount of physical memory can be allocated to it since, unlike the block cache, it is not affected by JVM GC. By allocating large memory to HBase for caching, a large volume of data can be kept in memory, which improves query latencies. In our tests we were able to allocate a BucketCache of up to 91 GB and load data into it with no issues.
When bucket cache is enabled in HBase, it acts as the L2 cache (level 2, similar to the processor cache hierarchy) and the default block cache acts as the L1 cache. This means data must be copied from the L2 cache into the L1 cache before HBase can service a query. So on a read request with bucket cache enabled, a data block is copied from disk into the bucket cache and then into the block cache before being served to the requesting client. On subsequent reads of the same data, it is either served directly from the block cache or copied from the bucket cache into the block cache before serving the client, avoiding disk reads. So on a 10 node cluster with 128 GB of RAM each, it is possible to keep almost 1 TB of data in the HBase cache with bucket cache enabled and not be impacted by JVM GC, which is not the case with just the default block cache.
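To make the read path concrete, here is a minimal Python sketch of the L1/L2 lookup order described above. It is a conceptual illustration only, not HBase code; the dictionary-based caches and the read_block_from_disk function are stand-ins.

```python
# Conceptual sketch of the combined-cache read path:
# L1 (on-heap block cache), then L2 (off-heap bucket cache), then disk.

l1_block_cache = {}   # stands in for the on-heap block cache
l2_bucket_cache = {}  # stands in for the off-heap bucket cache

def read_block_from_disk(block_key):
    # Placeholder for an HFile block read.
    return "block-data-for-" + block_key

def get_block(block_key):
    # 1. Served directly from the block cache (L1) if present.
    if block_key in l1_block_cache:
        return l1_block_cache[block_key]
    # 2. Otherwise copied from the bucket cache (L2) into L1, then served.
    if block_key in l2_bucket_cache:
        l1_block_cache[block_key] = l2_bucket_cache[block_key]
        return l1_block_cache[block_key]
    # 3. Otherwise read from disk, populated into L2 and L1, then served.
    block = read_block_from_disk(block_key)
    l2_bucket_cache[block_key] = block
    l1_block_cache[block_key] = block
    return block
```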
In order to enable bucket cache, changes need to be made to the HBase JVM parameters in the hbase-env.sh file and new HBase properties need to be set in the hbase-site.xml file. The following summarizes the properties and how to calculate their values.
Let us assume the following:
Memory type | Mnemonic |
---|---|
Physical memory available for HBase RS | Tot |
Memory size for memstore | Msz |
Memory for block cache (L1) | L1Sz |
Memory for JVM related components | JHSz |
The following are the properties and the values to be set on each HBase region server.
Location | Name | Mnemonic | Value |
---|---|---|---|
hbase-env.sh | XX:MaxDirectMemorySize | DMem | Tot-Msz-L1Sz-JHSz |
hbase-env.sh | Xmx | Xmx | Msz+L1Sz+JHSz |
hbase-site.xml | hbase.regionserver.global.memstore.upperLimit | ULim | Msz/Xmx |
hbase-site.xml | hfile.block.cache.size | blksz | 0.8-ULim |
hbase-site.xml | hbase.bucketcache.size | bucsz | DMem+(blksz*Xmx) |
hbase-site.xml | hbase.bucketcache.percentage.in.combinedcache | ccsz | 1-((blksz*Xmx)/bucsz) |
hbase-site.xml | hbase.bucketcache.ioengine | | offheap or file:/localfile |
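As a rough illustration of these calculations, here is a minimal Python sketch that derives the property values from the four assumed inputs. The variable names are my own, it simply mirrors the formulas in the table above, and it is not the script linked later in this post.

```python
# Minimal sketch: derive bucket cache related settings from the four inputs.
# All sizes are in MB, matching the tables in this post.

def bucketcache_settings(tot, msz, l1sz, jhsz):
    """tot: physical memory for the region server, msz: memstore size,
    l1sz: block cache (L1) size, jhsz: other JVM components."""
    dmem = tot - msz - l1sz - jhsz      # -XX:MaxDirectMemorySize
    xmx = msz + l1sz + jhsz             # -Xmx
    ulim = msz / xmx                    # hbase.regionserver.global.memstore.upperLimit
    blksz = 0.8 - ulim                  # hfile.block.cache.size
    bucsz = dmem + blksz * xmx          # hbase.bucketcache.size
    ccsz = 1 - (blksz * xmx) / bucsz    # hbase.bucketcache.percentage.in.combinedcache
    return {
        "XX:MaxDirectMemorySize (MB)": dmem,
        "Xmx (MB)": xmx,
        "hbase.regionserver.global.memstore.upperLimit": round(ulim, 2),
        "hfile.block.cache.size": round(blksz, 2),
        "hbase.bucketcache.size (MB)": bucsz,
        "hbase.bucketcache.percentage.in.combinedcache": round(ccsz, 5),
    }

if __name__ == "__main__":
    # The example from the next section: 96000/2000/2000/1000 MB.
    for name, value in bucketcache_settings(96000, 2000, 2000, 1000).items():
        print(name, "=", value)
```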
The following is an example:
Item | Value |
---|---|
Physical memory available for HBase RS | 96000 MB |
Memory size for memstore | 2000 MB |
Memory for block cache (L1) cache | 2000 MB |
Memory for JVM related components | 1000 MB |
XX:MaxDirectMemorySize | 91000 |
Xmx | 5000 |
hbase.regionserver.global.memstore.upperLimit | 0.4 |
hfile.block.cache.size | 0.4 |
hbase.bucketcache.size | 93000 |
hbase.bucketcache.percentage.in.combinedcache | 0.97849 |
hbase.bucketcache.ioengine | offheap |
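Putting the example values together, the hbase-site.xml entries would look roughly like the sketch below; on the hbase-env.sh side, the region server JVM options would gain something like -Xmx5000m and -XX:MaxDirectMemorySize=91000m (for example via HBASE_REGIONSERVER_OPTS). Verify the property names and units against your HBase version before using them.

```xml
<!-- Example hbase-site.xml entries for the numbers above (a sketch, not a
     drop-in configuration). -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>93000</value>
</property>
<property>
  <name>hbase.bucketcache.percentage.in.combinedcache</name>
  <value>0.97849</value>
</property>
```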
Apart from acting as the L1 cache, the block cache also needs to hold block indexes and bloom filters, so it should be sized accordingly. The size of the block indexes and bloom filters can be easily identified by looking at the HBase store file details displayed on the HBase master URL. The changes mentioned need to be made on all the HBase region server nodes for bucket cache to be enabled, and all the region server nodes need to be restarted for the changes to take effect. To ease these calculations, use the simple python script available here.
If using more than 22 GB of off-heap memory, it is better to mount the RAM as a tmpfs and use the file:/localfile value for the hbase.bucketcache.ioengine property in hbase-site.xml. This is due to the existing open issue HBASE-10643.
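As a rough sketch of that setup, assuming a mount point of /mnt/bucketcache and a cache file name chosen for illustration (both are placeholders, not values from this post):

```sh
# Mount RAM as tmpfs so the file-backed bucket cache engine can use it (sketch).
sudo mkdir -p /mnt/bucketcache
sudo mount -t tmpfs -o size=93g tmpfs /mnt/bucketcache
```

hbase.bucketcache.ioengine would then be set to something like file:/mnt/bucketcache/bucketcache.data instead of offheap.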
More notes on this category can be found here.
For anyone interested in visuals, the following presentation may help.