Connection Object Reuse
Creating a connection from an application to a server component is a heavyweight operation, and the cost is especially pronounced when connecting to a database server. That is why database connection pooling is used to reuse connection objects, and HBase is no exception. In HBase, data from the meta table, which records which region servers serve which key ranges, is cached at the level of each individual connection, which makes HBase connections even heavier. If regions move during balancing or a region server fails, the cached metadata has to be refreshed for each connection object, which adds performance overhead. For these reasons, applications should reuse the connection objects they create.
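In a Java application using HBase, the connection object should be created once and then shared. In client versions that expose the HConnection API, this is typically done by building a Configuration with HBaseConfiguration.create() and passing it to HConnectionManager.createConnection(Configuration).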
If the application is multi-threaded, it needs to reuse the connection object for all data manipulation operations on tables. Each thread obtains its own HTable object by calling the getTable(TableName) method of the HConnection object. Once its data manipulation operations are complete, each thread should close its own HTable but not the HConnection object, so that the connection can be reused by other threads.
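The sharing pattern described above can be sketched without a running cluster. The block below is only an illustration: SharedConnection and TableHandle are hypothetical stand-ins for HConnection and HTable, not HBase classes, and the table name and row keys are made up. What it tries to show is the lifecycle: one connection for the whole application, one short-lived table handle per thread, and only the table handle closed by the worker.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionReuseSketch {

    // Hypothetical stand-in for HConnection: expensive to create, shared by all threads.
    static final class SharedConnection implements AutoCloseable {
        final AtomicInteger openTables = new AtomicInteger();
        final AtomicInteger tablesHandedOut = new AtomicInteger();

        // Plays the role of getTable(TableName): returns a lightweight per-thread handle.
        TableHandle getTable(String tableName) {
            openTables.incrementAndGet();
            tablesHandedOut.incrementAndGet();
            return new TableHandle(this, tableName);
        }

        @Override
        public void close() {
            // A real connection would release sockets and cached metadata here.
        }
    }

    // Hypothetical stand-in for HTable: cheap, used by one thread, closed after use.
    static final class TableHandle implements AutoCloseable {
        private final SharedConnection owner;
        final String tableName;

        TableHandle(SharedConnection owner, String tableName) {
            this.owner = owner;
            this.tableName = tableName;
        }

        void put(String rowKey, String value) {
            // No-op in this sketch; a real table handle would send the write to a region server.
        }

        @Override
        public void close() {
            // Closes only the table handle; the shared connection stays open.
            owner.openTables.decrementAndGet();
        }
    }

    // Starts the given number of workers that all reuse the same connection.
    static int runWorkers(SharedConnection connection, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < threads; i++) {
            final int id = i;
            pool.execute(() -> {
                // try-with-resources closes the per-thread table, never the connection.
                try (TableHandle table = connection.getTable("test_table")) {
                    table.put("row-" + id, "value-" + id);
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // The connection is created once and closed once, when the application is done.
        try (SharedConnection connection = new SharedConnection()) {
            int completed = runWorkers(connection, 4);
            System.out.println("workers completed: " + completed
                    + ", tables still open: " + connection.openTables.get());
        }
    }
}
```

In a real application the stand-ins would be replaced by the client library's own connection and table types; the point of the sketch is only which object each thread is responsible for closing.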
To prevent skew in query processing and to distribute the query workload across all the nodes in the cluster, it is good practice to create tables pre-split. The key is to identify split points that distribute the data across all the nodes in the cluster. Once the split points are identified, the table can be created pre-split from the HBase shell by passing them in the SPLITS option of the create command. For example, create 'test_table', 'cf1', SPLITS => ['a', 'b', 'c'] creates a table with 3 split points, and therefore 4 regions (the table name, column family and split keys here are illustrative).
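At the start of development, when the split points in the data are not yet clear but the table still needs to be pre-split, HBase provides a utility program, RegionSplitter, that creates a pre-split table with split points spaced uniformly across the key space. For example, hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f cf1 creates a table pre-split into 10 regions with the column family 'cf1' (test_table is an illustrative name).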
If you are creating tables programmatically using the Java APIs, the table can be pre-split during creation by supplying the split keys along with the table descriptor.
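As a rough illustration of the programmatic route, the sketch below computes evenly spaced split keys in plain Java. It assumes row keys are 8-digit lowercase hex strings spread over 00000000 to ffffffff, similar in spirit to the HexStringSplit option but not the library's own implementation; the class and method names are hypothetical. The resulting byte[][] has the shape accepted by createTable overloads that take split keys, such as HBaseAdmin.createTable(HTableDescriptor, byte[][]) in client versions that provide it.

```java
import java.nio.charset.StandardCharsets;

public class SplitKeySketch {

    // Returns numRegions - 1 split keys evenly spaced over the 8-hex-digit key space.
    // Assumption: row keys are lowercase hex strings from 00000000 to ffffffff.
    static byte[][] hexSplitKeys(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions to split");
        }
        long keySpace = 1L << 32; // number of distinct 8-digit hex keys
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            long boundary = keySpace * i / numRegions;
            splits[i - 1] = String.format("%08x", boundary).getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Ten regions need nine split keys. With the HBase client on the classpath,
        // this array would be passed as the splitKeys argument when creating the table.
        for (byte[] key : hexSplitKeys(10)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

Evenly spaced keys only balance load when the real row keys are themselves spread evenly over that range; if the data has known hot ranges, explicit split points chosen from the data are the better fit.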
For more detail on HBase table splitting and merging, refer to this blog post.