In a distributed cluster like HBase, logs are created and written on individual nodes participating in the cluster. If users want to look at the log entries say for debugging purposes they would need to logon to individual nodes. Apart from the complexity of going through the logs on the individual nodes, users need to be provided access to all the nodes. One of the ways to mitigate these concerns is to copy the log files into a centralized location.
Apache Flume can be used to copy log files from individual nodes into HDFS. The following is an example Flume configuration which can be used to start Flume agents to copy HBase region server log files into HDFS. Note that changes will be required to this sample configuration based on cluster set-up.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
A daily log directory with the name
yy-mm-dd will be created under
/user/flume directory in HDFS. Log data from each individual nodes where Flume agent is run will be copied into individual files under the daily directory. By providing read access to the centralized HDFS directory in this case
/user/flume, users will be able to go through the logs from all the nodes. Since each days logs are stored in seperate directories, users will be able to navigate easily.
Inorder the Flume agent to copy the log data successfully, HDFS client components including
hdfs-site.xml need to be installed on the individual nodes apart from Flume itself. In the case of HBase, HDFS components should be already available on the nodes since it is a pre-requisite.