Quick Notes

Things that came up along the way

Java Direct ByteBuffer Performance Advantages and Considerations

During execution, objects/variables created by Java programs get their space allocated in the JVM heap memory. The total amount of heap memory available to a JVM is determined by the value of the -Xmx parameter set when starting the Java process. When an allocated object is released by the Java program, the corresponding memory is made available for later use by the JVM garbage collection (GC) process.

The GC process typically gets invoked when the amount of free memory in the JVM falls below a certain threshold. At a very high level, the GC process involves identifying objects which are not used any more, i.e. not referenced anymore, releasing their memory, and compacting the heap to reduce memory fragmentation. Readers who are interested in understanding the details of the GC process can find them here. As one can imagine, the time it takes to complete the GC process increases with the size of the Java heap, since it takes more time to identify the objects which can be released and also to perform compaction.
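This is where direct ByteBuffers come in: a direct buffer's backing memory lives outside the JVM heap, so it is neither scanned nor compacted by the GC. A minimal sketch of the two allocation styles (class and sizes here are illustrative, not from the original notes):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] inside the JVM heap, so it is
        // tracked, moved, and compacted by the GC like any other object.
        ByteBuffer heap = ByteBuffer.allocate(1024);

        // Direct buffer: memory is allocated in native memory outside the
        // JVM heap, so it adds no work to the GC's scan/compact phases.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        System.out.println(heap.isDirect());    // false
        System.out.println(direct.isDirect());  // true
    }
}
```

The trade-off is that direct buffers are more expensive to allocate and release, so they are usually pooled and reused rather than created per operation.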

Secure All Applications Please

When you work with enterprises, you often see batch applications storing credentials for logging in to systems like databases, messaging infrastructure, or other enterprise applications in config files as plain text. These batch applications also don't get the same attention as customer-facing applications when it comes to security. If you have similar application configurations and the thinking is that these batch applications are behind the firewall in a DMZ and hence pose less risk, think again. As anyone who works in computer forensics/security can attest, data breaches are most often perpetrated by insiders, and these incidents rarely get reported or receive media attention. If you are looking for numbers, here is a summary of the 2012 security incident report from Forrester.

To de-risk scenarios like these, the solution doesn't have to be complex. It can be a matter of following a simple process like the following across the enterprise,
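As one low-complexity illustration of the idea, the config file can hold only an encrypted credential while the key lives elsewhere (an OS keystore or a secrets service). The helper names and cipher choice below are illustrative, sketched with Ruby's standard openssl library:

```ruby
require 'openssl'
require 'base64'

# Illustrative helpers: the AES key must come from somewhere other than
# the config file holding the ciphertext (OS keystore, secrets service).
def encrypt_credential(plaintext, key)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  Base64.strict_encode64(iv + cipher.update(plaintext) + cipher.final)
end

def decrypt_credential(encoded, key)
  raw = Base64.strict_decode64(encoded)
  decipher = OpenSSL::Cipher.new('aes-256-cbc')
  decipher.decrypt
  decipher.key = key
  decipher.iv = raw[0, 16]         # IV was prepended to the ciphertext
  decipher.update(raw[16..-1]) + decipher.final
end

key = OpenSSL::Random.random_bytes(32)          # held outside the config file
stored = encrypt_credential('db_password', key) # what goes into the config
decrypt_credential(stored, key)                 # what the batch app uses
```

The point is the process, not this particular cipher: every batch application reads credentials through one decryption step instead of plain text, and key management is centralized.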

Chef HWRP Using an Example

Heavy Weight Resource Provider (HWRP) is one of the two options Chef offers for creating custom resources, the other being LWRP. It would be good to read the notes on LWRP first to understand the context and the difference between LWRP and HWRP.

Similar to an LWRP, an HWRP requires a resource definition and the corresponding provider. The key difference is that there is no DSL in an HWRP as there is in an LWRP; everything is written in plain Ruby. So, taking the same HDFS directory resource used in the notes on LWRP as the example, the following is the skeleton of the resource definition.
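A sketch of what such a resource definition could look like (the file path, class name, and attributes are assumptions for illustration; the matching provider class is not shown):

```ruby
# Hypothetical libraries/resource_hdfs_directory.rb in the cookbook.
require 'chef/resource'

class Chef
  class Resource
    class HdfsDirectory < Chef::Resource
      def initialize(name, run_context = nil)
        super
        @resource_name = :hdfs_directory
        @action = :create
        @allowed_actions = [:create, :delete]
        @path = name
        # The matching Chef::Provider subclass (not shown) is wired up
        # separately, instead of being generated from a DSL as in an LWRP.
      end

      # In an HWRP, attributes are plain Ruby methods built on
      # set_or_return rather than the LWRP `attribute` DSL.
      def path(arg = nil)
        set_or_return(:path, arg, kind_of: String)
      end

      def owner(arg = nil)
        set_or_return(:owner, arg, kind_of: String)
      end
    end
  end
end
```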

Chef LWRP Using HDFS Directory as an Example

Chef provides a large set of resources to work with, but there are situations where the resources provided by Chef may not be sufficient. For example, distributed file systems can't be handled by the file-system-related resources (file, directory, etc.) that come out of the box with Chef. Being flexible and customizable, Chef provides two options (LWRP, HWRP) for users to create their own resources.
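For the HDFS directory case, an LWRP splits into a resource file declaring the interface and a provider file implementing the actions. A sketch under assumed names (the `hdfs dfs` commands and attribute set are illustrative, not the notes' actual cookbook):

```ruby
# resources/hdfs_directory.rb — hypothetical resource definition (LWRP DSL)
actions :create, :delete
default_action :create

attribute :path,  kind_of: String, name_attribute: true
attribute :owner, kind_of: String
attribute :mode,  kind_of: String, default: '755'

# providers/hdfs_directory.rb — hypothetical provider skeleton
action :create do
  execute "hdfs-mkdir-#{new_resource.path}" do
    command "hdfs dfs -mkdir -p #{new_resource.path}"
    # Guard keeps the resource idempotent: skip if the directory exists.
    not_if "hdfs dfs -test -d #{new_resource.path}"
  end
end
```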

Integrating Chef and Apache ZooKeeper for Coordination in a Cluster

In a cluster environment, services on nodes may have to be coordinated for various reasons. For example, when a configuration change is made to a distributed computing component like HDFS, the HDFS service shouldn't be stopped on all nodes at the same time to restart and pick up the configuration. Stopping the service on all nodes at once results in unavailability, which is undesirable, to put it lightly.

There are many options, of varying maturity, for performing orchestration/coordination when you manage a cluster using Chef. Here we look at how Chef and ZooKeeper can work together to coordinate services on cluster nodes. As the example to explain the solution, we will use the need to control and coordinate service restarts so that the service is not stopped on all nodes at the same time.
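The core of the idea can be sketched as a recipe fragment that takes a cluster-wide exclusive lock in ZooKeeper before restarting, so at most one node is restarting at any moment. The gem, lock path, attribute, and service names below are assumptions for illustration:

```ruby
# Hypothetical Chef recipe fragment; assumes the `zk` client gem is
# available to the Chef run and a ZooKeeper ensemble is reachable.
ruby_block 'coordinated hdfs datanode restart' do
  block do
    require 'zk'
    zk = ZK.new(node['zookeeper']['connect_string'])
    begin
      # with_lock blocks until this node holds /locks/hdfs-restart
      # exclusively, so nodes restart the service one at a time.
      zk.with_lock('/locks/hdfs-restart') do
        system('service hadoop-hdfs-datanode restart')
      end
    ensure
      zk.close!
    end
  end
  action :run
end
```

Because ZooKeeper's lock recipe is built on ephemeral nodes, a node that crashes mid-restart releases the lock automatically and does not block the rest of the cluster.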