Quick Notes

Things that came on the way

Amazon Aurora Storage

Two fundamental concepts enable Amazon Aurora to meet the requirements of any cloud-based database, such as seamless scalability, high availability, fault tolerance, and quick recovery, without compromising on performance or increasing maintenance effort.

  • A monotonically increasing Log Sequence Number (LSN) attached to each log record that is written for a change
  • A multi-tenant distributed storage system built for databases, to which multiple database instances can be attached. The storage system performs the persistence functions of a traditional database, such as writing logs to disk and creating and persisting data pages; that is, the custom storage system understands log records and data pages. The storage system also lets Aurora separate the compute components of a database, namely the SQL layer, transaction management, and caching, from the storage layer
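The first concept can be illustrated with a minimal sketch of an append-only log that stamps every record with the next LSN. The class and method names (LsnLogSketch, LogRecord, append, highestLsn) are hypothetical and chosen only for illustration; this is not Aurora's implementation, just a single-process picture of what "monotonically increasing" means.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, single-process sketch; not Aurora's actual log format.
class LsnLogSketch {
    // One log record: the LSN orders it relative to every other record.
    static final class LogRecord {
        final long lsn;
        final String change;

        LogRecord(long lsn, String change) {
            this.lsn = lsn;
            this.change = change;
        }
    }

    private long nextLsn = 1;
    private final List<LogRecord> records = new ArrayList<>();

    // Appends a change and returns the LSN assigned to it.
    synchronized long append(String change) {
        long lsn = nextLsn++;
        records.add(new LogRecord(lsn, change));
        return lsn;
    }

    // Highest LSN written so far, or 0 if the log is empty.
    synchronized long highestLsn() {
        return nextLsn - 1;
    }
}
```

Because each call to append hands out a strictly larger number, any two records can be ordered by comparing their LSNs alone.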

Amazon Dynamo

Requirements Dynamo tries to satisfy

  • Data read and written are identified uniquely by a key
  • Data items are small and stored as raw bytes that don’t require a relational schema
  • Queries don’t span multiple data items, i.e. user queries deal with only one row at a time
  • Use cases that can tolerate weaker consistency for high availability and require no isolation guarantees
  • Can be deployed on commodity hardware in a trusted environment that doesn’t require authentication or authorization
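The data model implied by these requirements can be sketched as a store that maps a key to an opaque byte array and only ever touches one item per call. The class below (KeyValueSketch) is a hypothetical, single-node, in-memory illustration; it is not Dynamo's API and leaves out replication, versioning, and everything else that makes Dynamo distributed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory sketch of a key -> raw bytes data model.
class KeyValueSketch {
    private final ConcurrentMap<String, byte[]> items = new ConcurrentHashMap<>();

    // Stores the value under a single key; no schema is imposed on the bytes.
    void put(String key, byte[] value) {
        items.put(key, value.clone());
    }

    // Reads a single item by key; returns null if the key is absent.
    byte[] get(String key) {
        byte[] value = items.get(key);
        return value == null ? null : value.clone();
    }
}
```

There is deliberately no method that reads or writes several keys together, which mirrors the requirement that queries deal with one item at a time.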

JDBC Connection to Apache Phoenix

Phoenix provides a JDBC driver for Java clients, so a client connects to Phoenix by following the usual steps for obtaining a JDBC connection. As with JDBC drivers for other DBMSs, there are some Phoenix-specific requirements. For a non-secure HBase cluster, the Phoenix JDBC connection string should be of the form jdbc:phoenix:<ZK-QUORUM>:<ZK-PORT>:<ZK-HBASE-NODE>. The following is the code snippet to get a Phoenix JDBC connection object for a non-secure HBase cluster.
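A minimal sketch of such a connection, using only the standard java.sql API. The class name, the buildUrl helper, the ZooKeeper host names, port 2181, and the /hbase znode are assumptions for illustration and must be replaced with the cluster's own values. The Phoenix client JAR is assumed to be on the classpath so that DriverManager can find the driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class PhoenixJdbcSketch {
    // Builds jdbc:phoenix:<ZK-QUORUM>:<ZK-PORT>:<ZK-HBASE-NODE>.
    static String buildUrl(String zkQuorum, int zkPort, String zkHbaseNode) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + zkHbaseNode;
    }

    public static void main(String[] args) {
        // Hypothetical values; replace with the cluster's own settings.
        String url = buildUrl("zk1.example.com,zk2.example.com", 2181, "/hbase");

        // Without the Phoenix client JAR on the classpath, DriverManager
        // fails with a "No suitable driver" SQLException.
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        } catch (SQLException e) {
            System.err.println("Could not connect to " + url + ": " + e.getMessage());
        }
    }
}
```

The try-with-resources block closes the connection automatically, which matters because each Phoenix connection holds client-side resources for the underlying HBase cluster.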

A Note on Distributed Computing

A distributed system is a collection of independent computers that appears to its users as a single coherent system. This paper argues that it is a mistake to treat the objects in a distributed object-oriented system as a single ontological class in which every entity can be described by the set of interfaces the objects support and the semantics of their operations. This vision of unified objects for distributed systems is centered around the principles that