Quick Notes

Things that came up along the way

Amazon Aurora Storage

Two fundamental concepts enable Amazon Aurora to meet the requirements that any cloud-based database must satisfy, such as seamless scalability, high availability, fault tolerance, and quick recovery, without compromising performance or increasing maintenance effort:

  • A monotonically increasing Log Sequence Number (LSN) attached to each log record that is written for a change
  • A multi-tenant distributed storage system built for databases, to which multiple database instances can be attached. The storage system performs the persistence functions of a traditional database, such as writing logs to disk and creating and persisting data pages; that is, the custom storage system understands log records and data pages. The storage system also lets Aurora separate the compute components of the database (the SQL layer, transaction management, and caching) from the storage layer
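The two ideas above can be illustrated with a small sketch. This is not Aurora's implementation: the class names (LogWriter, LogRecord, materializePage) and the single-character "changes" are hypothetical, chosen only to show a compute-side writer stamping each record with an increasing LSN and a storage side rebuilding a page by applying its records in LSN order, even when they arrive out of order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class LsnSketch {
    /** A log record tagged with its log sequence number (hypothetical shape). */
    static final class LogRecord {
        final long lsn;
        final long pageId;
        final String change;

        LogRecord(long lsn, long pageId, String change) {
            this.lsn = lsn;
            this.pageId = pageId;
            this.change = change;
        }
    }

    /** Compute side: hands out a strictly increasing LSN for every record. */
    static final class LogWriter {
        private final AtomicLong nextLsn = new AtomicLong(1);

        LogRecord append(long pageId, String change) {
            return new LogRecord(nextLsn.getAndIncrement(), pageId, change);
        }
    }

    /** Storage side: rebuilds one page by applying its records in LSN order. */
    static String materializePage(long pageId, List<LogRecord> received) {
        List<LogRecord> forPage = new ArrayList<>();
        for (LogRecord r : received) {
            if (r.pageId == pageId) {
                forPage.add(r);
            }
        }
        forPage.sort(Comparator.comparingLong(r -> r.lsn));
        StringBuilder page = new StringBuilder();
        for (LogRecord r : forPage) {
            page.append(r.change);
        }
        return page.toString();
    }

    public static void main(String[] args) {
        LogWriter writer = new LogWriter();
        LogRecord a = writer.append(7, "A");
        LogRecord b = writer.append(7, "B");
        LogRecord c = writer.append(9, "X");

        // Records reach the storage side out of order; the LSN restores the order.
        List<LogRecord> received = new ArrayList<>();
        received.add(b);
        received.add(c);
        received.add(a);

        System.out.println(materializePage(7, received)); // prints "AB"
    }
}
```

Because the ordering lives in the LSN rather than in arrival order, the storage side can build pages from log records on its own, which is what allows the compute layer to be kept separate.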

Active DBMS

Passive database management systems (DBMS) are program-driven, i.e. users query the database and retrieve the information currently available in it. An active database is one that automatically executes user-specified actions when specified conditions arise. The first paper details an architecture for an active database that uses Event-Condition-Action (ECA) rules as the formalism for active database capabilities. The second paper details an architecture for transforming a passive DBMS into an active DBMS.
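To make the ECA idea concrete, here is a minimal sketch of a rule engine. It is not the architecture from either paper: the names (Rule, RuleEngine, the "stock_updated" event) and the integer payload are assumptions made for illustration. Each rule names an event, a condition on the event's payload, and an action that runs only when the condition holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;
import java.util.function.IntPredicate;

public class EcaSketch {
    /** One Event-Condition-Action rule over an integer payload (hypothetical shape). */
    static final class Rule {
        final String event;
        final IntPredicate condition;
        final IntConsumer action;

        Rule(String event, IntPredicate condition, IntConsumer action) {
            this.event = event;
            this.condition = condition;
            this.action = action;
        }
    }

    /** Holds the registered rules and fires the matching ones when an event is signalled. */
    static final class RuleEngine {
        private final List<Rule> rules = new ArrayList<>();

        void register(Rule rule) {
            rules.add(rule);
        }

        void signal(String event, int value) {
            for (Rule rule : rules) {
                // Event must match, then the condition is checked, then the action runs.
                if (rule.event.equals(event) && rule.condition.test(value)) {
                    rule.action.accept(value);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> reorders = new ArrayList<>();
        RuleEngine engine = new RuleEngine();
        engine.register(new Rule(
                "stock_updated",
                qty -> qty < 10,
                qty -> reorders.add("reorder at " + qty)));

        engine.signal("stock_updated", 25); // condition false, nothing happens
        engine.signal("stock_updated", 4);  // condition true, action fires

        System.out.println(reorders); // prints "[reorder at 4]"
    }
}
```

In a passive system the application would have to poll the stock level itself; here the reaction is attached to the data change.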

The RUM Conjecture

Data access methods need to be modified or newly invented to adapt to ever-changing workload requirements and hardware. This paper looks at the challenges of designing new access methods, which increasingly need to be application- and hardware-aware. The fundamental challenge is to minimize a) read time (R), b) update cost (U), and c) memory overhead (M). The conjecture is that, among the read-update-memory (RUM) overheads, optimizing any two negatively impacts the third. Deciding which overheads to optimize for, and to what extent, has always been and remains the central part of designing access methods.
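A toy comparison can make the read/update side of the trade-off tangible. The sketch below is not from the paper: AppendLog and SortedArray are hypothetical structures, and "cost" is simply the number of elements touched. It does not model memory overhead, since both structures store only the keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RumSketch {
    /** Append-only log: updates touch one slot, lookups scan from the start. */
    static final class AppendLog {
        private final List<Integer> keys = new ArrayList<>();

        int insert(int key) {
            keys.add(key);
            return 1; // only the new slot is touched
        }

        int lookupCost(int key) {
            int touched = 0;
            for (int k : keys) {
                touched++;
                if (k == key) {
                    break;
                }
            }
            return touched;
        }
    }

    /** Sorted array: lookups use binary search, inserts shift later elements. */
    static final class SortedArray {
        private final List<Integer> keys = new ArrayList<>();

        int insert(int key) {
            int pos = Collections.binarySearch(keys, key);
            if (pos < 0) {
                pos = -pos - 1; // convert "not found" result to the insertion point
            }
            keys.add(pos, key);
            return keys.size() - pos; // the new element plus every element shifted right
        }

        int lookupCost(int key) {
            int lo = 0;
            int hi = keys.size() - 1;
            int touched = 0;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                touched++;
                int k = keys.get(mid);
                if (k == key) {
                    break;
                }
                if (k < key) {
                    lo = mid + 1;
                } else {
                    hi = mid - 1;
                }
            }
            return touched;
        }
    }

    public static void main(String[] args) {
        AppendLog log = new AppendLog();
        SortedArray sorted = new SortedArray();
        long logUpdates = 0;
        long sortedUpdates = 0;
        for (int key = 1000; key >= 1; key--) {
            logUpdates += log.insert(key);
            sortedUpdates += sorted.insert(key);
        }
        System.out.println("update cost: log=" + logUpdates + " sorted=" + sortedUpdates);
        System.out.println("read cost for key 1: log=" + log.lookupCost(1)
                + " sorted=" + sorted.lookupCost(1));
    }
}
```

The log is cheap to update and expensive to read, while the sorted array is the reverse, which is the kind of tension the conjecture describes.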

Amazon Dynamo

Requirements Dynamo tries to satisfy

  • Data that is read and written is identified uniquely by a key
  • Data items are small and stored as raw bytes that don’t require a relational schema
  • Queries don’t span multiple data items, i.e. each user query deals with only one row at a time
  • Use cases can tolerate weaker consistency in exchange for high availability and require no isolation guarantees
  • It can be deployed on commodity hardware in a trusted environment that doesn’t require authentication or authorization
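The first three requirements describe a very narrow data model, which can be sketched as an interface. The following is a single-process stand-in, not Dynamo's API: the class name OpaqueStore and its get/put signatures are assumptions, and the sketch ignores replication, consistency, and versioning. It shows only that each operation addresses exactly one item by key and that values are schema-less bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class KeyValueSketch {
    /** In-memory stand-in for a store of opaque values addressed by key. */
    static final class OpaqueStore {
        private final Map<String, byte[]> items = new ConcurrentHashMap<>();

        /** Writes a single item; the store never interprets the bytes. */
        void put(String key, byte[] value) {
            items.put(key, value.clone());
        }

        /** Reads a single item by key, or returns null if the key is absent. */
        byte[] get(String key) {
            byte[] value = items.get(key);
            return value == null ? null : value.clone();
        }
    }

    public static void main(String[] args) {
        OpaqueStore store = new OpaqueStore();
        store.put("cart:42", "2 x book".getBytes(StandardCharsets.UTF_8));

        // The caller, not the store, decides how to decode the bytes.
        byte[] raw = store.get("cart:42");
        System.out.println(new String(raw, StandardCharsets.UTF_8)); // prints "2 x book"
    }
}
```

There is deliberately no query, scan, or multi-key transaction method, which mirrors the requirement that queries never span multiple items.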

JDBC Connection to Apache Phoenix

Phoenix provides a JDBC driver for Java clients, so a client can connect to Phoenix by following the usual steps for obtaining a JDBC connection. As with JDBC drivers for other DBMSs, there are some Phoenix-specific requirements. For a non-secure HBase cluster, the Phoenix JDBC connection string should be of the form jdbc:phoenix:<ZK-QUORUM>:<ZK-PORT>:<ZK-HBASE-NODE>. The following is the code snippet to get a Phoenix JDBC connection object for a non-secure HBase cluster.
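A minimal sketch of assembling the connection string and opening the connection with the standard java.sql API is given here. The host names, port 2181, and the /hbase znode are placeholder assumptions, and the helper names buildUrl and connect are hypothetical. The sketch also assumes the Phoenix client jar is on the classpath; without it, DriverManager reports that no suitable driver was found.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class PhoenixConnectionSketch {
    /** Builds jdbc:phoenix:<ZK-QUORUM>:<ZK-PORT>:<ZK-HBASE-NODE> from its parts. */
    static String buildUrl(String zkQuorum, int zkPort, String zkHbaseNode) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + zkHbaseNode;
    }

    /** Opens a connection through the standard JDBC DriverManager. */
    static Connection connect(String url) throws SQLException {
        return DriverManager.getConnection(url);
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with the cluster's ZooKeeper settings.
        String url = buildUrl("zk1.example.com,zk2.example.com", 2181, "/hbase");
        System.out.println(url);

        try (Connection conn = connect(url)) {
            System.out.println("Connected, auto-commit = " + conn.getAutoCommit());
        } catch (SQLException e) {
            // Reached when the driver is missing or the cluster is unreachable.
            System.err.println("Could not connect: " + e.getMessage());
        }
    }
}
```

The try-with-resources block closes the connection automatically when it is no longer needed.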