Quick Notes

Things I came across along the way

Cassandra - a Decentralized Structured Storage System

Cassandra was designed around the following basic requirements:

  • Can run on commodity hardware and handle very high write throughput
  • Can scale as the number of users grows
  • Should be able to replicate across geographically distributed data centers for failover

Ceph

A wide variety of high performance computing applications ran into scalability limits with file systems like NFS (Network File System), which led to the adoption of distributed file systems built on an object-based storage architecture. Object storage improves scalability by separating file system metadata operations from data storage operations. Metadata operations are handled by metadata servers (MDS), while the traditional block-level interface is replaced by object storage devices (OSD). Clients interact with an MDS to perform metadata operations such as open, and then interact directly with OSDs to store data; low-level data placement is in turn delegated to the devices themselves. Even so, heavy reliance on the MDS and the limited intelligence of the OSDs constrained scalability, and Ceph tries to mitigate this.
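The split between metadata and data paths described above can be sketched roughly as follows. This is a toy illustration, not Ceph's actual API: the class names, the `write_file` and `read_file` helpers, the `path#index` object naming and the round-robin placement are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Hypothetical MDS: tracks which objects make up each file."""
    layouts: dict = field(default_factory=dict)

    def open(self, path):
        # Metadata operation: return the file's object list, creating it if needed.
        return self.layouts.setdefault(path, [])


@dataclass
class ObjectStorageDevice:
    """Hypothetical OSD: stores whole objects instead of exposing raw blocks."""
    objects: dict = field(default_factory=dict)

    def write(self, object_id, data):
        # Where the bytes land on the device is the OSD's own concern.
        self.objects[object_id] = data

    def read(self, object_id):
        return self.objects[object_id]


def write_file(mds, osds, path, chunks):
    layout = mds.open(path)  # one trip to the MDS for the metadata operation
    for index, chunk in enumerate(chunks):
        object_id = f"{path}#{index}"   # assumed naming scheme
        osd_index = index % len(osds)   # assumed toy placement (round-robin)
        osds[osd_index].write(object_id, chunk)  # data goes straight to the OSD
        layout.append((object_id, osd_index))


def read_file(mds, osds, path):
    layout = mds.open(path)
    # Data is fetched directly from the OSDs, bypassing the MDS.
    return b"".join(osds[osd_index].read(object_id) for object_id, osd_index in layout)
```

In this sketch the MDS is touched once per file operation, while every chunk of data flows directly between the client and an OSD, which is the property that lets the data path scale independently of the metadata path.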

Google Bigtable

Bigtable is Google’s distributed key-value data storage system, which can scale to thousands of machines storing petabytes of data.

  • Provides a simple data model that supports dynamic control over data layout and format, unlike the relational data model
  • Allows clients to reason about the locality properties of the data represented in the underlying storage
  • Data is indexed by row and column names that can be arbitrary strings
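A minimal sketch of this data model is a sparse map indexed by arbitrary row and column strings. The `ToyTable` class and its method names are hypothetical, and keeping row keys in sorted order is an assumption made here to show how a client could reason about locality: rows sharing a prefix end up next to each other.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Hypothetical in-memory table indexed by string row and column names."""

    def __init__(self):
        self._rows = {}      # row key -> {column name -> value}
        self._row_keys = []  # row keys kept in sorted order (assumption)

    def put(self, row, column, value):
        if row not in self._rows:
            self._rows[row] = {}
            insort(self._row_keys, row)
        self._rows[row][column] = value

    def get(self, row, column):
        # Missing rows or columns simply return None; the table is sparse.
        return self._rows.get(row, {}).get(column)

    def scan(self, prefix):
        # Rows sharing a prefix are adjacent, so a scan touches one contiguous range.
        start = bisect_left(self._row_keys, prefix)
        for key in self._row_keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, dict(self._rows[key])
```

For example, choosing row keys such as `com.example/a` and `com.example/b` would place pages of the same domain side by side, so `scan("com.example/")` reads them together. Each row can carry a different set of columns, which is where the layout flexibility relative to a fixed relational schema comes from.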

Google File System

GFS is Google’s distributed file system, designed to support their data processing needs based on the following observations.

Observations

  • Component failures are the norm rather than the exception. Therefore, monitoring, error detection, fault tolerance and recovery should be integral to the system.
  • Files are large by traditional standards and grow quickly.
  • Files are mutated mostly by appending new data rather than overwriting existing data.

Organizational Knowledge Management

With increased opportunity for mobility comes a higher risk of losing knowledge. Because knowledge is instrumental to the success of an organization, mitigating the risk of losing it is vital.

Knowledge is broadly categorized as explicit or tacit. Explicit knowledge is something which can be recorded, for example the process of multiplying numbers, or multiplication tables. A person learns from explicit knowledge and assimilates it through reflective cognition. Then, by repeatedly applying the knowledge, they gain expertise; it becomes part of their reflective cognition and, in turn, tacit.