Quick Notes

Things that came along the way

Determine Whether to Stay Serial or Go Parallel

Given that concurrent programming involves unique challenges such as race conditions and deadlocks, it is worth determining whether processing a problem concurrently actually performs better than processing it serially. Speedup is a metric that can be used to make this comparison: it is the ratio of the serial execution time to the concurrent execution time. Computing it requires identifying the best serial solution and the best concurrent solution to the problem.
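A minimal Python sketch of measuring speedup, assuming an I/O-bound workload where `time.sleep` stands in for waiting on disk or network. The helper names (`speedup`, `simulated_io_task`, `run_serial`, `run_concurrent`) and the task sizes are hypothetical, chosen only to illustrate the comparison.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_serial / T_parallel; S > 1 means the concurrent version is faster."""
    return t_serial / t_parallel


def simulated_io_task(_: int) -> None:
    # Hypothetical I/O-bound task: sleeping stands in for waiting on a device.
    time.sleep(0.05)


def time_it(fn) -> float:
    # Wall-clock time of a single call to fn.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_serial(n: int) -> None:
    # Best serial solution for this toy problem: run the tasks one after another.
    for i in range(n):
        simulated_io_task(i)


def run_concurrent(n: int, workers: int) -> None:
    # Concurrent solution: a thread pool overlaps the waiting periods.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(simulated_io_task, range(n)))


if __name__ == "__main__":
    n = 8
    t_s = time_it(lambda: run_serial(n))
    t_p = time_it(lambda: run_concurrent(n, workers=4))
    print(f"serial {t_s:.3f}s, concurrent {t_p:.3f}s, speedup {speedup(t_s, t_p):.2f}")
```

The threads help here only because the tasks spend their time waiting. For CPU-bound work in CPython, the global interpreter lock prevents threads from running Python bytecode in parallel, so the same measurement would likely show little or no speedup, which is exactly the kind of result this comparison is meant to surface before committing to a concurrent design.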

Concurrent, Parallel, or Distributed Processing

Reading through the literature and books on concurrency and programming, I find instances where terms are used interchangeably, which sometimes causes confusion. Here are some of the terms and how I understand and use them.

Concurrency and Why Application Developers Need to Know About It

“Concurrent”, in simple terms, means more than one thing happening or being done at the same time. Working on a document while communicating with a colleague through a messaging system is an example of concurrency. But how does it relate to computers and programming? In a simple model, a computer has interfaces such as a keyboard, mouse, touch screen, and display for users to interact with, a network interface to connect with other computers, storage to store and retrieve data, and hardware, including the CPU, to do the processing. To manage its interactions with these various components, a computer needs to do multiple things at the same time. The operating system manages all of these simultaneous interactions with devices and users. As any OS, device driver, or system call developer can attest, concurrency is a challenge they face quite often.
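To make the challenge concrete, here is a small Python sketch in which several threads update one shared counter at the same time. The `Counter` class, `worker`, and `run` are hypothetical names for illustration. The lock serializes each read-modify-write, which is the kind of coordination OS and driver code needs whenever concurrent activities touch shared state.

```python
import threading


class Counter:
    """Shared state updated by several threads; the lock makes each increment atomic."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # `self.value += 1` is a read-modify-write; without the lock, threads could
        # interleave between the read and the write and lose updates (a race condition).
        with self._lock:
            self.value += 1


def worker(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()


def run(num_threads: int = 4, times: int = 10_000) -> int:
    counter = Counter()
    threads = [
        threading.Thread(target=worker, args=(counter, times))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before reading the result
    return counter.value


print(run())  # 40000
```

Because every increment happens under the lock, the result is always `num_threads * times`, regardless of how the threads are scheduled.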

Cassandra - a Decentralized Structured Storage System

Cassandra had the following basic set of requirements:

  • Can run on commodity hardware and handle very high write throughput
  • Can scale with an increase in the number of users
  • Should be able to replicate across geographically distributed data centers for failover
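The replication requirement can be illustrated with a simplified Python sketch of a consistent-hashing ring that places replicas in more than one data center. This is an illustration under stated assumptions, not Cassandra's implementation: the `Ring` class, the MD5-based `token` function, and the `per_dc` parameter are hypothetical. Walking the ring clockwise from a key's token and taking at most `per_dc` nodes per data center means that losing an entire data center still leaves a copy elsewhere.

```python
import bisect
import hashlib


def token(name: str) -> int:
    # Position on the ring; using MD5 here is an assumption for this sketch.
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes: dict[str, str]) -> None:
        # nodes maps node name -> data center name (hypothetical topology).
        self.dc_of = nodes
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str, per_dc: int = 1) -> list[str]:
        """Walk clockwise from the key's token, taking up to per_dc nodes per data center."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        counts: dict[str, int] = {}
        chosen: list[str] = []
        for i in range(len(self.ring)):
            _, node = self.ring[(start + i) % len(self.ring)]
            dc = self.dc_of[node]
            if counts.get(dc, 0) < per_dc:
                counts[dc] = counts.get(dc, 0) + 1
                chosen.append(node)
        return chosen


if __name__ == "__main__":
    ring = Ring({"a1": "dc-east", "a2": "dc-east", "b1": "dc-west", "b2": "dc-west"})
    print(ring.replicas("user:42"))  # one node from each data center
```

Because each node's position depends only on the hash of its name, adding a node moves only the keys that fall between it and its predecessor on the ring, which is one reason consistent hashing suits a system that must scale out on commodity hardware.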

Ceph

A wide variety of applications in high-performance computing faced scalability issues with file systems like NFS (Network File System), which led to the adoption of distributed file systems based on an object-based storage architecture. Object-based storage improved scalability by separating file system metadata operations from data storage operations. Metadata operations are handled by metadata servers (MDS), while the traditional block-level interface was replaced by object storage devices (OSDs). Clients interact with the MDS to perform metadata operations such as open, and then interact directly with the OSDs to store data, which in turn delegates data placement to the devices themselves. However, heavy reliance on the MDS and the limited intelligence of the OSDs still constrained scalability, and Ceph tried to mitigate this.
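The split between the metadata path and the data path can be sketched in Python. This models the generic object-based design described above, not Ceph itself; `MetadataServer`, `ObjectStorageDevice`, `client_write`, `client_read`, and the round-robin striping policy are all hypothetical. The client contacts the MDS only to `open` a file and learn its layout, then reads and writes objects directly on the OSDs, so file data never flows through the MDS.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStorageDevice:
    """Stores opaque objects by ID; on-disk placement is left to the device."""
    osd_id: str
    objects: dict[str, bytes] = field(default_factory=dict)

    def write(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data

    def read(self, object_id: str) -> bytes:
        return self.objects[object_id]


@dataclass
class MetadataServer:
    """Handles metadata operations (e.g. open) only; it never sees file data."""
    osd_ids: list[str]
    stripe_count: int = 2
    layouts: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def open(self, path: str) -> list[tuple[str, str]]:
        # On first open, assign the file's objects round-robin across OSDs (hypothetical policy).
        if path not in self.layouts:
            base = len(self.layouts)
            self.layouts[path] = [
                (f"{path}.{i}", self.osd_ids[(base + i) % len(self.osd_ids)])
                for i in range(self.stripe_count)
            ]
        return self.layouts[path]


def client_write(mds: MetadataServer, osds: dict[str, ObjectStorageDevice],
                 path: str, data: bytes) -> None:
    layout = mds.open(path)                # metadata operation goes to the MDS
    chunk = -(-len(data) // len(layout))   # ceiling division so every byte lands in some object
    for i, (object_id, osd_id) in enumerate(layout):
        osds[osd_id].write(object_id, data[i * chunk:(i + 1) * chunk])  # data goes straight to the OSD


def client_read(mds: MetadataServer, osds: dict[str, ObjectStorageDevice], path: str) -> bytes:
    return b"".join(osds[osd_id].read(object_id) for object_id, osd_id in mds.open(path))
```

Every `open` in this sketch still goes through a single MDS, and the OSDs only store what they are handed; these are the kinds of bottlenecks the paragraph says Ceph set out to address.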