Tales of the Tail

Spikes in tail latency are a common challenge, especially for large-scale, parallel, and interactive applications. This paper looks at the sources of these spikes at the hardware, OS, and application layers. The study runs tests and collects fine-grained measurements on three servers running on Linux: a custom null RPC service, Memcached, and Nginx. The measurements are compared against the best achievable latency distribution, obtained by modeling each service as a queueing system. This comparison identifies the major sources of tail latency beyond those caused by workload bursts, namely:

Interference from other processes, including background processes on a seemingly dedicated machine
Request re-ordering caused by scheduling policies that are not designed with tail latency in mind
Application design choices involving how transport connections are bound to processes or threads
Multicore issues, such as how NIC interrupts and server processes are mapped to cores
CPU power saving mechanisms

The study also quantifies the impact of each of these sources.
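The paper's baseline is the latency distribution an ideal queueing system would produce under the same load. The Python sketch below illustrates that idea, and two of the listed sources, under loudly stated assumptions: arrivals are Poisson, service times are exponential, and every function name (workload, fifo_latencies, lifo_latencies, partitioned_latencies, p99) is hypothetical rather than taken from the paper. For a single FIFO server (M/M/1) with arrival rate λ and service rate μ, response time is exponentially distributed with rate μ − λ, so the ideal 99th percentile is ln(100)/(μ − λ). Serving the newest request first (LIFO) stands in for a scheduler that reorders requests, and sending each request to one randomly chosen worker with its own queue stands in for statically binding connections to threads. Neither is meant to reproduce the paper's model parameters or measured numbers.

```python
import heapq
import math
import random
from collections import deque


def workload(n, arrival_rate, service_rate, seed=0):
    """Assumed workload: Poisson arrivals, exponential service demands."""
    rng = random.Random(seed)
    t, arrivals, services = 0.0, [], []
    for _ in range(n):
        t += rng.expovariate(arrival_rate)
        arrivals.append(t)
        services.append(rng.expovariate(service_rate))
    return arrivals, services


def fifo_latencies(arrivals, services, n_servers=1):
    """One shared FIFO queue feeding n_servers identical workers (M/M/c)."""
    free_at = [0.0] * n_servers  # min-heap: when each worker next becomes idle
    latencies = []
    for a, s in zip(arrivals, services):
        start = max(a, heapq.heappop(free_at))  # earliest idle worker
        heapq.heappush(free_at, start + s)
        latencies.append(start + s - a)  # queueing delay + service time
    return latencies


def lifo_latencies(arrivals, services):
    """One worker that always picks the most recently arrived waiting request."""
    n = len(arrivals)
    latencies = [0.0] * n
    waiting, nxt, clock = deque(), 0, 0.0
    for _ in range(n):  # each iteration completes exactly one request
        if not waiting:
            clock = max(clock, arrivals[nxt])  # idle: jump to the next arrival
        while nxt < n and arrivals[nxt] <= clock:
            waiting.append(nxt)
            nxt += 1
        i = waiting.pop()  # newest first; popleft() would give FIFO order
        clock += services[i]
        latencies[i] = clock - arrivals[i]
    return latencies


def partitioned_latencies(arrivals, services, n_servers, seed=1):
    """Each request is pinned to a random worker that has its own FIFO queue."""
    rng = random.Random(seed)
    queues = [([], []) for _ in range(n_servers)]
    for a, s in zip(arrivals, services):
        arr, svc = queues[rng.randrange(n_servers)]
        arr.append(a)
        svc.append(s)
    out = []
    for arr, svc in queues:
        out.extend(fifo_latencies(arr, svc, 1))
    return out


def p99(values):
    """Nearest-rank 99th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]


if __name__ == "__main__":
    lam, mu = 0.7, 1.0  # utilization 0.7 per worker
    arrivals, services = workload(100_000, lam, mu)
    print("analytic M/M/1 p99:", math.log(100) / (mu - lam))
    print("FIFO p99:", p99(fifo_latencies(arrivals, services)))
    print("LIFO p99:", p99(lifo_latencies(arrivals, services)))

    arr4, svc4 = workload(100_000, 4 * lam, mu)
    print("shared queue, 4 workers p99:", p99(fifo_latencies(arr4, svc4, 4)))
    print("partitioned, 4 workers p99:", p99(partitioned_latencies(arr4, svc4, 4)))
```

With these parameters the simulated FIFO p99 should land close to the analytic value of about 15.35. LIFO should give a noticeably larger p99 from exactly the same requests, and four partitioned workers should show a much heavier tail than four workers sharing one queue, even though total capacity and offered load are identical in both cases.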

Summary of the findings

The following table summarizes, for each factor, how it causes spikes in tail latency and how it can be mitigated.

Reference

Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency
