This talk takes the “ultimately practical” approach to concurrent programming, with a focus on lock-free programs: after all, in reality such programs are almost always written in the hope of getting better performance. We’re going to measure the performance of the individual concurrent primitives and their effect on the performance of the program as a whole.
The goal of the talk is twofold. On the one hand, I will show a set of tools and practices that can be used to get quantitative measurements of the performance of different implementations under various load conditions. Mastering these techniques will allow attendees to choose their concurrent algorithms and implementations based on solid data instead of guesswork or “common knowledge” (which is often wrong or outdated). On the other hand, even with the focus on real-life applications, we can learn a few things about the fundamental nature of concurrent programs. This understanding proves especially useful when dealing with “common knowledge” and “simple logic”. For example, it’s “common knowledge” that lock-free programs are faster than lock-based ones (not always). It’s also “simple logic” that the hardware must protect shared memory access in a multi-core system, so ultimately locking is always present (sometimes true, sometimes true but misleading, and sometimes false). It is both “common knowledge” and “simple logic” that a wait-free program does not wait (but if your definition of wait is “will I have to wait for the results after I finish my coffee?” then it definitely does).
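As a taste of the measurement approach, here is a minimal sketch of the kind of micro-benchmark that can put the “lock-free is faster” claim to the test. The thread count, iteration count, and timing harness below are illustrative assumptions for this sketch, not the talk’s actual benchmark code:

    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Illustrative parameters (assumptions for this sketch, not the talk's setup).
    constexpr int kThreads = 4;
    constexpr long kIterations = 1'000'000;

    long plain_counter = 0;
    std::mutex counter_mutex;
    std::atomic<long> atomic_counter{0};

    // Runs `work` on kThreads threads and returns the wall-clock time in seconds.
    template <typename Work>
    double time_concurrent(Work work) {
        std::vector<std::thread> threads;
        const auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < kThreads; ++i) threads.emplace_back(work);
        for (auto& t : threads) t.join();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        return elapsed.count();
    }

    int main() {
        const double lock_based = time_concurrent([] {
            for (long i = 0; i < kIterations; ++i) {
                std::lock_guard<std::mutex> guard(counter_mutex);
                ++plain_counter;  // increment protected by a mutex
            }
        });
        const double lock_free = time_concurrent([] {
            for (long i = 0; i < kIterations; ++i) {
                // Lock-free increment; relaxed order suffices for a plain counter.
                atomic_counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
        std::cout << "lock-based: " << lock_based << " s, "
                  << "lock-free: " << lock_free << " s\n";
    }

Even a toy comparison like this tends to hold a surprise or two: the relative numbers shift with the number of threads, the memory order, and the hardware, which is exactly why measuring beats guessing.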
We will explore practical examples of (mostly) lock-free data structures, with actual implementations and performance measurements. Even if the specific limitations and simplifying assumptions used in this talk do not apply to your problem, the key takeaway is how to find such assumptions and take advantage of them in your own application: after all, in reality it’s almost always about performance.
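To give a flavor of the data structures in question, here is a minimal sketch of a classic lock-free (Treiber) stack. The names are illustrative, and it deliberately leaks popped nodes to sidestep the memory-reclamation problem, which is exactly the kind of simplifying assumption the talk examines:

    #include <atomic>
    #include <utility>

    // Minimal Treiber-stack sketch. Popped nodes are leaked on purpose:
    // since no node's address is ever reused, the ABA problem cannot arise,
    // at the cost of unbounded memory growth.
    template <typename T>
    class LockFreeStack {
        struct Node {
            T value;
            Node* next;
        };
        std::atomic<Node*> head_{nullptr};

    public:
        void push(T value) {
            Node* node = new Node{std::move(value),
                                  head_.load(std::memory_order_relaxed)};
            // Retry until head_ is swung from the head we observed (node->next)
            // to the new node; on failure, node->next is refreshed for us.
            while (!head_.compare_exchange_weak(node->next, node,
                                                std::memory_order_release,
                                                std::memory_order_relaxed)) {}
        }

        bool pop(T& out) {
            Node* node = head_.load(std::memory_order_acquire);
            // Retry until we detach the current head (or the stack is empty).
            while (node &&
                   !head_.compare_exchange_weak(node, node->next,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed)) {}
            if (!node) return false;
            out = std::move(node->value);
            // The node is leaked: safe reclamation (hazard pointers, epochs,
            // etc.) is deliberately out of scope for this sketch.
            return true;
        }
    };

Note how even this toy version has to take a stand on memory ordering and node reclamation; those decisions, rather than the compare-and-swap loop itself, are where most of the real performance and correctness questions live.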