Finding Race Conditions in C++

The use of multithreaded techniques in complex applications can help to both simplify code and improve responsiveness, but there are many traps for the unwary. One common pitfall is the race condition.

Problem areas are often subtle and only “bite” seemingly at random.  Even if an issue seems very unlikely to happen, it WILL hurt you at some point, especially if the software is deployed and executed on a large scale.

As well as using suitable software design patterns during development, I make use of two clang-based tools to help find problems.

This article shows a simple example of a race condition and explains how to use either of two different tools to find the issues.

Example program

The following is a short sample program that shows a deliberate race condition. It is written in C++ and uses C++11’s basic threading facilities to create new threads and to wait for their completion.

In this example, class Thing has a method called Add().  This method simply adds the supplied integer parameter to the m_N member.  But rather than using a nice simple addition, it does its work in a roundabout way that is prone to race conditions.  If the method is called by just one thread at a time, it works as it should, but if two or more threads call that same method at the same time then the results can be erratic.

Obviously this particular example could be fixed by just using an atomic addition (see the short sketch after the listing below), but the aim here is to illustrate an approach to finding race conditions.

#include <chrono>
#include <cstdlib>
#include <iostream>
#include <thread>

class Thing
{
public:
  Thing() : m_N(0) {}
  ~Thing() { std::cout << "Result=" << m_N << std::endl; }

  void Add(int i)
  {
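    // Deliberately racy: read m_N into a local, scribble a junk value into
    // m_N, sleep to widen the race window, then write back the (possibly
    // stale) sum.  With concurrent callers, updates can be lost.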
    auto temp = m_N;
    m_N = 44;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    temp += i;
    m_N = temp;
  }

private:
  int m_N;
};

void munge(Thing& t)
{
  for (int i = 0; i < 100; i++)
  {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    t.Add(1);
  }
}

int main()
{
  // Instantiate a single Thing instance which will be modified at the same
  // time by three threads
  Thing thing;

  // Start the three threads to hammer our single Thing instance
  std::thread thread1(munge, std::ref(thing));
  std::thread thread2(munge, std::ref(thing));
  std::thread thread3(munge, std::ref(thing));

  // Wait for them all to complete
  thread1.join(); thread2.join(); thread3.join();

  return EXIT_SUCCESS;
}
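
As an aside, the race in this particular class could also be removed without a mutex by making m_N atomic; here's a minimal sketch (a drop-in replacement for the Thing class above):

#include <atomic>

class Thing
{
public:
  Thing() : m_N(0) {}
  ~Thing() { std::cout << "Result=" << m_N << std::endl; }

  // fetch_add performs the read-modify-write as a single atomic operation,
  // so concurrent callers can no longer interleave and lose updates.
  void Add(int i) { m_N.fetch_add(i); }

private:
  std::atomic<int> m_N;
};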

The usual approach to fixing this kind of issue is to add a new private member to Thing of type std::mutex, and change method Add() to lock this mutex on entry.  This makes sure that only one thread at a time can be inside that method.  Other threads will block until the method is exited and the mutex is unlocked.

Making those changes will look like this:

#include <chrono>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <thread>

class Thing
{
public:
  Thing() : m_N(0) {}
  ~Thing() { std::cout << "Result=" << m_N << std::endl; }

  void Add(int i)
  {
    std::lock_guard<std::mutex> lock(m_Mutex);
    auto temp = m_N;
    m_N = 44;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    temp += i;
    m_N = temp;
  }

private:
  mutable std::mutex m_Mutex;
  int m_N;
};

void munge(Thing& t)
{
  for (int i = 0; i < 100; i++)
  {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    t.Add(1);
  }
}

int main()
{
  // Instantiate a single Thing instance which will be modified at the same
  // time by three threads
  Thing thing;

  // Start the three threads to hammer our single Thing instance
  std::thread thread1(munge, std::ref(thing));
  std::thread thread2(munge, std::ref(thing));
  std::thread thread3(munge, std::ref(thing));

  // Wait for them all to complete
  thread1.join(); thread2.join(); thread3.join();

  // The destructor for 'thing' will show the result.  The correct result should
  // be 300 each time, but if we have race conditions then this will vary sometimes.

  return EXIT_SUCCESS;
}

This example is now thread-safe, and should always produce the correct result of 300.

But what we really need is a tool to help find issues such as missing mutex protection, in case we accidentally overlook them when developing the code.  We could then make these tools part of our development process so that race conditions are caught quickly.

Using clang we have two options; one uses static analysis (at compile time) and the other uses dynamic analysis (at run time).

Clang Thread-Safety Analysis

To use clang’s thread-safety analysis, you’ll need a reasonably recent version of clang.

clang’s thread-safety analysis performs static analysis driven by code annotations.  The idea is that you use a set of macros to annotate your code in a way that tells the tool the intended thread-safety semantics, and the static checker then validates those assertions at compile time.  The annotations have another benefit: they improve the understandability of the code by making its assumptions explicit.

My example code was written on Debian Linux, and assumes that I’ve got these packages installed:

sudo apt install build-essential cmake clang

Also, I’m building against libc++; this isn’t mandatory, but one advantage is that it already has annotations on the relevant primitives (std::mutex and friends).  Without that, you’ll need to use wrappers, which I’ll discuss briefly a bit later.

To install libc++ you need to add these packages:

sudo apt install libc++-dev libc++abi-dev

To use this tool on an existing codebase we need threading primitives (e.g. std::mutex and friends) with thread-safety annotations.  There are two approaches to this:

  • Use libc++ which already has these (make sure you define the _LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS preprocessor symbol).  Or,
  • Define wrappers around std::mutex (etc.) which have annotations, and modify your code to use those wrappers.  The mutex.h example in the clang documentation (linked below) is one starting point, but it needs to be filled in with implementations of some methods.  Another is https://github.com/google/benchmark/blob/master/src/mutex.h, but that one needs some modifications to remove its testing macros.

Or use the following example which I made based on a combination of the above:

#ifndef MUTEX_WRAPPERS_H_
#define MUTEX_WRAPPERS_H_

#include <mutex>

#include "thread_safety.h"

namespace example {

// NOTE: Wrappers for std::mutex and std::unique_lock are provided so that
// we can annotate them with thread safety attributes and use the
// -Wthread-safety warning with clang. The standard library types cannot be
// used directly because they do not provide the required annotations.
class CAPABILITY("mutex") Mutex {
 public:
  Mutex() {}

  void lock() ACQUIRE() { mut_.lock(); }
  void unlock() RELEASE() { mut_.unlock(); }
  std::mutex& native_handle() { return mut_; }

 private:
  std::mutex mut_;
};

class SCOPED_CAPABILITY MutexLock {
  typedef std::unique_lock<std::mutex> MutexLockImp;

 public:
  MutexLock(Mutex& m) ACQUIRE(m) : ml_(m.native_handle()) {}
  ~MutexLock() RELEASE() {}
  MutexLockImp& native_handle() { return ml_; }

 private:
  MutexLockImp ml_;
};

}  // namespace example

#endif // MUTEX_WRAPPERS_H_

To use this, #include it from your source and then modify the source to use example::Mutex instead of std::mutex, and so on; a short sketch of what that might look like for our Thing class follows below.  Using wrappers is a flexible approach, but it involves modifying the code under test, which is why it’s not ideal; it’s better to use libc++ if possible.
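
For illustration, here’s a minimal sketch (mine, not part of the example program above) of the Thing class rewritten against the wrappers, assuming they’re saved in a header called mutex_wrappers.h; the GUARDED_BY macro comes from the thread_safety.h header shown next:

#include "mutex_wrappers.h"
#include "thread_safety.h"

class Thing
{
public:
  Thing() : m_N(0) {}

  void Add(int i)
  {
    // example::MutexLock is the annotated RAII lock from the wrapper header
    example::MutexLock lock(m_Mutex);
    m_N += i;
  }

private:
  example::Mutex m_Mutex;
  // Tell the checker that m_N must only be accessed with m_Mutex held
  int m_N GUARDED_BY(m_Mutex);
};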

You’ll also need to define the various macros that the annotations use.  These are documented at https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#mutex-h, but in that form they’re a bit messy to use, so I’ve collated them into a single header file below, which I save as thread_safety.h:

#ifndef THREAD_SAFETY_H_
#define THREAD_SAFETY_H_

// Enable thread safety attributes only if HAVE_THREAD_SAFETY_ATTRIBUTES is defined.
#if defined(HAVE_THREAD_SAFETY_ATTRIBUTES)
#define THREAD_ANNOTATION_ATTRIBUTE__(x) __attribute__((x))
#else
#define THREAD_ANNOTATION_ATTRIBUTE__(x)  // no-op
#endif

#define CAPABILITY(x) \
  THREAD_ANNOTATION_ATTRIBUTE__(capability(x))

#define SCOPED_CAPABILITY \
  THREAD_ANNOTATION_ATTRIBUTE__(scoped_lockable)

#define GUARDED_BY(x) \
  THREAD_ANNOTATION_ATTRIBUTE__(guarded_by(x))

#define PT_GUARDED_BY(x) \
  THREAD_ANNOTATION_ATTRIBUTE__(pt_guarded_by(x))

#define ACQUIRED_BEFORE(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(acquired_before(__VA_ARGS__))

#define ACQUIRED_AFTER(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(acquired_after(__VA_ARGS__))

#define REQUIRES(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(requires_capability(__VA_ARGS__))

#define REQUIRES_SHARED(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(requires_shared_capability(__VA_ARGS__))

#define ACQUIRE(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(acquire_capability(__VA_ARGS__))

#define ACQUIRE_SHARED(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(acquire_shared_capability(__VA_ARGS__))

#define RELEASE(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(release_capability(__VA_ARGS__))

#define RELEASE_SHARED(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(release_shared_capability(__VA_ARGS__))

#define TRY_ACQUIRE(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(try_acquire_capability(__VA_ARGS__))

#define TRY_ACQUIRE_SHARED(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(try_acquire_shared_capability(__VA_ARGS__))

#define EXCLUDES(...) \
  THREAD_ANNOTATION_ATTRIBUTE__(locks_excluded(__VA_ARGS__))

#define ASSERT_CAPABILITY(x) \
  THREAD_ANNOTATION_ATTRIBUTE__(assert_capability(x))

#define ASSERT_SHARED_CAPABILITY(x) \
  THREAD_ANNOTATION_ATTRIBUTE__(assert_shared_capability(x))

#define RETURN_CAPABILITY(x) \
  THREAD_ANNOTATION_ATTRIBUTE__(lock_returned(x))

#define NO_THREAD_SAFETY_ANALYSIS \
  THREAD_ANNOTATION_ATTRIBUTE__(no_thread_safety_analysis)

#endif // THREAD_SAFETY_H_

Once you’ve got the thread_safety.h header in place (plus the wrapper header, if you’re not using libc++), you can annotate your code.  The example program then looks like this; the lines of interest are the #include of thread_safety.h and the GUARDED_BY annotation on m_N:

#include <chrono>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <thread>

// This defines macros such as GUARDED_BY
#include "thread_safety.h"

class Thing
{
public:
  Thing() : m_N(0) {}
  ~Thing() { std::cout << "Result=" << m_N << std::endl; }

  void Add(int i)
  {
//  std::lock_guard<std::mutex> lock(m_Mutex);
    auto temp = m_N;
    m_N = 44;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    temp += i;
    m_N = temp;
  }

private:
  mutable std::mutex m_Mutex;
  // Tell the compiler that m_N needs to be guarded by the above mutex.
  int m_N GUARDED_BY(m_Mutex);
};

void munge(Thing& t)
{
  for (int i = 0; i < 100; i++)
  {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    t.Add(1);
  }
}

int main()
{
  // Instantiate a single Thing instance which will be modified at the same
  // time by three threads
  Thing thing;

  // Start the three threads to hammer our single Thing instance
  std::thread thread1(munge, std::ref(thing));
  std::thread thread2(munge, std::ref(thing));
  std::thread thread3(munge, std::ref(thing));

  // Wait for them all to complete
  thread1.join(); thread2.join(); thread3.join();

  // The destructor for 'thing' will show the result.  The correct result should
  // be 300 each time, but if we have race conditions then this will vary sometimes.

  return EXIT_SUCCESS;
}

You then need to add the “-Wthread-safety” option to the compiler flags for each affected module, and the compiler will then warn at compile time if you miss any locks.

For example, if we annotate the above example code but forget to lock the mutex, the warnings will look something like this:

clang++ -DHAVE_THREAD_SAFETY_ATTRIBUTES -D_LIBCPP_HAS_THREAD_SAFETY_ANNOTATIONS -stdlib=libc++ -Wall -Wextra -Wthread-safety -pthread -std=gnu++11 -o CMakeFiles/test.dir/test1.cpp.o -c /home/user1/thread/test1.cpp
/home/user1/thread/test1.cpp:19:17: warning: reading variable 'm_N' requires holding mutex 'm_Mutex'
      [-Wthread-safety-analysis]
    auto temp = m_N;
                ^
/home/user1/thread/test1.cpp:20:5: warning: writing variable 'm_N' requires holding mutex 'm_Mutex' exclusively
      [-Wthread-safety-analysis]
    m_N = 44;
    ^
/home/user1/thread/test1.cpp:23:5: warning: writing variable 'm_N' requires holding mutex 'm_Mutex' exclusively
      [-Wthread-safety-analysis]
    m_N = temp;
    ^
3 warnings generated.

In this example, the compiler is warning that at line 19 we’re reading m_N without the correct mutex being locked, and at lines 20 and 23 we’re writing m_N without the lock.

A limitation of this tool is that it can’t check std::unique_lock, because its optional ‘deferred’ locking semantics are difficult to handle with a static checker.  Most ordinary mutex locks should be using std::lock_guard anyway, but if you do rely on std::unique_lock this may be an issue for you with this tool; one workaround is sketched below.
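
If a function genuinely needs std::unique_lock (for deferred locking, say), one option is to opt just that function out of the static checks using the NO_THREAD_SAFETY_ANALYSIS annotation from thread_safety.h.  A hypothetical sketch (AddDeferred() is purely illustrative and not part of the example program):

  // Deferred locking confuses the static checker, so exclude this one
  // method from the analysis; it will still be checked by run-time tools
  // such as the thread sanitiser described below.
  void AddDeferred(int i) NO_THREAD_SAFETY_ANALYSIS
  {
    std::unique_lock<std::mutex> lock(m_Mutex, std::defer_lock);
    // ...work that doesn't touch m_N could go here...
    lock.lock();
    m_N += i;
  }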

To summarise, to use thread-safety analysis, you’ll need to:

  • Annotate your code to tell the compiler about its intended locking behaviour,
  • Either use libc++ or add a set of wrappers to your build,
  • Add the “-Wthread-safety” option to the compilation flags, and
  • Compile the code, taking care of any warnings generated by the static analysis process.

Clang Thread Sanitiser

An alternative to the above is run-time analysis.  I used to use Valgrind/Helgrind for this, but that is quite a “heavy” approach, and it also produces more false positives than I would like.

More recently I’ve been using clang’s Thread Sanitiser.  This tool was initially developed by Google, who have contributed it to both GCC and clang.  I’m describing it here with clang, but it’s a similar exercise to use it with GCC.

Unlike the static-analysis approach described above, there’s no need to annotate the code under test, which is an advantage when trying to fix a legacy code base.  It’s best to understand as many of the legacy issues as possible before trying to fix anything.

To use Thread Sanitiser, it’s a simple matter of adding “-fsanitize=thread” to both the compiler command line and the linker command line, and then running the code in question.
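
For the example program, the build might look something like this (the exact flags depend on your toolchain; -g is only there to make the reports easier to read):

clang++ -fsanitize=thread -g -O1 -stdlib=libc++ -pthread -std=gnu++11 -o test test1.cpp

When the program terminates, you’ll get a list of the observed issues, like this: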

user1@debian:~/build$ ./test 
Testing...
==================
WARNING: ThreadSanitizer: data race (pid=3380)
  Read of size 4 at 0x7ffdc220a030 by thread T2:
    #0 Thing::Add(int) <null> (test+0x4b32c3)
    #1 munge(Thing&) <null> (test+0x4b2e3b)
    #2 decltype(std::__1::forward<void (*)(Thing&)>(fp)(std::__1::forward<std::__1::reference_wrapper<Thing> >(fp0))) std::__1::__invoke<void (*)(Thing&), std::__1::reference_wrapper<Thing> >(void (*&&)(Thing&), std::__1::reference_wrapper<Thing>&&) <null> (test+0x4b68f5)
    #3 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Thing&), std::__1::reference_wrapper<Thing>, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Thing&), std::__1::reference_wrapper<Thing> >&, std::__1::__tuple_indices<2ul>) <null> (test+0x4b67b1)
    #4 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Thing&), std::__1::reference_wrapper<Thing> > >(void*) <null> (test+0x4b5d39)

  Previous write of size 4 at 0x7ffdc220a030 by thread T1:
    #0 Thing::Add(int) <null> (test+0x4b32d6)
    #1 munge(Thing&) <null> (test+0x4b2e3b)
    #2 decltype(std::__1::forward<void (*)(Thing&)>(fp)(std::__1::forward<std::__1::reference_wrapper<Thing> >(fp0))) std::__1::__invoke<void (*)(Thing&), std::__1::reference_wrapper<Thing> >(void (*&&)(Thing&), std::__1::reference_wrapper<Thing>&&) <null> (test+0x4b68f5)
    #3 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Thing&), std::__1::reference_wrapper<Thing>, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Thing&), std::__1::reference_wrapper<Thing> >&, std::__1::__tuple_indices<2ul>) <null> (test+0x4b67b1)
    #4 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Thing&), std::__1::reference_wrapper<Thing> > >(void*) <null> (test+0x4b5d39)

  As if synchronized via sleep:
    #0 nanosleep <null> (test+0x425557)
    #1 std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) <null> (libc++.so.1+0x900ca)
    #2 munge(Thing&) <null> (test+0x4b2e28)
    #3 decltype(std::__1::forward<void (*)(Thing&)>(fp)(std::__1::forward<std::__1::reference_wrapper<Thing> >(fp0))) std::__1::__invoke<void (*)(Thing&), std::__1::reference_wrapper<Thing> >(void (*&&)(Thing&), std::__1::reference_wrapper<Thing>&&) <null> (test+0x4b68f5)
    #4 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Thing&), std::__1::reference_wrapper<Thing>, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Thing&), std::__1::reference_wrapper<Thing> >&, std::__1::__tuple_indices<2ul>) <null> (test+0x4b67b1)
    #5 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Thing&), std::__1::reference_wrapper<Thing> > >(void*) <null> (test+0x4b5d39)

  Location is stack of main thread.

  Location is global '??' at 0x7ffdc21eb000 ([stack]+0x00000001f030)

  Thread T2 (tid=3383, running) created by main thread at:
    #0 pthread_create <null> (test+0x4278e5)
    #1 std::__1::__libcpp_thread_create(unsigned long*, void* (*)(void*), void*) <null> (test+0x4b5c9c)
    #2 std::__1::thread::thread<void (&)(Thing&), std::__1::reference_wrapper<Thing>, void>(void (&)(Thing&), std::__1::reference_wrapper<Thing>&&) <null> (test+0x4b373a)
    #3 main <null> (test+0x4b2f25)

  Thread T1 (tid=3382, running) created by main thread at:
    #0 pthread_create <null> (test+0x4278e5)
    #1 std::__1::__libcpp_thread_create(unsigned long*, void* (*)(void*), void*) <null> (test+0x4b5c9c)
    #2 std::__1::thread::thread<void (&)(Thing&), std::__1::reference_wrapper<Thing>, void>(void (&)(Thing&), std::__1::reference_wrapper<Thing>&&) <null> (test+0x4b373a)
    #3 main <null> (test+0x4b2f01)

SUMMARY: ThreadSanitizer: data race (/home/user1/build/test+0x4b32c3) in Thing::Add(int)
==================
Result=47
ThreadSanitizer: reported 1 warnings

This tells us where the race condition was detected, which threads were involved, and what data was affected.  It’s not as concise as the output from the static analysis, but it’s still valuable.

A downside of this approach is that it only detects problems in code paths that were actually executed; if a problematic code path is not exercised in the test run, then the sanitiser will not detect problems in it.

Of course, there’s no reason not to use static analysis as well as runtime analysis; the more checks, the better!