C++ Concurrency and Synchronization
If you think it’s simple, then you have misunderstood the problem
- Bjarne Stroustrup
Introduction
Concurrency (and closely related parallelism) is about doing more than one thing at once. In code, that usually means running multiple pieces of work so they appear to progress together.
In systems programming — and especially in C++ — this typically means using multiple threads to run work on one or more CPU cores. Sometimes people also talk about concurrency when a single core rapidly switches between tasks that share memory (e.g. Python’s “multithreading” vs “multiprocessing”).
In this post, when we say concurrency in C++, we’re talking about multithreaded programs that share memory, and how to keep them free of data races.
Why this matters
- Compilers may transform your code heavily, as long as the observable behaviour stays the same (the “as-if” rule)
- Hardware can reorder memory accesses to use cache lines more efficiently
- The C++ memory model synchronizes writes to a memory location with other reads/writes to that location
Data races
A data race occurs when two threads access the same memory location at the same time, and at least one of the accesses is a write.
Data races result in undefined behaviour and must be avoided.
Getting started with C++ threads
Creating a multi-threaded program in C++ is super easy, just use the <thread> library.
#include <iostream>
#include <thread>

std::thread threadB = std::thread([](){
    std::cout << "Hello from threadB!\n";
});
std::cout << "Hello from threadA!\n";
threadB.join();
Let’s walk through this.
std::thread threadB = std::thread([](){
    std::cout << "Hello from threadB!\n";
});
This creates a new thread object and immediately starts running it. When the thread runs, it will execute the callable passed in as a parameter, in this case, a lambda function that prints a message.
std::cout << "Hello from threadA!\n";
This message will be printed from the main thread. The order that these two messages are printed is indeterminate.
threadB.join();
Now we wait until threadB has completed execution, at which point it is no longer joinable and our main thread can continue.
Getting information from a thread…
safely
Because joining with a thread is a synchronizing operation, we can be sure that reads after join() come after any writes performed within the thread.
int result{0};
std::thread threadB = std::thread([&](){
    result = 42;
});
threadB.join();
std::cout << "The result of threadB was " << result << std::endl; // Will print 42
dangerously
If we try to access result without the join, we have a data race.
int result{0};
std::thread threadB = std::thread([&](){
    result = 42;
});
std::cout << "The result of threadB was " << result << std::endl; // Indeterminate!
threadB.join(); // Join after print. Bad.
In this scenario, result could be 0 or it could be 42. This is undefined behaviour and must be avoided.
perilously
If we have multiple writes, we’re really in trouble.
int result{0};
std::thread threadB = std::thread([&](){
    for (int i = 0; i < 10000; i++) {
        result++;
    }
});
for (int i = 0; i < 10000; i++) {
    result++;
}
threadB.join(); // Join before print. Good.
std::cout << "The result is " << result << std::endl;
Notice that even though the join() comes before the print statement, we still can’t predict the result.
This is because we have a physical data race: memory is being written to and read by both threads at the same time, without synchronization.
We can have many different results depending on how the various reads/writes clash during this program execution.
Fixing races with std::atomic
We can fix the physical data race by declaring result as an atomic.
This synchronizes every access to the atomic; however, it doesn’t by itself guarantee the order of execution between threads (still a semantic data race if we read before both writers are done)!
Incorrect attempt
std::atomic<int> result{0};
std::thread threadB = std::thread([&](){
    for (int i = 0; i < 10000; i++) {
        result++;
    }
});
for (int i = 0; i < 10000; i++) {
    result++;
}
std::cout << "The result is " << result << std::endl;
threadB.join(); // Join after print. Bad.
We still can’t predict what will be printed. When I run this on Compiler Explorer with GCC 15.2, I see either 10000 or 20000, though any value in between is possible too.
Correct solution
We can eliminate the data race completely by moving the join() to the appropriate location.
std::atomic<int> result{0};
std::thread threadB = std::thread([&](){
    for (int i = 0; i < 10000; i++) {
        result++;
    }
});
for (int i = 0; i < 10000; i++) {
    result++;
}
threadB.join(); // Join before print. Good.
std::cout << "The result is " << result << std::endl; // Always 20000
Now we guarantee that every access to result is guarded and that when we print the result, both threads have synchronized.
Basic Synchronization
There are several included synchronization primitives in C++.
std::mutex – a simple example
std::mutex provides lock() and unlock() mechanisms for protecting data.
class TokenPool {
    std::mutex mtx{};
    std::vector<Token> tokens{};
public:
    Token getToken() {
        mtx.lock();
        if (tokens.empty()) {
            tokens.push_back(Token::create());
        }
        Token t = std::move(tokens.back());
        tokens.pop_back();
        mtx.unlock();
        return t;
    }
};
In this simple demonstration, TokenPool provides thread-safe access to the vector of tokens by protecting the data contained within.
Adding another function
When we introduce a simple ‘getter’ we might introduce a data race.
class TokenPool {
    std::mutex mtx{};
    std::vector<Token> tokens{};
public:
    Token getToken() {
        mtx.lock();
        if (tokens.empty()) {
            tokens.push_back(Token::create()); // If this throws, our mutex is never unlocked
        }
        Token t = std::move(tokens.back());
        tokens.pop_back();
        mtx.unlock();
        return t;
    }
    size_t numTokensAvailable() const {
        return tokens.size(); // A new data race!
    }
};
Now we have the possibility for both read and write at the same time.
Notice another problem as well: if push_back throws, the mutex is never unlocked.
Fixing it all
The C++ standard library can help us with both these problems.
std::lock_guard<T> is an RAII wrapper for a mutex.
struct TokenPool {
    mutable std::mutex mtx{}; // mutable, so const members can lock it
    std::vector<Token> tokens{};
    Token getToken() {
        std::lock_guard<std::mutex> lk(mtx); // The change
        if (tokens.empty()) {
            tokens.push_back(Token::create());
        }
        Token t = std::move(tokens.back());
        tokens.pop_back();
        return t;
    }
    size_t numTokensAvailable() const {
        std::lock_guard lk(mtx); // The change
        return tokens.size();
    }
};
The std::lock_guard destructor unlocks the mutex, solving both our problems.
Passing the lock
If you want to pass the lock from one function to another like a unique_ptr, try std::unique_lock.
std::unique_lock<std::mutex> foo(std::unique_lock<std::mutex> lk) {
    if (rand()) {
        lk.unlock();
    }
    return lk;
}
lock_guard can’t be passed and might be more efficient if you don’t need this feature.
The new lock_guard
C++17 added std::scoped_lock to replace std::lock_guard for most use cases.
It allows us to pass multiple mutexes at the same time.
size_t numTokensAvailable() const {
    std::scoped_lock lk(mtx); // scoped_lock<mutex>
    return tokens.size();
}
void mergeTokensFrom(TokenPool& rhs) {
    std::scoped_lock lk(mtx, rhs.mtx); // scoped_lock<mutex, mutex>
    tokens.insert(tokens.end(), rhs.tokens.begin(), rhs.tokens.end());
    rhs.tokens.clear();
}
Advanced Synchronization
Sometimes we require advanced locking mechanisms.
condition_variable
std::condition_variable allows us to implement a thread-safe “wait until” behaviour.
This is the go-to C++ mechanism for “producer/consumer” relationships where the consumer must wait for the producer.
struct TokenPool {
    std::mutex mtx{};
    std::vector<Token> tokens{};
    std::condition_variable cv{};
    Token getToken() {
        std::unique_lock lk(mtx);
        while (tokens.empty()) {
            cv.wait(lk);
        }
        Token t = std::move(tokens.back());
        tokens.pop_back();
        return t;
    }
    void returnToken(Token t) {
        std::unique_lock lk(mtx);
        tokens.push_back(std::move(t));
        lk.unlock();
        cv.notify_one();
    }
    // ...
};
Let’s break this behaviour down.
std::mutex mtx{};
std::vector<Token> tokens{};
std::condition_variable cv{};
Here we’ve created a new condition_variable object. Next we’ll see how it’s used.
In the getToken() function we have the following:
Token getToken() {
    std::unique_lock lk(mtx);
    while (tokens.empty()) {
        cv.wait(lk);
    }
    Token t = std::move(tokens.back());
    tokens.pop_back();
    return t;
}
cv.wait(lk) tells the thread to relinquish the lock and go to sleep. Importantly, when it wakes up, it will safely reacquire the lock before proceeding.
Meanwhile, when we return a token:
void returnToken(Token t) {
    std::unique_lock lk(mtx);
    tokens.push_back(std::move(t));
    lk.unlock();
    cv.notify_one();
}
The notify_one() function of condition_variable wakes up one thread that is waiting on it (notify_all() wakes them all).
Together, this allows threads that are waiting on a token to sleep without the lock until a token is available, then wake and acquire the lock before proceeding.
once_flag
If we need a thread-safe way to initialize a non-static object, we can use std::once_flag (C++11).
struct Logger {
    std::once_flag once;
    std::optional<NetworkConnection> conn;
    NetworkConnection& getConn() {
        std::call_once(once, [this](){
            conn = NetworkConnection(defaultHost);
        });
        return *conn;
    }
};
Many threads can arrive at call_once concurrently; exactly one of them executes the callable, and the rest block until it has completed and the flag is set.
shared_mutex
C++17 provides a reader/writer lock with std::shared_mutex.
class ThreadsafeConfig {
    std::map<std::string, int> settings;
    mutable std::shared_mutex rw;
public:
    void set(const std::string& name, int value) {
        std::unique_lock<std::shared_mutex> lk(rw); // unique_lock to write
        settings.insert_or_assign(name, value);
    }
    int get(const std::string& name) const {
        std::shared_lock<std::shared_mutex> lk(rw); // shared_lock to read
        return settings.at(name);
    }
};
With a shared_mutex, many threads can hold the shared (reader) lock concurrently, but a thread taking the exclusive (writer) lock blocks out all others.
counting_semaphore
C++20 adds semaphores, which allow up to a fixed number of threads to proceed at a time.
class AnonymousTokenPool {
    std::counting_semaphore<256> sem{100}; // max 256, initial 100
public:
    void getToken() {
        sem.acquire(); // may block
    }
    void returnToken() {
        sem.release();
    }
};
With semaphores, any thread can release a token acquired by another thread. Think of it like a pool: as long as any token is available, I can grab one; I don’t need a specific one.
latch
A std::latch is a one-time starting gate in C++20 that essentially says “wait until everyone gets here and then unblock everyone”.
It is initialized with an integer counter and then…
- latch.wait() blocks until the counter reaches zero.
- latch.count_down() decrements the counter.
- latch.arrive_and_wait() decrements the counter and begins waiting.
std::latch myLatch(2);
std::thread threadB = std::thread([&](){
    myLatch.arrive_and_wait();
    std::cout << "Hello from B\n";
});
std::cout << "Hello from A\n";
myLatch.arrive_and_wait();
threadB.join();
std::cout << "Hello from A again\n";
barrier
std::barrier is a resettable latch in C++20.
After barrier.arrive_and_wait() releases everyone, it is reset and will begin blocking again until everyone is caught up.
std::barrier b(2, []{ std::cout << "Green flag, go!\n"; });
std::thread threadB = std::thread([&](){
    std::cout << "B is setting up\n";
    b.arrive_and_wait();
    std::cout << "B is running\n";
});
std::cout << "A is setting up\n";
b.arrive_and_wait();
std::cout << "A is running\n";
threadB.join();
This barrier is initialized with a completion function that will print “Green flag, go!” before either A or B print “running”.
Quick reference
- Data race: two or more threads access the same memory location at the same time, at least one access is a write, and there is no synchronization that orders them. Result: undefined behaviour.
- Thread creation: use std::thread t(f); and always call t.join() (or t.detach() if you truly want it to run independently).
- Safe communication via join: writes done in a thread happen before code that runs after join() on that thread.
- Atomics: use std::atomic<T> for simple shared counters/flags. They remove data races but not higher-level logic bugs.
- Mutex-based protection: protect shared invariants (not just single operations) with std::mutex + std::lock_guard/std::scoped_lock.
- condition_variable: for “wait until X is true” patterns (classic producer–consumer).
- once_flag / call_once: for one-time initialization of shared state.
- shared_mutex: many readers, single writer.
- Semaphores, latch, barrier (C++20): advanced coordination primitives when a simple mutex isn’t enough.
Best practices
- Avoid sharing data by default. Prefer message passing or thread-safe queues over lots of shared mutable state.
- Always define an ownership model. Know which thread owns which object, and when ownership can move.
- Guard invariants, not just variables. If multiple fields must be updated together, protect all of them with the same mutex.
- Keep critical sections small. Do as little work as possible while holding a lock; avoid blocking I/O or long computations inside locks.
- Prefer RAII for locking. Use std::lock_guard, std::unique_lock, or std::scoped_lock instead of manual lock()/unlock().
- Always join or detach threads. Leaking a joinable thread (by destroying std::thread without joining/detaching) terminates the program.
- Avoid data races on the design level. Don’t rely on timing or “it usually works”; reason in terms of happens-before relationships.
- Use higher-level abstractions when possible. Thread pools, executors, and task libraries reduce the surface area for mistakes.
- Be explicit about C++ standard version. Some primitives (shared_mutex, counting_semaphore, latch, barrier) require C++17/20 and a suitable standard library implementation.
Take-aways and notes
- A program that has a data race has undefined behaviour; it is not “just a bit flaky”.
- std::thread plus join() already gives you a well-defined way to pass results back to the caller.
- std::atomic is great for simple counters and flags, but it does not automatically make whole data structures thread-safe.
- Mutexes protect regions of code that maintain invariants, not just individual loads/stores.
- Higher-level primitives like condition_variable, shared_mutex, latch, and barrier model common patterns (producer–consumer, readers–writers, start gates, and phases) so you don’t reinvent them badly.
- The C++ memory model is subtle; lean on well-tested primitives from the standard library instead of rolling your own low-level atomics.