Why the volatile keyword probably isn’t necessary in multi-threaded programming


Interesting article from the Intel guys doing TBB.

Arch Robison just removed almost ALL the volatile keywords from Intel Threading Building Blocks. Why? For several reasons, but mostly because he claims that overall it slows your code, probably does not actually solve the underlying ordering problems if your code needs to be portable (a REAL concern when writing today’s games/apps for x86, Xbox, PS3, and iPhone devices!), and likely isn’t doing what you think it’s doing anyway. Here’s a pertinent example:

Sometimes programmers think of volatile as turning off optimization of volatile accesses. That’s largely true in practice. But that’s only the volatile accesses, not the non-volatile ones. Consider this fragment:

    volatile int Ready;
    int Message[100];

    void foo( int i ) {
        Message[i/10] = 42;
        Ready = 1;
    }

It’s trying to do something very reasonable in multi-threaded programming: write a message and then send it to another thread. The other thread will wait until Ready becomes non-zero and then read Message. Try compiling this with “gcc -O2 -S” using gcc 4.0, or icc. Both will do the store to Ready first, so it can be overlapped with the computation of i/10. The reordering is not a compiler bug. It’s an aggressive optimizer doing its job.

You might think the solution is to mark all your memory references volatile. That’s just plain silly. As the earlier quotes say, it will just slow down your code. Worse yet, it might not fix the problem. Even if the compiler does not reorder the references, the hardware might. x86 hardware will not reorder it. Neither will an Itanium(TM) processor, because Itanium compilers insert memory fences for volatile stores. That’s a clever Itanium extension. But chips like Power(TM) will reorder. What you really need for ordering are memory fences, also called memory barriers.
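To make the fence idea concrete, here is a minimal sketch of the Ready/Message fragment rewritten with C11 `<stdatomic.h>`, which postdates the article being quoted. The function names `publish` and `try_consume` and the bounds check are my own assumptions for illustration, not anything from TBB or the original article.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int Ready;     /* flag shared between threads, starts at 0 */
static int Message[100];     /* ordinary, non-volatile payload */

/* Producer (hypothetical name): write the payload, then publish it. */
void publish(int i) {
    assert(i >= 0 && i < 1000);          /* keeps i/10 inside Message */
    Message[i / 10] = 42;
    /* Release fence: the store to Message cannot be moved below it. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&Ready, 1, memory_order_relaxed);
}

/* Consumer (hypothetical name): returns 1 and fills *out once the
   message is visible, otherwise returns 0 and leaves *out untouched. */
int try_consume(int i, int *out) {
    assert(i >= 0 && i < 1000);
    if (atomic_load_explicit(&Ready, memory_order_relaxed) == 0)
        return 0;
    /* Acquire fence: the read of Message cannot be moved above it. */
    atomic_thread_fence(memory_order_acquire);
    *out = Message[i / 10];
    return 1;
}
```

The two fences pair up: once the consumer sees Ready as non-zero and passes the acquire fence, it is guaranteed to see the write to Message, for both the compiler and the hardware. Note that nothing here is declared volatile.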

So what’s the solution for multi-threaded programming? Use a library or language extension that implements the atomic and fence semantics. When used as intended, the operations in the library will insert the right fences. Some examples:

* POSIX threads
* Windows(TM) threads
* OpenMP
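As a rough illustration of the first option, the same message hand-off might look like this with a POSIX mutex and condition variable. This is only a sketch: `send_message`, `wait_for_message`, and the variable names are hypothetical, and a real program would also check the return codes of the pthread calls.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static int Ready;            /* plain int: only touched while holding lock */
static int Message[100];

/* Sender side (hypothetical name): write the message, then signal. */
void send_message(int i) {
    assert(i >= 0 && i < 1000);          /* keeps i/10 inside Message */
    pthread_mutex_lock(&lock);
    Message[i / 10] = 42;
    Ready = 1;
    pthread_cond_signal(&ready_cond);
    pthread_mutex_unlock(&lock);
}

/* Receiver side (hypothetical name): block until Ready is set,
   then return the message slot for i. */
int wait_for_message(int i) {
    int value;
    assert(i >= 0 && i < 1000);
    pthread_mutex_lock(&lock);
    while (!Ready)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&ready_cond, &lock);
    value = Message[i / 10];
    pthread_mutex_unlock(&lock);
    return value;
}
```

In use, one thread would call `send_message` and another `wait_for_message`. The lock and unlock calls supply the required memory ordering, so neither Ready nor Message needs to be volatile.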

So, when is volatile actually necessary? It turns out there are only three portable cases where volatile is actually needed:

  • marking a local variable in the scope of a setjmp so that the variable does not roll back after a longjmp.
  • memory that is modified by an external agent or appears to be because of a screwy memory mapping
  • signal handler mischief
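Two of the cases above are easy to show in a few lines of standard C. The sketch below is my own illustration, with hypothetical function names `count_attempts` and `signal_demo`; the memory-mapped case is omitted because it depends on the hardware.

```c
#include <setjmp.h>
#include <signal.h>

static jmp_buf env;

/* setjmp case: a local modified between setjmp and longjmp must be
   volatile, otherwise its value after the longjmp is indeterminate. */
int count_attempts(void) {
    volatile int attempts = 0;
    (void)setjmp(env);           /* execution resumes here after each longjmp */
    attempts++;
    if (attempts < 3)
        longjmp(env, 1);         /* jump back and try again */
    return attempts;
}

/* Signal case: a flag written by a handler and read by normal code
   should be declared volatile sig_atomic_t. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo) {
    (void)signo;
    got_signal = 1;
}

int signal_demo(void) {
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);               /* the handler runs before raise returns */
    signal(SIGINT, SIG_DFL);     /* restore the default handler */
    return got_signal;
}
```

With the volatile qualifier in place, `count_attempts` returns 3 and `signal_demo` returns 1. Neither example involves a second thread, which is the point: volatile is for these single-threaded oddities, not for inter-thread ordering.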

And now you know, and knowing is half the battle.
