I suspected $SUBJECT from the beginning, and I've now put in enough work
to be able to prove it. The attached test program reliably fails within
a few minutes of being started, when run with 8 worker processes on an
8-core PPC machine. It's a pretty simple "token passing ring" protocol,
and at some point one of the processes sees its latch set without seeing
its flag set, so it goes back to sleep and the token stops getting passed.
The original worry about the latch code was that there was some internal
race condition of this sort, but I think we were failing to see the forest
for the trees. The specification of how to use a latch reads like this:
* The correct pattern to wait for an event is:
* for (;;)
* if (work to do)
* Do Stuff();
* It's important to reset the latch *before* checking if there's work to
* do. Otherwise, if someone sets the latch between the check and the
* ResetLatch call, you will miss it and Wait will block.
* To wake up the waiter, you must first set a global flag or something
* else that the main loop tests in the "if (work to do)" part, and call
* SetLatch *after* that.
Any protocol of that sort has *obviously* got a race condition between
changes of the latch state and changes of the separate flag state, if run
in a weak-memory-ordering machine.
At least on the hardware I'm testing, it seems that the key requirement
for a failure to be seen is that there's a lot of contention for the cache
line containing the flag, but not for the cache line containing the latch.
This allows the write to the flag to take longer to propagate to main
memory than the write to the latch. I could not get it to fail until
I added the extra shared-memory traffic in the delay loop around line 175.
This probably also explains why it doesn't fail with small numbers of
workers --- you need enough contending CPUs to saturate the memory
hardware. (On Red Hat's test machines, 7 or 8 workers would reliably
fail within a few minutes, but 6 or fewer didn't always.)
I'm not sure whether we need to do anything about this for 9.1; it may be
that none of the existing uses of latches are subject to the kind of
contention that would create a risk. But I'm entirely convinced that we
need to insert some memory-barrier instructions into the latch code before
we expand our uses of latches much more. Since we were already finding
that memory barriers will be needed for performance reasons elsewhere,
I think it's time to bite the bullet and develop that infrastructure.
regards, tom lane
pgsql-hackers by date
|Next:||From: Alexander Korotkov||Date: 2011-08-07 19:28:10|
|Subject: Re: WIP: Fast GiST index build|
|Previous:||From: Tom Lane||Date: 2011-08-07 16:02:34|
|Subject: Re: [RFC] Common object property boards |