Re: sinval synchronization considered harmful

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-21 23:03:28
Lists: pgsql-hackers

On Jul21, 2011, at 03:46 , Robert Haas wrote:
> Profiling this combination of patches reveals that there is still some
> pretty ugly spinlock contention on sinval's msgNumLock. And it occurs
> to me that on x86, we really don't need this lock ... or
> SInvalReadLock ... or a per-backend mutex. The whole of
> SIGetDataEntries() can pretty easily be made lock-free. The only real
> changes that seem to be needed are (1) to use a 64-bit counter, so
> you never need to decrement and (2) to recheck resetState after
> reading the entries from the queue, to see if we got reset while we
> were reading those entries. Since x86 guarantees that writes will
> become visible in the order they are executed, we only need to make
> sure that the compiler doesn't rearrange things. As long as we first
> read the maxMsgNum and then read the messages, we can't read garbage.
> As long as we read the messages before we check resetState, we will be
> sure to notice if we got reset before we read all the messages (which
> is the only way that we can have read garbage messages).
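
To make the ordering above concrete, here is a rough sketch of the read
side in C11. It is not the actual SIGetDataEntries() code: SketchQueue,
sketch_get_entries() and sketch_put_entry() are made-up names, the
messages are plain ints, and a single writer is assumed. The acquire
load and acquire fence stand in for "make sure the compiler doesn't
rearrange things"; on x86 they should not need any extra fence
instructions. Strictly speaking, C11 would also want the message reads
themselves to be atomic, which this sketch glosses over.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_QUEUE_SIZE 8     /* hypothetical ring size */

typedef struct
{
    _Atomic uint64_t maxMsgNum;     /* 64-bit, only ever incremented */
    atomic_bool      resetState;    /* set when the reader fell too far behind */
    int              buffer[SKETCH_QUEUE_SIZE];  /* stand-in for real messages */
} SketchQueue;

/*
 * Copy up to 'cap' pending messages into 'out', starting at *nextMsgNum.
 * Returns the number copied, or -1 if the reader was reset, in which
 * case whatever was copied may be garbage and must be thrown away.
 */
static int
sketch_get_entries(SketchQueue *q, uint64_t *nextMsgNum, int *out, int cap)
{
    /* Step 1: read the counter before touching any message. */
    uint64_t max = atomic_load_explicit(&q->maxMsgNum, memory_order_acquire);
    uint64_t next = *nextMsgNum;
    int      n = 0;

    /* Step 2: copy the messages published up to 'max'. */
    while (next < max && n < cap)
        out[n++] = q->buffer[next++ % SKETCH_QUEUE_SIZE];

    /* Keep the message reads ahead of the resetState check. */
    atomic_thread_fence(memory_order_acquire);

    /* Step 3: recheck resetState only after the messages were read. */
    if (atomic_load_explicit(&q->resetState, memory_order_relaxed))
        return -1;

    *nextMsgNum = next;
    return n;
}

/* Single-writer publish: store the message first, then bump the counter. */
static void
sketch_put_entry(SketchQueue *q, int msg)
{
    uint64_t max = atomic_load_explicit(&q->maxMsgNum, memory_order_relaxed);

    q->buffer[max % SKETCH_QUEUE_SIZE] = msg;
    atomic_store_explicit(&q->maxMsgNum, max + 1, memory_order_release);
}
```

The reader only advances its own position when the final resetState
check passes, so a reset that races with the copy costs one discarded
batch rather than a garbage message.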

Sounds sensible. There's one additional hazard though - you'll also
need the reads to be atomic. x86 guarantees that for loads of up to
32 bits (i386) or 64 bits (x86-64), but only for reads from properly
aligned addresses (4 bytes for 4-byte reads, 8 bytes for 8-byte reads).
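
As an illustration of the alignment requirement, here is a small C11
sketch. The struct and function names are invented for the example.
The explicit alignas(8) is there because, as far as I know, the usual
i386 ABI only aligns 8-byte integers inside structs to 4 bytes, so
natural layout alone isn't enough on that platform. The packed struct
shows how a layout mistake can leave a field misaligned.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if 'p' is a multiple of 'align' bytes. */
static bool
sketch_is_aligned(const void *p, size_t align)
{
    return ((uintptr_t) p % align) == 0;
}

/* Counter forced onto an 8-byte boundary, whatever the platform ABI says. */
typedef struct
{
    uint32_t           flags;
    alignas(8) uint64_t counter;
} SketchAligned;

/* Packed layout: the counter lands at offset 1, so it is misaligned. */
#pragma pack(push, 1)
typedef struct
{
    uint8_t  tag;
    uint64_t counter;
} SketchPacked;
#pragma pack(pop)
```

A cheap sanity check along the lines of
sketch_is_aligned(&entry->counter, 8) in an assertion-enabled build
would catch the kind of botched setup described below before it turns
into random crashes.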

I found that out the hard way a few days ago, again while playing with
different LWLock implementations, when I botched my test setup and
the proc array entries ended up being misaligned. Boy, was it fun
to debug the random crashes caused by non-atomic pointer reads...

If we widen the counter to 64 bits, reading it atomically becomes a
bit of a challenge on i386, but it is doable. From what I remember,
there are two options. One is to use the 8-byte compare-and-exchange
operation, though it might be that only quite recent CPUs support that.
The other option seems to be to use floating-point instructions. I believe
the latter is what Intel's own Threading Building Blocks library does, but
I'd have to re-check to be sure. It might also be that, once you start
using floating-point instructions, you find that you actually do need
fencing instructions even on x86. Dunno if the weaker ordering affects only
SIMD instructions or all floating-point stuff...
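
A minimal sketch of the compare-and-exchange variant, assuming a GCC or
clang style compiler that provides the __sync builtins.
sketch_read_u64() is a made-up name. The trick is to compare against 0
and "replace" with 0: the stored value never changes, but the builtin
returns whatever was there, read as one 8-byte unit. On 32-bit x86 I'd
expect the compiler to emit cmpxchg8b for this when the target CPU is
allowed to use it.

```c
#include <stdint.h>

/*
 * Atomically read a 64-bit value using an 8-byte compare-and-exchange.
 * If *ptr is 0 it is overwritten with 0 (no visible change); otherwise
 * nothing is written. Either way the old value is returned.
 */
static uint64_t
sketch_read_u64(volatile uint64_t *ptr)
{
    return __sync_val_compare_and_swap(ptr, (uint64_t) 0, (uint64_t) 0);
}
```

Note that this is a locked read-modify-write rather than a plain load,
so the memory must be writable and every reader takes the cache line
exclusively. That could well bring back some of the contention the
lock-free read was meant to avoid.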

best regards,
Florian Pflug
