Re: sinval synchronization considered harmful

From: Noah Misch <noah(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-26 20:16:50
Message-ID: 20110726201647.GA17857@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 26, 2011 at 03:40:32PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)2ndQuadrant(dot)com> writes:
> > On Thu, Jul 21, 2011 at 11:37:27PM -0400, Robert Haas wrote:
> >> I think I have a simpler idea, though:
> >> before acquiring any locks, just have SIGetDataEntries() do this:
> >>
> >> + if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState)
> >> + return 0;
> >>
> >> Patch (with comment explaining why I think this is OK) attached. If
> >> the message numbers happen to be equal only because the counter has
> >> wrapped, then stateP->resetState will be true, so we'll still realize
> >> we need to do some work.
>
> > This is attractive, and I don't see any problems with it. (In theory, you could
> > hit a case where the load of resetState gives an ancient "false" just as the
> > counters wrap to match. Given that the wrap interval is 1000000x as long as the
> > reset interval, I'm not worried about problems on actual silicon.)
>
> > +1 for doing this and moving on.
>
> After some further reflection I believe this patch actually is pretty
> safe, although Noah's explanation of why seems a bit confused. The key
> points are that (1) each of the reads is atomic, and (2) we should not
> be able to see a value that is older than our last acquisition/release
> of a shared memory lock. These being granted, we will make a decision
> that is correct, or at least was correct as of some time after that last
> lock action. As stated in the patch comments, we are not required to
> promptly assimilate sinval actions on objects we don't hold any lock on,
> so this seems sufficient. In essence, we are relying on an assumed
> prior visit to the lock manager to provide memory access synchronization
> for the sinval no-work-to-do test.
>
> The case Noah speculates about above amounts to supposing that this
> reasoning fails to hold for the resetState value, and I don't see why
> that would be true. Even if the above scenario did manage to happen,
> it would not mean that we missed the reset, just that we hadn't noticed
> it *yet*. And by hypothesis, none of the as-yet-not-seen catalog
> updates are a problem for us.

Here's the way it can fail:

1. Backend enters SIGetDataEntries() with main memory bearing stateP->resetState
= false, stateP->nextMsgNum = 500, segP->maxMsgNum = 505. The CPU has those
latest stateP values in cache, but segP->maxMsgNum is *not* in cache.
2. Backend stalls for <long time>. Meanwhile, other backends issue
MSGNUMWRAPAROUND - 5 invalidation messages. Main memory bears
stateP->resetState = true, stateP->nextMsgNum = 500 - MSGNUMWRAPAROUND,
segP->maxMsgNum = 500.
3. Backend wakes up, uses its cached stateP values and reads segP->maxMsgNum =
500 from main memory. The patch's test finds no need to reset or process
invalidation messages.

That's the theoretical risk I wished to illustrate. Though this appears
possible on an abstract x86_64 system, I think it's unrealistic to suppose that
a dirty cache line could persist *throughout* the issuance of more than 10^9
invalidation messages on a concrete implementation.

A way to think about the problem is that our read of segP->maxMsgNum is too new.
If we had used a snapshot of values as of the most recent lock
acquisition/release, there would be no problem. It's the mix of a new-enough
stateP with an all-too-new segP that yields the anomaly.

--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2011-07-26 20:30:16 pgsql: Add missing newlines at end of error messages
Previous Message Tom Lane 2011-07-26 19:44:07 Re: Another issue with invalid XML values