From: | Noah Misch <noah(at)2ndQuadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: sinval synchronization considered harmful |
Date: | 2011-07-26 20:16:50 |
Message-ID: | 20110726201647.GA17857@tornado.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jul 26, 2011 at 03:40:32PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)2ndQuadrant(dot)com> writes:
> > On Thu, Jul 21, 2011 at 11:37:27PM -0400, Robert Haas wrote:
> >> I think I have a simpler idea, though:
> >> before acquiring any locks, just have SIGetDataEntries() do this:
> >>
> >> + if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState)
> >> + return 0;
> >>
> >> Patch (with comment explaining why I think this is OK) attached. If
> >> the message numbers happen to be equal only because the counter has
> >> wrapped, then stateP->resetState will be true, so we'll still realize
> >> we need to do some work.
>
> > This is attractive, and I don't see any problems with it. (In theory, you could
> > hit a case where the load of resetState gives an ancient "false" just as the
> > counters wrap to match. Given that the wrap interval is 1000000x as long as the
> > reset interval, I'm not worried about problems on actual silicon.)
>
> > +1 for doing this and moving on.
>
> After some further reflection I believe this patch actually is pretty
> safe, although Noah's explanation of why seems a bit confused. The key
> points are that (1) each of the reads is atomic, and (2) we should not
> be able to see a value that is older than our last acquisition/release
> of a shared memory lock. These being granted, we will make a decision
> that is correct, or at least was correct as of some time after that last
> lock action. As stated in the patch comments, we are not required to
> promptly assimilate sinval actions on objects we don't hold any lock on,
> so this seems sufficient. In essence, we are relying on an assumed
> prior visit to the lock manager to provide memory access synchronization
> for the sinval no-work-to-do test.
>
> The case Noah speculates about above amounts to supposing that this
> reasoning fails to hold for the resetState value, and I don't see why
> that would be true. Even if the above scenario did manage to happen,
> it would not mean that we missed the reset, just that we hadn't noticed
> it *yet*. And by hypothesis, none of the as-yet-not-seen catalog
> updates are a problem for us.
Here's the way it can fail:
1. Backend enters SIGetDataEntries() with main memory bearing stateP->resetState
= false, stateP->nextMsgNum = 500, segP->maxMsgNum = 505. The CPU has those
latest stateP values in cache, but segP->maxMsgNum is *not* in cache.
2. Backend stalls for <long time>. Meanwhile, other backends issue
MSGNUMWRAPAROUND - 5 invalidation messages. Main memory bears
stateP->resetState = true, stateP->nextMsgNum = 500 - MSGNUMWRAPAROUND,
segP->maxMsgNum = 500.
3. Backend wakes up, uses its cached stateP values and reads segP->maxMsgNum =
500 from main memory. The patch's test finds no need to reset or process
invalidation messages.
That's the theoretical risk I wished to illustrate. Though this appears
possible on an abstract x86_64 system, I think it's unrealistic to suppose that
a dirty cache line could persist *throughout* the issuance of more than 10^9
invalidation messages on a concrete implementation.
A way to think about the problem is that our read of segP->maxMsgNum is too new.
If we had used a snapshot of values as of the most recent lock
acquisition/release, there would be no problem. It's the mix of a new-enough
stateP with an all-too-new segP that yields the anomaly.
--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2011-07-26 20:30:16 | pgsql: Add missing newlines at end of error messages |
Previous Message | Tom Lane | 2011-07-26 19:44:07 | Re: Another issue with invalid XML values |