Re: sinval synchronization considered harmful

From: Noah Misch <noah(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sinval synchronization considered harmful
Date: 2011-07-27 16:55:33
Message-ID: 20110727165531.GD18910@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 26, 2011 at 09:57:10PM -0400, Robert Haas wrote:
> On Tue, Jul 26, 2011 at 4:38 PM, Noah Misch <noah(at)2ndquadrant(dot)com> wrote:
> > No new ideas come to mind, here.
>
> OK, I have a new idea. :-)
>
> 1. Add a new flag to each procState called something like "timeToPayAttention".
> 2. Each call to SIGetDataEntries() iterates over all the ProcStates
> whose index is < lastBackend and sets stateP->timeToPayAttention =
> TRUE for each.
> 3. At the beginning of SIGetDataEntries(), we do an unlocked if
> (!stateP->timeToPayAttention) return 0.
> 4. Immediately following that if statement and before acquiring any
> locks, we set stateP->timeToPayAttention = FALSE.
>
> The LWLockRelease() in SIGetDataEntries() will be sufficient to force
> all of the stateP->timeToPayAttention writes out to main memory, and
> the read side is OK because we either just took a lock (which acted as
> a fence) or else there's a race anyway. But unlike my previous
> proposal, it doesn't involve *comparing* anything. We needn't worry
> about whether we could read two different values that are through
> great misfortune out of sync because we're only reading one value.
>
> If, by chance, the value is set to true just after we set it to false,
> that won't mess us up either: we'll still read all the messages after
> acquiring the lock.

This approach would work if a spinlock release constrained the global stores
timeline. It makes a weaker guarantee: all stores preceding the lock release
in program order will precede it globally. Consequently, no processor will
observe the spinlock to be available without also observing all stores made by
the last holding processor before that processor released it. That guarantee
is not enough for this sequence of events:

Backend 0:
add a message for table "a"
store 5 => maxMsgNum
store true => timeToPayAttention
LWLockRelease(SInvalWriteLock)
<plenty of time passes>
Backend 2:
LOCK TABLE a;
load timeToPayAttention => true
Backend 1:
add a message for table "b"
store 6 => maxMsgNum
SpinLockRelease(&vsegP->msgnumLock);
store true => timeToPayAttention
Backend 2:
store false => timeToPayAttention
LWLockAcquire(SInvalReadLock, LW_SHARED)
load maxMsgNum => 5 [!]
Backend 1:
LWLockRelease(SInvalWriteLock);
Backend 2:
LOCK TABLE b;
load timeToPayAttention => false
<use b without processing updates>

The "SpinLockRelease(&vsegP->msgnumLock)" guarantees that any process
observing the backend 2 store of timeToPayAttention will also observe
maxMsgNum = 6. However, nothing constrains which timeToPayAttention store
will "win" in main memory. Backend 1 can observe neither update from backend
2, yet still have its store appear later than the backend 1 stores in the
global timeline.

> Now, there's some downside to this approach - it involves doing O(N)
> work for each invalidation message we receive. But as long as it's
> only O(N) stores and not O(N) lock acquisitions and releases, I think
> that might be OK.

I think a benchmark is in order, something like 900 idle connections and 80
connections running small transactions that create a few temporary tables. If
that shows no statistically significant regression, then we're probably fine
here. I'm not sure what result to expect, honestly.

What did you think of making the message number a uint64, removing wraparound
handling, and retaining SISeg.msgnumLock for 32-bit only? You could isolate the
variant logic in READ_MSGNUM and WRITE_MSGNUM macros.

--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2011-07-27 17:14:38 Re: isolation test deadlocking on buildfarm member coypu
Previous Message Robert Haas 2011-07-27 16:52:50 Re: sinval synchronization considered harmful