Re: 16-bit page checksums for 9.2

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, david(at)fetter(dot)org, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-02-21 10:07:40
Message-ID: 20120221100740.GD2279@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 20, 2012 at 08:57:08AM -0500, Robert Haas wrote:
> On Mon, Feb 20, 2012 at 4:18 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > What straightforward implementation is that?? This *is* the straightforward one.
> >
> > God knows what else we'd break if we drop the lock, reacquire as an
> > exclusive, then drop it again and reacquire in shared mode. Code tends
> > to assume that when you take a lock you hold it until you release;
> > doing otherwise would allow all manner of race conditions to emerge.
> >
> > And do you really think that would be faster?
>
> I don't know, but neither do you, because you apparently haven't tried
> it. Games where we drop the shared lock and get an exclusive lock
> are used in numerous places in the source code; see, for example,
> heap_update(), so I don't believe that there is any reason to reject
> that a priori. Indeed, I can think of at least one pretty good reason
> to do it exactly that way: it's the way we've handled all previous
> problems of this type, and in general it's better to make new code
> look like existing code than to invent something new, especially when
> you haven't made any effort to quantify the problem or the extent to
> which the proposed solution mitigates it.

We do, in numerous places, drop a shared buffer content lock and reacquire it
in exclusive mode. However, the existing users of that pattern tend to trade
the lock, complete subsequent work, and unlock the buffer all within the same
function. So they must, because several of them recheck facts that can change
during the period of holding no lock. SetBufferCommitInfoNeedsSave() callers
do not adhere to that pattern; they can be quite distant from the original
lock acquisition and eventual lock release. Take the prototypical case of
SetHintBits(). Our shared lock originated in a place like heapgettup(), which
expects to hold it continually until finished. We would need to somehow push
up through HeapTupleSatisfiesVisibility() the fact that we traded the buffer
lock, then have heapgettup() reorient itself as needed. Every caller of code
that can reach SetBufferCommitInfoNeedsSave() would need to do the same.
Perhaps there's a different way to apply the lock-trade technique that avoids
this mess, but I'm not seeing it. Consequently, I find the idea of requiring
a spinlock acquisition to read or write pd_lsn/pd_tli under BUFFER_LOCK_SHARE
to be a sensible one. Within that umbrella, some details need attention:

- Not all BUFFER_LOCK_SHARE callers of PageGetLSN() live in bufmgr.c. I note
gistScanPage() and XLogCheckBuffer()[1]. (Perhaps we'll only require the
spinlock for heap buffers, letting gistScanPage() off the hook.) We need a
public API, perhaps LockBufferForLSN().

- The use of some spinlock need not imply using the buffer header spinlock.
We could add a dedicated pd_lsn_tli_lock to BufferDesc. That has the usual
trade-off of splitting a lock: less contention at the cost of more
acquisitions. I have no intuition on which approach would perform better.

I agree that src/backend/storage/buffer/README is the place to document the
new locking rules.

I do share your general unease about adding new twists to the buffer access
rules. Some of our hairiest code is also the code manipulating buffer locks
most extensively, and I would not wish to see that code get even more
difficult to understand. However, I'm not seeing a credible alternative that
retains the same high-level behaviors.

A possible compromise is to leave the page clean after setting a hint bit,
much like the patch already has us do under hot standby. Then there's no new
WAL and no new rules around pd_lsn. Wasn't that one of the things Merlin
benchmarked when he was looking at hint bits? Does anyone recall the result?

Thanks,
nm

[1] Patch v10 adds a comment to XLogCheckBuffer() saying otherwise. The
comment is true today, but the same patch makes it false by having
XLogSaveBufferForHint() call XLogInsert() under a share lock.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2012-02-21 10:13:55 Re: Speed dblink using alternate libpq tuple storage
Previous Message Peter Geoghegan 2012-02-21 10:03:27 Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)