Re: Inadequate thought about buffer locking during hot standby replay

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Inadequate thought about buffer locking during hot standby replay
Date: 2012-11-10 21:24:22
Message-ID: 1448.1352582662@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On 10 November 2012 18:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Here's a WIP patch that attacks it in this way.

> Looks fine, but we need a top-level comment in btree code explaining
> why/how it follows the very well explained rules in the README.

Not sure what you'd want it to say exactly? Really these issues are per
WAL-replay function, not global. I've now been through all of the btree
functions, and AFAICS btree_xlog_split is the only one of them that
actually has a bug; the others can be a bit lax about the order in which
things get done.

>> One thing that seems a bit annoying is the use of zero-based backup
>> block indexes in RestoreBackupBlock, when most (not all) of the callers
>> are referencing macros with one-based names (XLR_BKP_BLOCK_1 etc).
>> That's a bug waiting to happen. We could address it by changing
>> RestoreBackupBlock to accept a one-based argument, but what I'm thinking
>> of doing instead is getting rid of the one-based convention entirely;
>> that is, remove XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4 and instead have
>> all code use the XLR_SET_BKP_BLOCK() macro, which is zero-based. One
>> advantage of doing that is that it'll force inspection of all code
>> related to this.

> I wouldn't do that in a back branch, but I can see why its a good idea.

If any of this stuff were getting used by external modules, changing it
would be problematic ... but fortunately, it isn't, because we lack
support for plug-in rmgrs. So I'm not worried about back-patching the
change, and would prefer to keep the 9.x branches in sync.

Attached is an updated patch in which nbtxlog.c has been fully converted,
but I've not touched other rmgrs yet. I've tested this to the extent of
having a replication slave follow a master that's running the regression
tests, and it seems to work. It's difficult to prove much by testing
about concurrent behavior, of course.

I found that there's another benefit of managing backup-block replay
this way, which is that we can de-contort the handling of conflict
resolution by moving it into the per-record-type subroutines, instead of
needing an extra switch before the RestoreBkpBlocks call. So that gives
me more confidence that this is a good way to go.

regards, tom lane

Attachment Content-Type Size
restore-backup-blocks-individually-1.patch text/x-patch 30.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-11-10 21:37:43 Re: Inadequate thought about buffer locking during hot standby replay
Previous Message Simon Riggs 2012-11-10 20:59:13 Re: Inadequate thought about buffer locking during hot standby replay