Re: page corruption on 8.3+ that makes it to standby

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: page corruption on 8.3+ that makes it to standby
Date: 2010-07-28 19:16:27
Message-ID: 21261.1280344587@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Jul 28, 2010 at 2:21 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I've caught up on the thread now, and I think that fix2 (skip logging
>> the page) is extremely dangerous and has little if anything in its
>> favor.

> Why do you think that? They will be different only in terms of
> whether the uninitialized bytes are before or after the nominal EOF,
> and we know we have to be indifferent to that case anyway.

(1) You're assuming that the page will be zeroes on the slave without
having forced it to be so. A really obvious case where this fails
is where we're doing crash-and-restart on the master: a later action
could have modified the page away from the all-zero state. (In
principle that's OK but I think this might break torn-page protection.)

(2) On filesystems that support holes, the page will not have storage,
whereas it (probably) does on the master. This could lead to a
divergence in behavior later, ie slave runs out of disk space at a
different point than the master.

(3) The position of the nominal EOF can drive choices about which page
to put new tuples in; specifically, that's where RelationGetBufferForTuple
will go if FSM has no information. This could result in unexpected
divergence in behavior after the slave goes live compared to what the
master would have done. Maybe that's OK but it seems better to avoid
it if we can, especially when you think about crash-and-restart on the
master as opposed to a separate slave.
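
Point (2) can be seen outside the database with a minimal sketch,
assuming a POSIX system. The toy_* helpers and TOY_BLCKSZ are
hypothetical names made up for illustration, not PostgreSQL code. Both
helpers grow a file by the same number of bytes, one by moving EOF and
one by writing zeroes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLCKSZ 8192   /* hypothetical page size for this sketch */

/*
 * Grow the file by nbytes by moving EOF only.  On filesystems that
 * support holes, the new range may get no storage at all.
 */
static int
toy_extend_with_hole(int fd, off_t nbytes)
{
    struct stat st;

    assert(nbytes >= 0);
    if (fstat(fd, &st) < 0)
        return -1;
    return ftruncate(fd, st.st_size + nbytes);
}

/*
 * Grow the file by nbytes by actually writing zeroes at EOF, which
 * asks the filesystem to back the range with storage.
 */
static int
toy_extend_with_zeroes(int fd, off_t nbytes)
{
    char zeros[TOY_BLCKSZ];

    assert(nbytes >= 0);
    memset(zeros, 0, sizeof(zeros));
    if (lseek(fd, 0, SEEK_END) < 0)
        return -1;
    while (nbytes > 0)
    {
        size_t  chunk = nbytes < (off_t) sizeof(zeros)
                        ? (size_t) nbytes : sizeof(zeros);
        ssize_t n = write(fd, zeros, chunk);

        if (n <= 0)
            return -1;      /* treat a short-circuit write as failure */
        nbytes -= n;
    }
    return 0;
}
```

After either call, fstat() reports the same st_size, and both files read
back as zeroes. What can differ is st_blocks, and that depends on the
filesystem, which is the kind of divergence in disk usage described in
(2).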

Now as I said earlier, these are all tiny corners of a corner case, and
they *probably* shouldn't matter. But I see no good reason to expose
ourselves to the possibility that there are some cases where they do
matter. Especially when your argument for fix2 is a purely aesthetic
judgment that I don't agree with anyway.

>> I think it is appropriate to be setting the LSN/TLI in the case of a
>> page that's been constructed by the caller as part of the WAL-logged
>> action, but doing so in copy_relation_data seems rather questionable.
>> We certainly didn't change the source page so changing its LSN seems
>> rather wrong --- wouldn't it be better to just copy the source pages
>> with their original LSNs?

> It seems like if log_newpage() were to set the LSN/TLI before calling
> XLogInsert() - or optionally not - then it wouldn't be necessary to
> set them also in heap_xlog_newpage(); the memcpy operation would by
> definition have copied the right information onto the page.

Not possible because it is only after you've done XLogInsert that you
know what LSN was assigned to the WAL record.
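
The ordering problem can be shown with a toy model. This is only a
sketch under stated assumptions: ToyPage, ToyWal, toy_log_insert,
toy_log_newpage and toy_redo_newpage are hypothetical stand-ins, not
the real log_newpage(), XLogInsert() or heap_xlog_newpage(). The page
image is copied into the log before the record's LSN exists, so the
logged image still carries the old LSN and replay has to stamp it after
the memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical page: an LSN header followed by payload bytes. */
typedef struct ToyPage
{
    uint64_t lsn;
    char     data[TOY_PAGE_SIZE];
} ToyPage;

/* Hypothetical log: an append position plus the last page image. */
typedef struct ToyWal
{
    uint64_t insert_pos;    /* end of the last record written */
    ToyPage  image;         /* page image carried by that record */
} ToyWal;

/*
 * Copy the page image into the log and return the LSN assigned to the
 * record.  The value is known only once the record has been placed.
 */
static uint64_t
toy_log_insert(ToyWal *wal, const ToyPage *page)
{
    assert(wal != NULL && page != NULL);
    memcpy(&wal->image, page, sizeof(ToyPage));
    wal->insert_pos += sizeof(ToyPage);
    return wal->insert_pos;
}

/* Logging side: insert first, then stamp the in-memory page. */
static uint64_t
toy_log_newpage(ToyWal *wal, ToyPage *page)
{
    uint64_t lsn = toy_log_insert(wal, page);

    page->lsn = lsn;        /* too late to affect the logged image */
    return lsn;
}

/* Replay side: restore the image, then set the LSN explicitly. */
static void
toy_redo_newpage(const ToyWal *wal, uint64_t record_lsn, ToyPage *dst)
{
    assert(wal != NULL && dst != NULL);
    memcpy(dst, &wal->image, sizeof(ToyPage));
    dst->lsn = record_lsn;  /* the copied image holds the pre-insert LSN */
}
```

In this model, dropping the assignment in toy_redo_newpage() would
leave the replayed page with whatever LSN it had before the insert,
since the logging side cannot write a value it does not have yet.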

regards, tom lane
