Re: page corruption on 8.3+ that makes it to standby

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: page corruption on 8.3+ that makes it to standby
Date: 2010-07-27 19:50:08
Message-ID: AANLkTikFxgRpmdmz=ez2-vta_3kExzObxgnxBjkna_35@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 27, 2010 at 2:06 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> I reported a problem here:
>
> http://archives.postgresql.org/pgsql-bugs/2010-07/msg00173.php
>
> Perhaps I used a poor subject line, but I believe it's a serious issue.
> That reproducible sequence seems like an obvious bug to me on 8.3+, and
> what's worse, the corruption propagates to the standby as I found out
> today (through a test, fortunately).

I think that the problem is not so much your choice of subject line as
your misfortune to discover this bug when Tom and Heikki were both on
vacation.

> The only mitigating factor is that it doesn't actually lose data, and
> you can fix it (I believe) with zero_damaged_pages (or careful use of
> dd).
>
> There are two fixes that I can see:
>
> 1. Have log_newpage() and heap_xlog_newpage() only call PageSetLSN() and
> PageSetTLI() if the page is not new. This seems slightly awkward because
> most WAL replay stuff doesn't have to worry about zero pages, but in
> this case I think it does.
>
> 2. Have copy_relation_data() initialize new pages. I don't like this
> because (a) it's not really the job of SET TABLESPACE to clean up zero
> pages; and (b) it could be an index with different special size, etc.,
> and it doesn't seem like a good place to figure that out.

It appears to me that all of the callers of log_newpage() other than
copy_relation_data() do so with pages that they've just constructed,
and which therefore can't be new. So maybe we could just modify
copy_relation_data() to check PageIsNew(buf), or something like that,
and only call log_newpage() if that returns false, i.e. the page is
not new.
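
A minimal sketch of that idea, reading the suggestion as "skip
log_newpage() for any page that PageIsNew() reports as new". This is
not the actual backend code: DemoPage, demo_page_is_new(),
demo_log_newpage() and demo_copy_relation_data() are hypothetical
stand-ins, and the only detail borrowed from PostgreSQL is that a page
counts as new when pd_upper is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/* Simplified stand-in for a page header; only the fields used below. */
typedef struct
{
    uint64_t pd_lsn;    /* LSN of the last WAL record touching the page */
    uint16_t pd_lower;
    uint16_t pd_upper;  /* 0 on a page that was never initialized */
} DemoPageHeader;

typedef union
{
    DemoPageHeader hdr;
    char           data[BLCKSZ];
} DemoPage;

/* Mirrors the PageIsNew() test: an uninitialized page has pd_upper == 0. */
static int
demo_page_is_new(const DemoPage *page)
{
    return page->hdr.pd_upper == 0;
}

/* Stand-in for log_newpage(): here it only stamps the page with an LSN. */
static void
demo_log_newpage(DemoPage *page, uint64_t lsn)
{
    page->hdr.pd_lsn = lsn;
}

/*
 * Copy nblocks pages from src to dst.  With use_wal set, only pages that
 * are not new get "logged", so an all-zero page stays all zeros in the
 * copy.  Returns the number of pages that were logged.
 */
size_t
demo_copy_relation_data(const DemoPage *src, DemoPage *dst,
                        size_t nblocks, int use_wal)
{
    size_t   logged = 0;
    uint64_t next_lsn = 1;      /* fake LSN counter for the demo */

    assert(src != NULL && dst != NULL);

    for (size_t blk = 0; blk < nblocks; blk++)
    {
        memcpy(&dst[blk], &src[blk], sizeof(DemoPage));

        if (use_wal && !demo_page_is_new(&dst[blk]))
        {
            demo_log_newpage(&dst[blk], next_lsn++);
            logged++;
        }
    }
    return logged;
}
```

With a three-page source where the middle page is all zeros, this logs
two pages and leaves the zero page untouched in the destination,
instead of stamping an LSN onto it.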

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
