page corruption on 8.3+ that makes it to standby

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: page corruption on 8.3+ that makes it to standby
Date: 2010-07-27 18:06:15
Message-ID: 1280253975.23350.99.camel@jdavis
Lists: pgsql-hackers

I reported a problem here:

http://archives.postgresql.org/pgsql-bugs/2010-07/msg00173.php

Perhaps I used a poor subject line, but I believe it's a serious issue.
That reproducible sequence seems like an obvious bug to me on 8.3+, and
what's worse, the corruption propagates to the standby as I found out
today (through a test, fortunately).

The only mitigating factor is that it doesn't actually lose data, and
you can fix it (I believe) with zero_damaged_pages (or careful use of
dd).

There are two fixes that I can see:

1. Have log_newpage() and heap_xlog_newpage() only call PageSetLSN() and
PageSetTLI() if the page is not new. This seems slightly awkward because
most WAL replay stuff doesn't have to worry about zero pages, but in
this case I think it does.

2. Have copy_relation_data() initialize new pages. I don't like this
because (a) it's not really the job of SET TABLESPACE to clean up zero
pages; and (b) it could be an index with different special size, etc.,
and it doesn't seem like a good place to figure that out.
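
To illustrate fix 1, here is a minimal, self-contained sketch. It is not PostgreSQL source: the Toy* types and toy_* functions are hypothetical stand-ins, the header layout is simplified, and the validity check only roughly imitates the rule that a page read from disk must either be entirely zero or carry a sane header. Under those assumptions, it shows why stamping an LSN and TLI onto an all-zero page turns it into a page that is neither zero nor valid, and how guarding on "page is not new" avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/* Simplified stand-in for a page header; NOT the real PageHeaderData layout. */
typedef struct {
    uint64_t pd_lsn;     /* LSN of the last WAL record that touched the page */
    uint16_t pd_tli;     /* timeline ID */
    uint16_t pd_lower;   /* offset to start of free space */
    uint16_t pd_upper;   /* offset to end of free space; 0 means "new" page */
    uint16_t pd_special; /* offset to special space */
} ToyPageHeader;

typedef union {
    ToyPageHeader hdr;
    char data[BLCKSZ];
} ToyPage;

/* Analogue of PageIsNew(): an uninitialized page has pd_upper == 0. */
static bool toy_page_is_new(const ToyPage *p) {
    return p->hdr.pd_upper == 0;
}

static bool toy_page_is_all_zero(const ToyPage *p) {
    for (size_t i = 0; i < BLCKSZ; i++)
        if (p->data[i] != 0)
            return false;
    return true;
}

/* Set up an empty but initialized page (rough analogue of PageInit()). */
static void toy_page_init(ToyPage *p) {
    memset(p, 0, sizeof(*p));
    p->hdr.pd_lower = (uint16_t) sizeof(ToyPageHeader);
    p->hdr.pd_upper = BLCKSZ;
    p->hdr.pd_special = BLCKSZ;
}

/* Assumed read-time check: accept all-zero pages or pages with sane offsets. */
static bool toy_page_header_is_valid(const ToyPage *p) {
    if (toy_page_is_all_zero(p))
        return true;
    return p->hdr.pd_lower >= sizeof(ToyPageHeader) &&
           p->hdr.pd_lower <= p->hdr.pd_upper &&
           p->hdr.pd_upper <= p->hdr.pd_special &&
           p->hdr.pd_special <= BLCKSZ;
}

/* Problematic behaviour: always stamp LSN/TLI, even on a zero page. */
static void toy_log_newpage_unguarded(ToyPage *p, uint64_t lsn, uint16_t tli) {
    assert(p != NULL);
    p->hdr.pd_lsn = lsn;
    p->hdr.pd_tli = tli;
}

/* Sketch of fix 1: leave new (zero) pages untouched. */
static void toy_log_newpage_guarded(ToyPage *p, uint64_t lsn, uint16_t tli) {
    assert(p != NULL);
    if (!toy_page_is_new(p)) {
        p->hdr.pd_lsn = lsn;
        p->hdr.pd_tli = tli;
    }
}
```

In this toy model, calling toy_log_newpage_unguarded() on a zeroed page leaves pd_lower at 0 while pd_lsn is nonzero, so the page fails toy_page_header_is_valid(). The guarded variant leaves the zero page all-zero, and still stamps the LSN and TLI on a page set up with toy_page_init(). The same guard would be needed on the replay side (heap_xlog_newpage() in the real code), since otherwise the standby would stamp the page itself.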

Comments?

Regards,
Jeff Davis
