Re: [GENERAL] PANIC: heap_update_redo: no block

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Alex bahdushka <bahdushka(at)gmail(dot)com>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] PANIC: heap_update_redo: no block
Date: 2006-03-28 10:01:27
Message-ID: 1143540087.3839.304.camel@localhost.localdomain
Lists: pgsql-general pgsql-hackers

On Mon, 2006-03-27 at 22:03 -0500, Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >> I think what's happened here is that VACUUM FULL moved the only tuple
> >> off page 1 of the relation, then truncated off page 1, and now
> >> heap_update_redo is panicking because it can't find page 1 to replay the
> >> move. Curious that we've not seen a case like this before, because it
> >> seems like a generic hazard for WAL replay.
>
> > This sounds familiar
> > http://archives.postgresql.org/pgsql-hackers/2005-05/msg01369.php

Yes, I remember that also.

> After further review I've concluded that there is not a systemic bug
> here, but there are several nearby local bugs.

IMHO it's amazing to find so many bugs in a code review of existing
production code. Cool.

> The reason it's not
> a systemic bug is that this scenario is supposed to be handled by the
> same mechanism that prevents torn-page writes: the first XLOG record
> that touches a given page after a checkpoint is supposed to rewrite
> the entire page, rather than update it incrementally. Since XLOG replay
> always begins at a checkpoint, this means we should always be able to
> write a fresh copy of the page, even after relation deletion or
> truncation. Furthermore, during XLOG replay we are willing to create
> a table (or even a whole tablespace or database directory) if it's not
> there when touched. The subsequent replay of the deletion or truncation
> will get rid of any unwanted data again.

That will all work, agreed.
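
To check my understanding of the mechanism being relied upon, here is a
minimal, self-contained sketch (plain C, not PostgreSQL source; the
record layout, file handling and error text are invented purely for
illustration) of why a full page image lets redo survive a truncated
relation file, while an incremental record against a missing block is
exactly the "no block" failure we are seeing:

/*
 * Illustration only: if the target block is past EOF we can extend the
 * file and write the saved full page image, whereas an incremental
 * record has nothing to apply against and must fail.
 */
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192

/* Hypothetical redo record: either a full page image or a delta. */
typedef struct
{
    unsigned    blkno;
    int         has_full_page;          /* backup block present? */
    char        page_image[BLCKSZ];     /* valid if has_full_page */
} RedoRecord;

static int
redo_block(FILE *rel, const RedoRecord *rec)
{
    long        eof;

    fseek(rel, 0, SEEK_END);
    eof = ftell(rel);

    if ((long) rec->blkno * BLCKSZ >= eof)
    {
        /* Block is missing, e.g. truncated away by VACUUM FULL. */
        if (!rec->has_full_page)
        {
            fprintf(stderr, "PANIC: redo: no block %u\n", rec->blkno);
            return -1;      /* nothing to apply the delta against */
        }
        /* Full page image: recreate the block from scratch. */
        fseek(rel, (long) rec->blkno * BLCKSZ, SEEK_SET);
        fwrite(rec->page_image, 1, BLCKSZ, rel);
        return 0;
    }

    /* Block exists: apply the record normally (elided). */
    return 0;
}

int
main(void)
{
    FILE       *rel = tmpfile();    /* stand-in for an empty relation file */
    RedoRecord  rec = {1, 1, {0}};

    if (rel == NULL)
        return 1;

    memset(rec.page_image, 0xAA, BLCKSZ);
    if (redo_block(rel, &rec) == 0)             /* succeeds: file extended */
        printf("block %u recreated from full page image\n", rec.blkno);

    rec.has_full_page = 0;
    rec.blkno = 2;
    redo_block(rel, &rec);      /* fails: incremental record, no block */

    fclose(rel);
    return 0;
}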

> The subsequent replay of the deletion or truncation
> will get rid of any unwanted data again.

Trouble is, it is not a watertight assumption that there *will be* a
subsequent truncation, even if it is a strong one. If there is no later
truncation, we will simply ignore what we ought by then to know is an
error and carry on as if the database were fine, which it would not be.

The overall problem is that auto-extension during replay neither takes
action on nor provides notification of filesystem corruption. Clearly we
would like xlog replay to work even in the face of severe file
corruption, but we should make an attempt to identify the situation and
notify people that it has occurred.

I'd suggest both WARNING messages in the log and something more extreme
still: anyone touching a corrupt table should receive a NOTICE saying
"database recovery displayed errors for this table", with "HINT: check
the database logfiles for specific messages". Indexes should get a log
WARNING saying "database recovery displayed errors for this index", with
"HINT: use REINDEX to rebuild this index".

So I guess I had better help if we agree this is beneficial.

> Therefore, there is no systemic bug --- unless you are running with
> full_page_writes=off. I assert that that GUC variable is broken and
> must be removed.

On this analysis, I would agree for current production systems. But it
says something deeper: we must log full pages not because we fear a
partial page write has occurred, but because the xlog mechanism
intrinsically depends upon the existence of those full-page images after
each checkpoint.

The writing of full pages in this way is a serious performance issue
that it would be good to improve upon. Perhaps this is the spur to
discuss a new xlog format that would support higher performance logging
as well as log-mining for replication?
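
To put rough, purely illustrative numbers on that overhead (assumptions,
not measurements):

  incremental heap update record:   on the order of 100 bytes
  full page image (backup block):   BLCKSZ = 8192 bytes

So the first touch of each page after a checkpoint writes something like
40-100x more WAL than the logical change needs; with frequent checkpoints
and a large working set, most updates end up paying that cost.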

> There are, however, a bunch of local bugs, including these:

...

> Notice that these are each, individually, pretty low-probability
> scenarios, which is why we've not seen many bug reports.

Most people don't file bug reports. If we have a recovery mode that
ignores filesystem corruption we'll get even fewer, because any errors
that do occur will be put down to gamma rays or some other excuse.

> a systemic bug

Perhaps we do have one systemic problem: systems documentation.

The xlog code is distinct from other parts of the codebase in that it
has almost no comments, and the overall mechanisms are relatively poorly
documented in README form. Methinks there are very few people who could
attempt such a code review, and even fewer who would find any bugs by
inspection. I'll think some more on that...

Best Regards, Simon Riggs
