Re: Disaster!

From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Martín Marqués <martin(at)bugs(dot)unl(dot)edu(dot)ar>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Disaster!
Date: 2004-01-24 06:31:36
Message-ID: Pine.LNX.4.58.0401241726570.23715@linuxworld.com.au
Lists: pgsql-hackers

On Fri, 23 Jan 2004, Tom Lane wrote:

> Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> > Tom's answer will be undoubtedly better ...
>
> Nope, I think you got all the relevant points.
>
> The only thing I'd add after having had more time to think about it is
> that this seems very much like the problem we noticed recently with
> recovery-from-WAL being broken by the new code in bufmgr.c that tries to
> validate the header fields of any page it reads in. We had to add an
> escape hatch to disable that check while InRecovery, and I expect what
> we will end up with here is a few lines added to slru.c to make it treat
> read-past-EOF as a non-error condition when InRecovery. Now the clog
> code has always had all that paranoid error checking, but because it
> deals in such tiny volumes of data (only 2 bits per transaction), it's
> unlikely to suffer an out-of-disk-space condition. That's why we hadn't
> seen this failure mode before.

It seems that by adding the following to SlruPhysicalReadPage() we can
recover in a reasonable way here. Instead of:

    if (lseek(fd, (off_t) offset, SEEK_SET) < 0)
    {
        slru_errcause = SLRU_SEEK_FAILED;
        slru_errno = errno;
        return false;
    }

We have:

    if (lseek(fd, (off_t) offset, SEEK_SET) < 0)
    {
        if (!InRecovery)
        {
            slru_errcause = SLRU_SEEK_FAILED;
            slru_errno = errno;
            return false;
        }
        ereport(LOG,
                (errmsg("Short read from file \"%s\", reading as zeroes",
                        path)));
        MemSet(shared->page_buffer[slotno], 0, BLCKSZ);
        return true;
    }

Which is exactly how we recover from a missing pg_clog file.
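
For anyone who wants to try the idea outside the backend, here is a rough
standalone sketch. It is not the slru.c code: read_page_or_zeroes(),
in_recovery and PAGE_SIZE are made-up stand-ins for SlruPhysicalReadPage(),
InRecovery and BLCKSZ. It also assumes a POSIX system, where lseek() past the
end of a file normally succeeds and the shortfall only shows up at read(), so
the sketch applies the same "read as zeroes" rule to a short read as well as
to a failed seek.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SIZE 8192          /* hypothetical stand-in for BLCKSZ */

/* Hypothetical stand-in for the backend's InRecovery flag. */
static bool in_recovery = false;

/*
 * Read one page at 'offset' into 'buf' (which must hold PAGE_SIZE bytes).
 * Outside recovery, a failed seek or a short read is reported as failure.
 * During recovery, whatever is missing is filled with zeroes and the call
 * succeeds.  Genuine read() errors are never papered over.
 */
static bool
read_page_or_zeroes(int fd, off_t offset, char *buf)
{
    size_t      got = 0;

    if (lseek(fd, offset, SEEK_SET) < 0)
    {
        if (!in_recovery)
            return false;
        memset(buf, 0, PAGE_SIZE);
        return true;
    }

    while (got < PAGE_SIZE)
    {
        ssize_t     nread = read(fd, buf + got, PAGE_SIZE - got);

        if (nread < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, just retry */
            return false;       /* real I/O error */
        }
        if (nread == 0)
            break;              /* hit end of file */
        got += (size_t) nread;
    }

    if (got < PAGE_SIZE)
    {
        if (!in_recovery)
            return false;       /* short read is an error in normal running */
        memset(buf + got, 0, PAGE_SIZE - got);
    }
    return true;
}
```

With in_recovery set, a page that lies wholly or partly beyond the end of the
file comes back zero-filled; with it clear, the same call fails, so normal
operation keeps the paranoid checking.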

>
> regards, tom lane

Gavin
