Re: Disk full and WALs

From: John Krasnay <john(at)krasnay(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Disk full and WALs
Date: 2010-08-01 20:11:47
Message-ID: 4C55D503.1030108@krasnay.ca
Lists: pgsql-general

On 10-08-01 03:03 PM, Tom Lane wrote:
> The archiver will retry, *if the archive command returns non-zero exit
> status*. It sounds to me like you're using an archive command script
> that dutifully logs a failure but is careless about returning the proper
> exit status.

That was my first thought, too, but the PostgreSQL log says this...

2010-07-31 06:29:11 EDT LOG: archive command failed with exit code 1

...so it definitely knew about it. It was also suspicious that
00000001000002BD00000072.00000020.backup hung around in the pg_xlog
directory; if the server thought the archive command was successful it
would presumably have cleaned it up.
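
For anyone reading this thread later, here is a rough sketch of an archive command that reports failure the way Tom describes, so that the archiver keeps the segment and retries. It is only an illustration and not the script we actually ran; the names archive_segment and main, and the script path in the usage note, are made up for the example. A full disk shows up as an OSError during the write or fsync, which the sketch turns into a non-zero return code.

```python
import os
import shutil
import sys
import tempfile


def archive_segment(src_path, archive_dir):
    """Copy one WAL file into archive_dir. Return 0 on success, 1 on failure."""
    dest_path = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest_path):
        # Never overwrite an already-archived file; report failure instead.
        print("refusing to overwrite %s" % dest_path, file=sys.stderr)
        return 1
    tmp_path = None
    try:
        # Write to a temporary file in the same directory first, so a
        # half-written copy is never visible under the final name.
        fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
        with os.fdopen(fd, "wb") as out, open(src_path, "rb") as src:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())
        os.rename(tmp_path, dest_path)
        return 0
    except OSError as exc:
        # Disk full, missing directory, permission problems and so on.
        print("archiving %s failed: %s" % (src_path, exc), file=sys.stderr)
        if tmp_path is not None and os.path.exists(tmp_path):
            os.unlink(tmp_path)
        return 1


def main(argv):
    """Return the exit status for the given command line."""
    if len(argv) != 3:
        print("usage: archive_wal.py WAL_PATH ARCHIVE_DIR", file=sys.stderr)
        return 2
    return archive_segment(argv[1], argv[2])

# As a script entry point, the last line would be: sys.exit(main(sys.argv))
```

Assuming the script were saved at a hypothetical /usr/local/bin/archive_wal.py, it could be wired up in postgresql.conf with something like archive_command = 'python3 /usr/local/bin/archive_wal.py %p /mnt/archive', where %p is the path of the file to archive. The important part is only that every failure path ends in a non-zero exit status.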

> I'm afraid you're probably screwed as far as replaying any data beyond
> the lost WAL segment goes. Even if you forced the system to try to
> replay it, you'd have corrupted database state because of the omission
> of the changes that were in the lost segment. If you still have the
> original $PGDATA tree (i.e. you didn't blow it away while trying the PITR
> idea) then you might be able to get a closer approximation to current
> time by doing pg_resetxlog and starting up --- though the consistency of
> the DB would still be questionable, so a dump and reload would be
> advisable.
>
> regards, tom lane

Luckily, we were able to rebuild our data from out-of-band data, but
it's good to know about pg_resetxlog.

Thanks for your help.

jk
