Re: 9.2.3 crashes during archive recovery

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.2.3 crashes during archive recovery
Date: 2013-02-15 13:49:47
Message-ID: 511E3CFB.1010208@vmware.com
Lists: pgsql-hackers

On 15.02.2013 13:05, Ants Aasma wrote:
> On Wed, Feb 13, 2013 at 10:52 PM, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
>> The problem is that we startup Hot Standby before we hit the min
>> recovery point because that isn't recorded. For me, the thing to do is
>> to make the min recovery point == end of WAL when state is
>> DB_IN_PRODUCTION. That way we don't need to do any new writes and we
>> don't need to risk people seeing inconsistent results if they do this.
>
> While this solution would help solve my issue, it assumes that the
> correct amount of WAL files are actually there. Currently the docs for
> setting up a standby refer to "24.3.4. Recovering Using a Continuous
> Archive Backup", and that step recommends emptying the contents of
> pg_xlog. If this is chosen as the solution the docs should be adjusted
> to recommend using pg_basebackup -x for setting up the standby.

When the backup is taken using pg_start_backup or pg_basebackup,
minRecoveryPoint is set correctly anyway, and it's OK to clear out
pg_xlog. It's only if you take the backup using an atomic filesystem
snapshot, or just kill -9 the server and take a backup while it's not
running, that we have a problem. In those scenarios, you should not
clear pg_xlog.

Attached is a patch for git master. The basic idea is to split
InArchiveRecovery into two variables, InArchiveRecovery and
ArchiveRecoveryRequested. ArchiveRecoveryRequested is set when
recovery.conf exists. But if we don't know how far we need to recover,
we first perform crash recovery with InArchiveRecovery=false. When we
reach the end of WAL in pg_xlog, InArchiveRecovery is set, and we
continue with normal archive recovery.
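
To make the two-flag idea concrete, here is a rough sketch of the control flow described above. It is not the code from the attached patch: the RecoveryState struct, the start_recovery and replay_all functions, the min_recovery_point_known parameter, and the record counters are all hypothetical names invented for illustration. Only InArchiveRecovery and ArchiveRecoveryRequested come from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the recovery state; not PostgreSQL's real structures. */
typedef struct
{
    bool ArchiveRecoveryRequested;     /* recovery.conf exists */
    bool InArchiveRecovery;            /* currently doing archive recovery */
    int  replayed_in_crash_recovery;   /* records replayed from pg_xlog only */
    int  replayed_in_archive_recovery; /* records replayed after the switch */
} RecoveryState;

/*
 * Decide how recovery starts. min_recovery_point_known is an assumed input
 * standing in for "we know how far we need to recover" (for example, the
 * backup was taken with pg_start_backup or pg_basebackup).
 */
void
start_recovery(RecoveryState *st, bool recovery_conf_exists,
               bool min_recovery_point_known)
{
    st->ArchiveRecoveryRequested = recovery_conf_exists;
    /* Enter archive recovery right away only if the target is known. */
    st->InArchiveRecovery = recovery_conf_exists && min_recovery_point_known;
    st->replayed_in_crash_recovery = 0;
    st->replayed_in_archive_recovery = 0;
}

/*
 * Replay WAL. n_local is the number of records available in pg_xlog and
 * n_archive the number available from the archive (both made up for the
 * sketch; real recovery reads records one at a time).
 */
void
replay_all(RecoveryState *st, int n_local, int n_archive)
{
    if (!st->InArchiveRecovery)
    {
        /* Crash recovery first: replay everything found in pg_xlog. */
        st->replayed_in_crash_recovery = n_local;

        /* End of WAL in pg_xlog reached. */
        if (!st->ArchiveRecoveryRequested)
            return;             /* plain crash recovery, nothing more to do */

        /* Switch over and continue with normal archive recovery. */
        st->InArchiveRecovery = true;
    }

    /* Archive recovery is only ever entered when it was requested. */
    assert(st->ArchiveRecoveryRequested);
    st->replayed_in_archive_recovery = n_archive;
}
```

In this model, a data directory copied from a kill -9'd server with recovery.conf present starts with InArchiveRecovery=false, replays the local WAL, and only then flips the flag. A backup with a known recovery target goes straight to archive recovery, and a server without recovery.conf never leaves crash recovery.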

> As a
> related point, pointing standby setup to that section has confused at
> least one of my clients. That chapter is rather scarily complicated
> compared to what's usually necessary.

Yeah, it probably could use some editing, as the underlying code has
evolved a lot since it was written. The suggestion to clear out pg_xlog
seems like an unnecessary complication. Clearing it is safe if you
restore with an archive, but it is not required.

The "File System Level Backup" chapter
(http://www.postgresql.org/docs/devel/static/backup-file.html) probably
should mention "pg_basebackup -x", too.

Docs patches are welcome.

- Heikki

Attachment: crash-recover-before-archive-recovery.patch (text/x-diff, 16.1 KB)
