Re: Hot Backup with rsync fails at pg_clog if under load

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-10-27 06:57:41
Message-ID: 4EA900E5.9070905@enterprisedb.com
Lists: pgsql-hackers

On 27.10.2011 02:29, Florian Pflug wrote:
> Per my theory about the cause of the problem in my other mail, I think you
> might see StartupCLOG failures even during crash recovery, provided that
> wal_level was set to hot_standby when the primary crashed. Here's how:
>
> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
> 2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
> The assigned XID requires CLOG extension.
> 3) The checkpoint continues, and LogStandbySnapshot() advances the
> checkpoint's nextXid to the XID assigned in (2).
> 4) We crash after writing the checkpoint record, but before the CLOG
> extension makes it to the disk, and before any trace of the XID assigned
> in (2) makes it to the xlog.
>
> Then StartupCLOG() would fail at the end of recovery, because we'd end up
> with a nextXid whose corresponding CLOG page doesn't exist.

No, clog extension is WAL-logged while holding the XidGenLock. At step
3, LogStandbySnapshot() would block until the clog-extension record is
written to WAL, so crash recovery would see and replay that record
before calling StartupCLOG().
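
The ordering argument above can be sketched with a toy model. This is not
PostgreSQL code: the class, the tuple-based WAL, the page size of 4 XIDs,
and the single exclusive lock standing in for XidGenLock are all
assumptions made for illustration. It only shows that if the extension
record is appended while the lock is held, a snapshot that takes the same
lock can never be logged ahead of the extension it depends on.

```python
import threading

PAGE_XIDS = 4  # toy value (assumption); real CLOG pages hold far more XIDs


class ToyXidGenerator:
    """Toy stand-in for XID assignment plus WAL; all names are hypothetical."""

    def __init__(self):
        self.xid_gen_lock = threading.Lock()  # stands in for XidGenLock
        self.next_xid = 3                     # page 0 is assumed to exist
        self.wal = []                         # ordered list of (kind, value)

    def get_new_transaction_id(self):
        # The extension record is logged while the lock is still held.
        with self.xid_gen_lock:
            xid = self.next_xid
            if xid % PAGE_XIDS == 0:
                self.wal.append(("CLOG_EXTEND", xid // PAGE_XIDS))
            self.next_xid = xid + 1
            return xid

    def log_standby_snapshot(self):
        # Blocks until any in-progress XID assignment has released the lock.
        with self.xid_gen_lock:
            self.wal.append(("RUNNING_XACTS", self.next_xid))
            return self.next_xid


def replay_ok(wal, initial_pages=(0,)):
    """Replay the toy WAL in order; return False if a snapshot's nextXid
    implies an assigned XID whose page has not been created yet."""
    pages = set(initial_pages)
    for kind, value in wal:
        if kind == "CLOG_EXTEND":
            pages.add(value)
        elif kind == "RUNNING_XACTS" and value > 0:
            if (value - 1) // PAGE_XIDS not in pages:
                return False
    return True
```

In this model, a WAL in which a snapshot with nextXid 6 appears before the
record extending to page 1 fails the replay check, and the locking in
ToyXidGenerator is what prevents such a WAL from being produced.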

The StartupCLOG() failure you describe can happen during hot standby,
though, because StartupCLOG() is called earlier there.

> My suggestion is to fix the CLOG problem in the same way that you fixed
> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
> CheckPointGuts().
>
> Here's what I imagine CreateCheckPoint() should look like:
>
> 1) LogStandbySnapshot() and fill out oldestActiveXid
> 2) Fill out REDO
> 3) Wait for concurrent commits
> 4) Fill out nextXid and the other fields
> 5) CheckPointGuts()
> 6) Rest
>
> It's then no longer necessary for LogStandbySnapshot() to modify
> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
> will thus derive a higher value than LogStandbySnapshot() would have.

Hmm, I don't think that fully fixes the problem. Even if you're certain
that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might
decide to truncate it away again while the checkpoint is running.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
