Re: Hot Backup with rsync fails at pg_clog if under load

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-10-26 13:57:12
Message-ID: FD57812A-6A9B-44F3-8CDF-3CD2F8D9918E@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Oct26, 2011, at 15:12 , Simon Riggs wrote:
> On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
>
>> The read fails because their is no data at the location it's trying to
>> read from, because clog hasn't been extended yet by recovery.
>
> You don't actually know that, though I agree it seems a reasonable
> guess and was my first thought also.

The actual error message also supports that theory. Here's the relevant
snippet from the OP's log (Found in CA9FD2FE(dot)1D8D2%linas(dot)virbalas(at)continuent(dot)com)

2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673
2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at offset 32768: Success.

Note that it says "Success" at the end of the second log entry. That
can only happen, I think, if we're trying to read the page adjacent to
the last page in the file. The seek would be successfull, and the subsequent
read() would indicate EOF by returning zero bytes. None of the calls would
set errno. If there was a real IO error, read() would set errno, and if the
page wasn't adjacent to the last page in the file, seek() would set errno.
In both cases we'd see the corresponding error messag, not "Success".

> The error is very specifically referring to 22811359, which is the
> nextxid from pg_control and updated by checkpoint.

Where does that XID come from? The reference to that XID in the archives
that I can find is in your message
CA+U5nMKUUoA8kRG=itfsO5nZuE3x_KDJz78EaUN3_FKmq-uMJA(at)mail(dot)gmail(dot)com

> 22811359 is mid-way through a clog page, so prior xids will already
> have been allocated, pages extended and then those pages fsyncd before
> the end of pg_start_backup(). So it shouldn't be possible for that
> page to be absent from the base backup, unless the base backup was
> taken without a preceding checkpoint, which seems is not the case from
> the script output.

Or unless the nextId we store in the checkpoint is for some reason higher
than it should be. Or unless nextId somehow gets mangled during recovery.
Or unless there's some interaction between VACUUM and CHECKPOINTS that
we're overlooking...

> Note that if you are correct, then the solution is to extend clog,
> which Florian disagrees with as a solution.

That's not what I said. As you said, the CLOG page corresponding to nextId
*should* always be accessible at the start of recovery (Unless whole file
has been removed by VACUUM, that is). So we shouldn't need to extends CLOG.
Yet the error suggest that the CLOG is, in fact, too short. What I said
is that we shouldn't apply any fix (for the CLOG problem) before we understand
the reason for that apparent contradiction.

Doing it nevertheless to get rid of this seems dangerous. What happens, for
example, to the CLOG state of transactions earlier than the checkpoint's
nextId? There COMMIT record may very well lie before the checkpoint's REDO
pointer, so the CLOG we copied better contained their correct state. Yet if
it does, then why isn't the nextId's CLOG page accessible?

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aidan Van Dyk 2011-10-26 14:18:06 Re: Hot Backup with rsync fails at pg_clog if under load
Previous Message Robert Haas 2011-10-26 13:51:49 Re: TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified