On Oct 26, 2011, at 15:12, Simon Riggs wrote:
> On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
>> The read fails because there is no data at the location it's trying to
>> read from, because clog hasn't been extended yet by recovery.
> You don't actually know that, though I agree it seems a reasonable
> guess and was my first thought also.
The actual error message also supports that theory. Here's the relevant
snippet from the OP's log (found in message CA9FD2FE(dot)1D8D2%linas(dot)virbalas(at)continuent(dot)com):
2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673
2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at offset 32768: Success.
Note that it says "Success" at the end of the second log entry. That
can only happen, I think, if we're trying to read the page adjacent to
the last page in the file. The seek would be successful, and the subsequent
read() would indicate EOF by returning zero bytes. Neither call would
set errno. If there were a real I/O error, read() would set errno, and if the
page weren't adjacent to the last page in the file, seek() would set errno.
In both cases we'd see the corresponding error message, not "Success".
> The error is very specifically referring to 22811359, which is the
> nextxid from pg_control and updated by checkpoint.
Where does that XID come from? The only reference to that XID in the archives
that I can find is in your message.
> 22811359 is mid-way through a clog page, so prior xids will already
> have been allocated, pages extended and then those pages fsyncd before
> the end of pg_start_backup(). So it shouldn't be possible for that
> page to be absent from the base backup, unless the base backup was
> taken without a preceding checkpoint, which seems is not the case from
> the script output.
Or unless the nextId we store in the checkpoint is for some reason higher
than it should be. Or unless nextId somehow gets mangled during recovery.
Or unless there's some interaction between VACUUM and CHECKPOINTS that
> Note that if you are correct, then the solution is to extend clog,
> which Florian disagrees with as a solution.
That's not what I said. As you said, the CLOG page corresponding to nextId
*should* always be accessible at the start of recovery (unless the whole file
has been removed by VACUUM, that is). So we shouldn't need to extend CLOG.
Yet the error suggests that the CLOG is, in fact, too short. What I said
is that we shouldn't apply any fix (for the CLOG problem) before we understand
the reason for that apparent contradiction.
Applying such a fix anyway, just to make the error go away, seems dangerous.
What happens, for example, to the CLOG state of transactions earlier than the
checkpoint's nextId? Their COMMIT records may very well lie before the
checkpoint's REDO pointer, so the CLOG we copied had better contain their
correct state. Yet if it does, then why isn't the nextId's CLOG page accessible?