Re: Hot Backup with rsync fails at pg_clog if under load

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-09-27 22:53:10
Message-ID: E75A2E54-8E6E-4ADE-B6D8-E1FEEDFEF3A6@phlo.org
Lists: pgsql-hackers

On Sep23, 2011, at 21:10 , Robert Haas wrote:
> So the actual error message in the last test was:
>
> 2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673
>
> ...but we can't tell if that was before or after nextXid, which seems
> like it would be useful to know.
>
> If Linas can rerun his experiment, but also capture the output of
> pg_controldata before firing up the standby for the first time, then
> we'd be able to see that information.

Hm, wouldn't pg_controldata almost certainly show a nextXid beyond the end
of the clog if the control file was copied after pg_clog/*?
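
To make "beyond the clog" concrete, here is a minimal Python sketch of where
a transaction's status bits live in pg_clog. It assumes the default 8 kB
block size, 2 status bits per transaction, and 32 pages per segment file.
The function names are illustrative and are not part of PostgreSQL.

```python
# Assumed defaults; a build with a non-default block size would differ.
BLCKSZ = 8192            # bytes per clog page
XACTS_PER_BYTE = 4       # 2 status bits per transaction
PAGES_PER_SEGMENT = 32   # pages per segment file

XACTS_PER_PAGE = BLCKSZ * XACTS_PER_BYTE
XACTS_PER_SEGMENT = XACTS_PER_PAGE * PAGES_PER_SEGMENT


def clog_location(xid):
    """Return (segment file name, byte offset within that file) for an xid."""
    segment = xid // XACTS_PER_SEGMENT
    byte_offset = (xid % XACTS_PER_SEGMENT) // XACTS_PER_BYTE
    return "%04X" % segment, byte_offset


def clog_covers(xid, segment_file_size):
    """True if a copied segment file of this size contains the xid's status byte."""
    _, byte_offset = clog_location(xid)
    return segment_file_size > byte_offset


# The xid from the FATAL message quoted above.
print(clog_location(1188673))
```

Under those assumptions, comparing the size of the copied segment file
against the byte offset shows whether the status byte for a given xid made
it into the backup at all.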

Linas, could you capture the output of pg_controldata *and* increase the
log level to DEBUG1 on the standby? We should then see the nextXid value of
the checkpoint that recovery is starting from.
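
For comparing the two values, a small Python sketch that pulls nextXid out
of saved pg_controldata output may help. It assumes the value appears on a
line labelled "Latest checkpoint's NextXID:" in epoch/xid form (a ':'
separator is accepted as well). The sample text below is made up for
illustration and is not output from the failing system.

```python
import re

# Assumed label and format of the pg_controldata line of interest.
NEXT_XID_RE = re.compile(
    r"^Latest checkpoint's NextXID:\s*(\d+)[/:](\d+)\s*$", re.MULTILINE
)


def parse_next_xid(controldata_text):
    """Return (epoch, xid) from pg_controldata output, or None if absent."""
    match = NEXT_XID_RE.search(controldata_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Hypothetical sample, not real output from the reported case.
sample = (
    "Database cluster state:               in production\n"
    "Latest checkpoint's NextXID:          0/1188700\n"
    "Latest checkpoint's NextOID:          24576\n"
)

epoch, next_xid = parse_next_xid(sample)
# Was the failing xid from the log message allocated before nextXid?
print(next_xid, 1188673 < next_xid)
```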

FWIW, I've had a few more theories about what's going on, but none survived
after looking at the code. My first guess was that there may be circumstances
under which the nextXid from the control file, instead of the one from the
pre-backup checkpoint, ends up becoming the standby's nextXid. But there doesn't
seem to be a way for that to happen.

My next theory was that something increments nextXid before StartupCLOG().
The only possible candidate seems to be PrescanPreparedTransactions(), which
does increment nextXid if it's smaller than some sub-xid of a prepared xact.
But we only call that before StartupCLOG() if we're starting from a
shutdown checkpoint, which shouldn't be the case for the OP.

I also checked what rsync does when a file vanishes after rsync has computed
the file list, but before the file is sent. rsync 3.0.7 on OSX, at least,
complains loudly, and doesn't sync the file. BTW, it also exits non-zero, with
a special exit code for precisely that failure case.
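
A backup script can tell that case apart from other failures. The sketch
below is in Python and relies on exit code 24, which the rsync man page
documents as "Partial transfer due to vanished source files". The function
names and the "-a" option are illustrative choices, not taken from the OP's
setup.

```python
import subprocess

# Exit code documented in rsync(1) for vanished source files.
RSYNC_VANISHED = 24


def describe_rsync_exit(code):
    """Classify an rsync exit status for a base-backup script."""
    if code == 0:
        return "ok"
    if code == RSYNC_VANISHED:
        return "vanished"   # files disappeared between file list and transfer
    return "error"


def run_rsync(src, dst):
    """Run rsync and classify the result; src and dst are caller-supplied paths."""
    result = subprocess.run(["rsync", "-a", src, dst])
    return describe_rsync_exit(result.returncode)
```

Whether "vanished" should be tolerated during a hot backup is a separate
decision; the point is only that the script can see it happened.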

best regards,
Florian Pflug
