Re: Hot Backup with rsync fails at pg_clog if under load

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-09-27 22:53:10
Message-ID: E75A2E54-8E6E-4ADE-B6D8-E1FEEDFEF3A6@phlo.org
Lists: pgsql-hackers

On Sep 23, 2011, at 21:10, Robert Haas wrote:
> So the actual error message in the last test was:
> 
> 2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 1188673
> 
> ...but we can't tell if that was before or after nextXid, which seems
> like it would be useful to know.
> 
> If Linas can rerun his experiment, but also capture the output of
> pg_controldata before firing up the standby for the first time, then
> we'd able to see that information.

Hm, wouldn't pg_controldata almost certainly show a nextXid beyond the end
of the clog if the control file was copied after pg_clog/*?
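
To make "beyond the clog" concrete, here is a rough sketch of how a
transaction ID maps to a pg_clog segment file and a page offset within it.
It assumes a default build (BLCKSZ of 8192, 2 status bits per transaction,
32 pages per SLRU segment); clog_location() and clog_covers() are names made
up for this illustration, not PostgreSQL functions.

```python
# Constants mirroring a default PostgreSQL build (assumption: BLCKSZ = 8192).
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT        # 4 transactions per byte
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE   # 32768 transactions per page
SLRU_PAGES_PER_SEGMENT = 32                          # pages per segment file


def clog_location(xid):
    """Return (segment file name, byte offset of the page) holding xid's status."""
    page = xid // CLOG_XACTS_PER_PAGE
    segment = page // SLRU_PAGES_PER_SEGMENT
    offset = (page % SLRU_PAGES_PER_SEGMENT) * BLCKSZ
    return "%04X" % segment, offset


def clog_covers(xid, segment_file_name, file_size):
    """Hypothetical check: does a copied segment of file_size bytes contain xid's page?"""
    name, offset = clog_location(xid)
    return name == segment_file_name and file_size >= offset + BLCKSZ


if __name__ == "__main__":
    # The transaction from the quoted FATAL message.
    print(clog_location(1188673))
```

If the copied segment is shorter than offset + BLCKSZ for a given xid, any
lookup of that xid on the standby would have to read past the end of the file.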

Linas, could you capture the output of pg_controldata *and* increase the
log level to DEBUG1 on the standby? We should then see the nextXid value of
the checkpoint that recovery is starting from.
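
For capturing the value, something along these lines might do. It is only a
sketch: it assumes pg_controldata is on the PATH and prints a
"Latest checkpoint's NextXID:" line in epoch/xid form (a ':' separator is
accepted as well, in case the format differs between versions).
capture_controldata() and parse_next_xid() are hypothetical helper names. The
log level itself would be raised with log_min_messages = debug1 in the
standby's postgresql.conf.

```python
import re
import subprocess

# Matches e.g. "Latest checkpoint's NextXID:          0/12345"
_NEXT_XID = re.compile(
    r"^Latest checkpoint's NextXID:\s*(\d+)[/:](\d+)\s*$", re.MULTILINE
)


def capture_controldata(datadir):
    """Run pg_controldata on datadir and return its text output."""
    result = subprocess.run(
        ["pg_controldata", datadir],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_next_xid(controldata_output):
    """Return (epoch, xid) from pg_controldata output, or None if not found."""
    match = _NEXT_XID.search(controldata_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Run capture_controldata() on the copied data directory before the standby is
started for the first time, since the control file changes once recovery begins.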

FWIW, I've had a few more theories about what's going on, but none survived
after looking at the code. My first guess was that there might be circumstances
under which the nextXid from the control file, instead of the one from the
pre-backup checkpoint, ends up becoming the standby's nextXid. But there doesn't
seem to be a way for that to happen.

My next theory was that something increments nextXid before StartupCLOG().
The only possible candidate seems to be PrescanPreparedTransactions(), which
does increment nextXid if it's smaller than some sub-xid of a prepared xact.
But we only call that before StartupCLOG() if we're starting from a
shutdown checkpoint, which shouldn't be the case for the OP.

I also checked what rsync does when a file vanishes after rsync has computed
the file list, but before the file is sent. rsync 3.0.7 on OS X, at least,
complains loudly and doesn't sync the file. BTW, it also exits non-zero, with
a special exit code for precisely that failure case.
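
A backup script could tell that case apart from other failures. The sketch
below assumes the exit code in question is 24, which rsync(1) documents as
"Partial transfer due to vanished source files"; classify_rsync_exit() and
run_rsync() are names invented for this example, and the "-a" flag is just a
placeholder for whatever options the real backup uses.

```python
import subprocess

# Assumption: 24 is rsync's "vanished source files" exit code, per rsync(1).
RSYNC_VANISHED = 24


def classify_rsync_exit(returncode):
    """Map an rsync exit status to 'ok', 'vanished', or 'error'."""
    if returncode == 0:
        return "ok"
    if returncode == RSYNC_VANISHED:
        return "vanished"
    return "error"


def run_rsync(src, dst):
    """Run rsync in archive mode and classify its exit status."""
    result = subprocess.run(["rsync", "-a", src, dst])
    return classify_rsync_exit(result.returncode)
```

A "vanished" result during a hot backup is not necessarily fatal, since files
can legitimately disappear while the server is running, but it is worth
logging which files were affected.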

best regards,
Florian Pflug

