Re: Hot Backup with rsync fails at pg_clog if under load

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "daniel(at)heroku(dot)com" <daniel(at)heroku(dot)com>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-09-21 16:34:24
Message-ID: F363F283-B85C-47E0-AB8E-F69572C1738B@phlo.org
Lists: pgsql-hackers

On Sep 21, 2011, at 16:44, Linas Virbalas wrote:
> After searching the archives, the only more discussed and similar issue I
> found hit was by Daniel Farina in a thread "hot backups: am I doing it
> wrong, or do we have a problem with pg_clog?" [2], but, it seems, the issue
> was discarded because of a non-standard backup procedure Daniel used.

That's not the way I read that thread. In fact, Robert Haas confirmed that
Daniel's backup procedure was sound in theory. The open question was whether the
error occurred because of a bug in Daniel's backup code or in PostgreSQL's restore
code. The thread then petered out without that question being answered.

> Procedure:
>
> 1. Start load generator on the master (WAL archiving enabled).
> 2. Prepare a Streaming Replication standby (accepting WAL files too):
> 2.1. pg_switch_xlog() on the master;
> 2.2. pg_start_backup('backup_under_load') on the master (this will take a
> while as master is loaded up);
> 2.3. rsync data/global/pg_control to the standby;
> 2.4. rsync all other data/ (without pg_xlog) to the standby;
> 2.5. pg_stop_backup() on the master;
> 2.6. Wait to receive all WAL files, generated during the backup, on the
> standby;
> 2.7. Start the standby PG instance.

Looks good. (2.1) and (2.3) seem redundant (as Euler already noticed),
but shouldn't cause any errors.

Could you provide us with the exact rsync version and parameters you use?

> The last step will, usually, fail with a similar error:
>
> 2011-09-21 13:41:05 CEST LOG: database system was interrupted; last known
> up at 2011-09-21 13:40:50 CEST
> Restoring 00000014.history
> mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000014.history': No such file
> or directory
> Restoring 00000013.history
> 2011-09-21 13:41:05 CEST LOG: restored log file "00000013.history" from
> archive
> 2011-09-21 13:41:05 CEST LOG: entering standby mode
> Restoring 0000001300000006000000DC
> 2011-09-21 13:41:05 CEST LOG: restored log file "0000001300000006000000DC"
> from archive
> Restoring 0000001300000006000000DB
> 2011-09-21 13:41:05 CEST LOG: restored log file "0000001300000006000000DB"
> from archive
> 2011-09-21 13:41:05 CEST FATAL: could not access status of transaction
> 1188673
> 2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at
> offset 32768: Success.

What's the size of the file (pg_clog/0001) on both the master and the slave?
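
The file size matters because the segment name and offset in the error can be
derived from the transaction ID. The sketch below shows that arithmetic. It
assumes a default build (8 kB blocks, 2 status bits per transaction, 32 pages
per pg_clog segment). The function names are made up for illustration and are
not part of PostgreSQL.

```python
# Assumed constants for a default PostgreSQL build; a non-default block
# size would change them.
BLCKSZ = 8192              # bytes per clog page
XACTS_PER_BYTE = 4         # 2 status bits per transaction
PAGES_PER_SEGMENT = 32     # pages per pg_clog segment file

XACTS_PER_PAGE = BLCKSZ * XACTS_PER_BYTE


def clog_location(xid):
    """Return (segment file name, byte offset of the page) for a transaction ID."""
    page = xid // XACTS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    offset = (page % PAGES_PER_SEGMENT) * BLCKSZ
    return "%04X" % segment, offset


def min_segment_size(xid):
    """Smallest segment file size that still contains the page holding xid."""
    _, offset = clog_location(xid)
    return offset + BLCKSZ


if __name__ == "__main__":
    print(clog_location(1188673))     # ('0001', 32768)
    print(min_segment_size(1188673))  # 40960
```

For transaction 1188673 this gives pg_clog/0001 at offset 32768, matching the
log above. Under these assumptions the read can only succeed if the file is at
least 40960 bytes long, so a shorter file on the slave would be consistent with
a short read reported as "Success".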

best regards,
Florian Pflug
