Re: Hot Backup with rsync fails at pg_clog if under load

From: Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-09-22 14:24:50
Message-ID: CAA11FE2.1DDE2%linas.virbalas@continuent.com
Lists: pgsql-hackers

>>> 2.2. pg_start_backup('backup_under_load') on the master (this will take a
>>> while as master is loaded up);
>>
>> No. if you use pg_start_backup('foo', true) it will be fast. Check the
>> manual.
>
> If the server is sufficiently heavily loaded that a checkpoint takes a
> nontrivial amount of time, the OP is correct that this will be not
> fast, regardless of whether you choose to force an immediate
> checkpoint.

In order to check more cases, I have changed the procedure to force an
immediate checkpoint, i.e. pg_start_backup('backup_under_load', true). With
the same load generator running, pg_start_backup returned almost
instantaneously compared to how long it took previously.

Most importantly, after making this change, I can no longer reproduce the
pg_clog error message. In other words, with an immediate checkpoint the hot
backup succeeds under this load!

>>> 2.3. rsync data/global/pg_control to the standby;
>>
>> Why are you doing this? If ...
>>
>>> 2.4. rsync all other data/ (without pg_xlog) to the standby;
>>
>> you will copy it again or no? Don't understand your point.
>
> His point is that exercising the bug depends on doing the copying in a
> certain order. Any order of copying the data theoretically ought to
> be OK, as long as it's all between starting the backup and stopping
> the backup, but apparently it isn't.

Please note that in the past I was able to reproduce the same pg_clog error
even when taking a single rsync of the whole data/ directory (i.e. without
splitting it into two steps).

>> The problem could be that the minimum recovery point (step 2.3) is different
>> from the end of rsync if you are under load.

Do you have any idea why the hot backup with
pg_start_backup('backup_under_load', true) succeeds while
pg_start_backup('backup_under_load') fails under the same load?

Originally, I was using pg_start_backup('backup_under_load') in order not to
clog the master server with the I/O required for the checkpoint. It now
seems that this has to be sacrificed for the sake of a successful backup
under load.

> It seems pretty clear that some relevant chunk of WAL isn't getting
> replayed, but it's not at all clear to me why not. It seems like it
> would be useful to compare the LSN returned by pg_start_backup() with

I could do that if I had the exact procedure... Currently, at the start of
the backup I record the following information:

pg_xlogfile_name(pg_start_backup(...))

> the location at which replay begins when you fire up the clone.

As you have seen in my original message, in the pg_log I get only the
restored WAL file names after starting up the standby. Can I tune
postgresql.conf to include the location at which replay begins in the log?
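
For the comparison itself, here is a minimal sketch in Python, assuming both
locations are available as text in the usual 'X/Y' hexadecimal form. The
function names and the sample values are hypothetical, and how to obtain the
replay start location from the standby is still the open question above.

```python
def parse_lsn(text):
    """Convert an 'X/Y' WAL location (two hex fields) into a sortable integer."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_covers_backup_start(backup_start, replay_start):
    """True if replay begins at or before the location pg_start_backup() returned."""
    return parse_lsn(replay_start) <= parse_lsn(backup_start)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    backup_start = "2/4A000020"
    replay_start = "2/4B000020"
    if replay_covers_backup_start(backup_start, replay_start):
        print("replay begins at or before the backup start location")
    else:
        print("replay begins after the backup start location")
```

The two fields are compared numerically rather than as strings, since a
plain string comparison would order 'F/FFFFFFFF' after '10/0'.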

> Could you provide us with the exact rsync version and parameters you use?

rsync -azv
version 2.6.8 protocol version 29

--
Sincerely,
Linas Virbalas
http://flyingclusters.blogspot.com/
