Re: Online base backup from the hot-standby

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Jun Ishiduka <ishizuka(dot)jun(at)po(dot)ntts(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, ssinger_pg(at)sympatico(dot)ca, cedric(dot)villemain(dot)debian(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, heikki(dot)linnakangas(at)enterprisedb(dot)com
Subject: Re: Online base backup from the hot-standby
Date: 2011-09-21 08:34:26
Message-ID: CABUevExSvk60bXz0Xh+h1B0k687Q15tOfxj8czuqGxYYQcbAxQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 21, 2011 at 08:23, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Wed, Sep 21, 2011 at 04:50, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> 3. Copy the pg_control file from the cluster directory on the standby to
>>>    the backup as follows:
>>>
>>>    cp $PGDATA/global/pg_control /mnt/server/backupdir/global
>>
>> But this is done as part of step 2 already. I assume what this really
>> means is that the pg_control file must be the last file backed up?
>
> Yes.
>
> When we perform an archive recovery from the backup taken during
> normal processing, we get a backup end location from the backup-end
> WAL record which was written by pg_stop_backup(). But since no WAL
> writing is allowed during recovery, pg_stop_backup() on the standby
> cannot write a backup-end WAL record. So, in his patch, instead of
> a backup-end WAL record, the startup process uses the minimum
> recovery point recorded in pg_control which has been included in the
> backup, as a backup end location. BTW, a backup end location is
> used to check whether recovery has reached a consistent state
> (i.e., end-of-backup).
>
> To use the minimum recovery point in pg_control as a backup end
> location safely, pg_control must be backed up last. Otherwise, a data
> page with an LSN newer than the minimum recovery point
> might be included in the backup.

Ah, check.
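
For illustration, the ordering requirement could be sketched as below. This is
only a sketch under assumptions: copy_cluster_pg_control_last is a
hypothetical helper, not part of the patch. It does not call pg_start_backup()
or pg_stop_backup(), and it does not handle files that a real backup should
skip or that vanish mid-copy. The only point it shows is that
global/pg_control is copied after everything else.

```python
import os
import shutil

# Path of the control file relative to the cluster directory.
PG_CONTROL = os.path.join("global", "pg_control")


def copy_cluster_pg_control_last(pgdata, backupdir):
    """Copy every file under pgdata to backupdir, with pg_control last.

    Hypothetical helper for illustration only. Returns the relative paths
    in the order they were copied. Empty directories are not recreated.
    """
    copied = []
    for root, _dirs, files in os.walk(pgdata):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), pgdata)
            if rel == PG_CONTROL:
                continue  # deferred until everything else has been copied
            target = os.path.join(backupdir, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(os.path.join(pgdata, rel), target)
            copied.append(rel)

    # pg_control goes last, so the minimum recovery point it records is not
    # older than the LSN of any data page already placed in the backup.
    target = os.path.join(backupdir, PG_CONTROL)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copy2(os.path.join(pgdata, PG_CONTROL), target)
    copied.append(PG_CONTROL)
    return copied
```

Any other backup tool would do, as long as it preserves the same ordering.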

>> (Since there are certainly a lot of other ways to do the backup than just
>> cp to a mounted directory..)
>
> Yes. The above command I described is just an example.

ok.

>>> 4. Execute pg_stop_backup on the standby.
>>>
>>> The backup taken by the above procedure is available for an archive
>>> recovery or standby server.
>>>
>>> If the standby is promoted during a backup, pg_stop_backup() detects
>>> the change of the server status and fails. The data backed up before the
>>> promotion is invalid and not available for recovery.
>>>
>>> Taking a backup from the standby by using pg_basebackup is still not
>>> possible. But we can relax that restriction after applying this patch.
>>
>> I think that this is going to be very important, particularly given
>> the requirements on pt 3 above. (But yes, it certainly doesn't have to
>> be done as part of this patch, but it really should be the plan to
>> have this included in the same version)
>
> Agreed.
>
>>> To take a base backup during recovery safely, several parameters
>>> must be set properly. Hot standby must be enabled on the standby, i.e.,
>>> wal_level must be set to hot_standby on the master and hot_standby must
>>> be enabled on the standby. FPW (full page writes) is required for a base
>>> backup, so full_page_writes must be enabled on the master.
>>
>> Presumably pg_start_backup() will check this. And we'll somehow track
>> this before pg_stop_backup() as well? (for evil things such as
>> the user changing FPW from on to off and then back to on again during
>> a backup, which will make it look correct both during start and stop,
>> but incorrect in the middle - pg_stop_backup needs to fail in that
>> case as well)
>
> Right. As I suggested upthread, to address that problem, we need to log
> the change of FPW on the master, and then we need to check whether
> such a WAL record is replayed on the standby during the backup. If it is,
> pg_stop_backup() should emit an error.

I somehow missed this thread completely, so I didn't catch your
previous comments - oops, sorry. The important point being that we
need to track if and when this happens even if it has been reset to a
valid value. So we can't just check the state of the variable at the
beginning and at the end.
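
A toy model of that point, as a sketch only: FpwTracker and its methods are
hypothetical names for illustration and do not correspond to anything in the
patch or in PostgreSQL's source. It assumes the standby sees each FPW change
from the master as a replayed WAL record and remembers whether FPW was ever
off during the backup.

```python
class FpwTracker:
    """Toy model of a standby tracking full_page_writes during a backup.

    Hypothetical illustration; not PostgreSQL code.
    """

    def __init__(self, fpw_at_start):
        if not fpw_at_start:
            raise RuntimeError("full_page_writes must be on to start a backup")
        self.current = fpw_at_start
        self.was_off_during_backup = False

    def replay_fpw_change(self, enabled):
        """Called when a (hypothetical) FPW-change WAL record is replayed."""
        self.current = enabled
        if not enabled:
            # Sticky flag: turning FPW back on later does not clear it.
            self.was_off_during_backup = True

    def naive_stop_check(self):
        """Looks only at the value at stop time; misses an off/on flip."""
        return self.current

    def stop_backup(self):
        """Fails if FPW was off at any point during the backup."""
        if self.was_off_during_backup:
            raise RuntimeError("full_page_writes was disabled during the backup")
        return True


tracker = FpwTracker(fpw_at_start=True)
tracker.replay_fpw_change(False)  # user turns FPW off mid-backup
tracker.replay_fpw_change(True)   # ...and back on before the backup ends
```

Here naive_stop_check() would report success, while stop_backup() raises,
which is the behaviour we want.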

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
