Re: Online base backup from the hot-standby

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Jun Ishiduka <ishizuka(dot)jun(at)po(dot)ntts(dot)co(dot)jp>, ssinger_pg(at)sympatico(dot)ca, simon(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, magnus(at)hagander(dot)net, robertmhaas(at)gmail(dot)com, cedric(dot)villemain(dot)debian(at)gmail(dot)com
Subject: Re: Online base backup from the hot-standby
Date: 2011-10-25 06:44:30
Message-ID: 4EA65ACE.8030904@enterprisedb.com
Lists: pgsql-hackers

On 25.10.2011 08:12, Fujii Masao wrote:
> On Tue, Oct 25, 2011 at 12:24 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 24.10.2011 15:29, Fujii Masao wrote:
>>>
>>> +<listitem>
>>> +<para>
>>> + Copy the pg_control file from the cluster directory to the global
>>> + sub-directory of the backup. For example:
>>> +<programlisting>
>>> + cp $PGDATA/global/pg_control /mnt/server/backupdir/global
>>> +</programlisting>
>>> +</para>
>>> +</listitem>
>>
>> Why is this step required? The control file is overwritten by information
>> from the backup_label anyway, no?
>
> Yes, when recovery starts, the control file is overwritten. But before that,
> we retrieve the minimum recovery point from the control file. Then it's used
> as the backup end location.
>
> During recovery, pg_stop_backup() cannot write an end-of-backup record.
> So, in standby-only backup, another way to retrieve the backup end location
> (instead of an end-of-backup record) is required. Ishiduka-san used the
> control file as that, according to your suggestion ;)
> http://archives.postgresql.org/pgsql-hackers/2011-05/msg01405.php

Oh :-)
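
To make the mechanism quoted above concrete, here is a minimal sketch of reading the minimum recovery point out of the text that pg_controldata prints for a copied control file. The helper name and the sample text are made up for illustration. This only shows where the value lives, not how the patch itself reads it.

```python
import re


def min_recovery_point(controldata_text):
    """Return the 'Minimum recovery ending location' value, or None if absent.

    controldata_text is assumed to be the plain-text output of pg_controldata.
    """
    match = re.search(
        r"^Minimum recovery ending location:\s*([0-9A-Fa-f]+/[0-9A-Fa-f]+)\s*$",
        controldata_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


# Illustrative sample only; the values are invented, not taken from a real cluster.
sample = """\
Database cluster state:               in archive recovery
Minimum recovery ending location:     0/5000078
Backup start location:                0/4000020
"""

print(min_recovery_point(sample))
```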

>>> +<para>
>>> + Again connect to the database as a superuser, and execute
>>> +<function>pg_stop_backup</>. This terminates the backup mode, but does not
>>> + perform a switch to the next WAL segment, create a backup history file and
>>> + wait for all required WAL segments to be archived,
>>> + unlike that during normal processing.
>>> +</para>
>>> +</listitem>
>>
>> How do you ensure that all the required WAL segments have been archived,
>> then?
>
> The patch doesn't provide any capability to ensure that, IOW assumes that's
> a user responsibility. If a user wants to ensure that, he/she needs to calculate
> the backup start and end WAL files from the result of pg_start_backup()
> and pg_stop_backup() respectively, and needs to wait until those files have
> appeared in the archive. Also if the required WAL file has not been archived
> yet, a user might need to execute pg_switch_xlog() in the master.
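
For illustration, the manual check described above might look roughly like the sketch below. It assumes the default 16 MB WAL segment size, an archive that is a plain directory holding uncompressed segment files, and a timeline ID that the caller already knows. The function names and parameters are hypothetical. A location that falls exactly on a segment boundary is attributed here to the segment that starts there, which errs on the side of waiting for one file more than strictly needed.

```python
import os
import time

SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default WAL segment size


def wal_file_name(timeline, location):
    """Map a WAL location such as '0/2000020' to its segment file name."""
    log_id, offset = (int(part, 16) for part in location.split("/"))
    segment = offset // SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, log_id, segment)


def wait_for_archived(archive_dir, file_names, timeout=60.0, poll=1.0):
    """Poll archive_dir until every named file exists.

    Returns True once all files are present, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [
            name
            for name in file_names
            if not os.path.exists(os.path.join(archive_dir, name))
        ]
        if not missing:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A caller would pass the locations returned by pg_start_backup() and pg_stop_backup() through wal_file_name() and hand both names to wait_for_archived(). Note that this only checks the first and last segment, as in the quoted description, and does nothing about the segments in between.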

Frankly, I think this whole thing is too fragile. The procedure is
superficially similar to what you do on the master (run pg_start_backup(),
rsync the data directory, run pg_stop_backup()), but it is actually subtly
different and more complicated. If you don't know that, and don't follow
the full procedure, you get a corrupt backup. And the backup might look
OK, and might even sometimes work, which means that you won't notice in
quick testing. That's a *huge* foot-gun.

I think we need to step back and find a way to make this:
a) less complicated, or at least
b) more robust, so that if you don't follow the procedure, you get an error.

With pg_basebackup, we have a fighting chance of getting this right,
because we have more control over how the backup is made. For example,
we can co-operate with the buffer manager to avoid torn pages,
eliminating the need for full_page_writes=on, and we can include a
control file with the correct end-of-backup location automatically,
without requiring user intervention. pg_basebackup is less flexible than
the pg_start/stop_backup method, and unfortunately you're more likely to
need that flexibility in a more complicated setup with a hot standby
server and all. But making the generic pg_start/stop_backup method work
seems infeasible at the moment.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
