Re: Online base backup from the hot-standby

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jun Ishiduka <ishizuka(dot)jun(at)po(dot)ntts(dot)co(dot)jp>, ssinger_pg(at)sympatico(dot)ca, simon(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, magnus(at)hagander(dot)net, robertmhaas(at)gmail(dot)com, cedric(dot)villemain(dot)debian(at)gmail(dot)com
Subject: Re: Online base backup from the hot-standby
Date: 2011-10-25 08:50:10
Message-ID: CAHGQGwFZHYwaUzT4sjXrU74LXvJm1opvMmhgtak3m081khvM=g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 25, 2011 at 3:44 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> +<para>
>>>> +      Again connect to the database as a superuser, and execute
>>>> +<function>pg_stop_backup</>. This terminates the backup mode, but does not
>>>> +      perform a switch to the next WAL segment, create a backup history file and
>>>> +      wait for all required WAL segments to be archived,
>>>> +      unlike that during normal processing.
>>>> +</para>
>>>> +</listitem>
>>>
>>> How do you ensure that all the required WAL segments have been archived,
>>> then?
>>
>> The patch doesn't provide any capability to ensure that; IOW, it assumes
>> that's the user's responsibility. If a user wants to ensure that, he/she
>> needs to calculate the backup start and end WAL files from the results of
>> pg_start_backup() and pg_stop_backup() respectively, and needs to wait until
>> those files have appeared in the archive. Also, if the required WAL file has
>> not been archived yet, the user might need to execute pg_switch_xlog() on the
>> master.
>
> Frankly, I think this whole thing is too fragile. The procedure is
> superficially similar to what you do on master: run pg_start_backup(), rsync
> data directory, run pg_stop_backup(), but is actually subtly different and
> more complicated. If you don't know that, and don't follow the full
> procedure, you get a corrupt backup. And the backup might look ok, and might
> even sometimes work, which means that you won't notice in quick testing.
> That's a *huge* foot-gun.
>
> I think we need to step back and find a way to make this:
> a) less complicated, or at least
> b) more robust, so that if you don't follow the procedure, you get an error.
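For reference, the manual check described in the quoted text (derive the first and last WAL files from the locations returned by pg_start_backup() and pg_stop_backup(), then wait for them to show up in the archive) could be scripted roughly as below. This is only a sketch under assumptions: a default 16MB WAL segment size, a timeline ID that the caller has to supply, and an archive_command that copies segments into a local directory. The function names and the archive path are hypothetical. Like the quoted procedure, it only checks the first and last files, although every segment in between is needed too.

```python
import os
import time

# Assumption: default 16MB WAL segments; a server built with a different
# segment size would need a different value here.
WAL_SEG_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn):
    """Turn a location string such as '0/2000020' into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline, segno):
    """Format a segment number as a 24-hex-digit WAL file name."""
    segs_per_id = 0x100000000 // WAL_SEG_SIZE
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


def backup_wal_range(timeline, start_lsn, stop_lsn):
    """Return the (first, last) WAL file names a backup depends on."""
    start_seg = parse_lsn(start_lsn) // WAL_SEG_SIZE
    # A stop location exactly on a segment boundary ends the previous
    # segment, so step back one byte before dividing.
    stop_seg = (parse_lsn(stop_lsn) - 1) // WAL_SEG_SIZE
    return wal_file_name(timeline, start_seg), wal_file_name(timeline, stop_seg)


def wait_for_archive(archive_dir, names, timeout=300.0, poll=5.0):
    """Block until every named WAL file exists in archive_dir, else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    missing = list(names)
    while True:
        missing = [n for n in missing
                   if not os.path.exists(os.path.join(archive_dir, n))]
        if not missing:
            return None
        if time.monotonic() >= deadline:
            raise TimeoutError("not yet archived: " + ", ".join(missing))
        time.sleep(poll)
```

Usage might look like `first, last = backup_wal_range(1, start, stop)` followed by `wait_for_archive("/path/to/archive", [first, last])`, where `start` and `stop` are the location strings returned by the two backup functions and the path is a placeholder. If it times out, running pg_switch_xlog() on the master, as mentioned above, may be needed to get the last segment archived.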

One idea to make the procedure more robust is to change PostgreSQL so that
it writes buffer pages to a temporary space instead of to the database files
during a backup. Then there are no torn pages in the database files of the
backup. After the backup, the data blocks are written back to the database
files over time. When recovery starts from that backup (i.e., backup_label is
found), it first clears the temporary space in the backup and continues recovery
using the database files, which contain no torn pages. OTOH, in crash recovery
(i.e., backup_label is not found), recovery is performed using both the
database files and the temporary space. This approach would make the
standby-only backup usable even if full_page_writes is disabled on the master,
and you wouldn't need to care about the order in which the control file is
backed up.

But this idea looks like overkill. It would be very complicated to implement,
and likely to introduce other bugs. I don't have any other good, simple idea
for now.
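To make the temporary-space idea a bit more concrete, here is a toy in-memory model of it. Everything in it (ToyCluster, recovery_view, pages kept in dicts) is hypothetical and not PostgreSQL code. WAL replay is not modeled at all, and torn pages appear only implicitly: the data files never change while a backup is running, so a copy taken during that window cannot catch a half-written page.

```python
class ToyCluster:
    """Hypothetical model: data files plus a temporary page area used during a backup."""

    def __init__(self, pages):
        self.data = dict(pages)  # block number -> page contents (the "database files")
        self.temp = {}           # pages written while a backup is running
        self.in_backup = False

    def start_backup(self):
        self.in_backup = True

    def stop_backup(self):
        self.in_backup = False

    def write(self, block, page):
        if self.in_backup:
            # Redirect writes so the data files stay stable during the copy.
            self.temp[block] = page
        else:
            # Drop any older copy in the temp area so it cannot shadow this write.
            self.temp.pop(block, None)
            self.data[block] = page

    def read(self, block):
        # The newest version of a page lives in the temp area, if it is there.
        return self.temp.get(block, self.data.get(block))

    def write_back(self):
        # After the backup, move temp pages back into the data files
        # (all at once here; the proposal does it "over time").
        if self.in_backup:
            raise RuntimeError("cannot write back while a backup is running")
        self.data.update(self.temp)
        self.temp.clear()

    def copy_files(self):
        # What a file-level copy of the data directory, temp area included, would see.
        return dict(self.data), dict(self.temp)


def recovery_view(data, temp, backup_label_found):
    """Pages that recovery starts from, before any WAL replay (not modeled)."""
    if backup_label_found:
        # Recovery from a backup: discard the temp area and trust the data files.
        return dict(data)
    # Crash recovery: the temp area holds the newest pages, so overlay it.
    view = dict(data)
    view.update(temp)
    return view
```

The key point is the split in recovery_view: with backup_label present the temporary area is thrown away and WAL replay (not shown) has to bring the pages forward, while crash recovery must overlay the temporary area because it holds the newest page versions.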

> With pg_basebackup, we have a fighting chance of getting this right, because
> we have more control over how the backup is made. For example, we can
> co-operate with the buffer manager to avoid torn-pages, eliminating the need
> for full_page_writes=on, and we can include a control file with the correct
> end-of-backup location automatically, without requiring user intervention.
> pg_basebackup is less flexible than the pg_start/stop_backup method, and
> unfortunately you're more likely to need the flexibility in a more
> complicated setup with a hot standby server and all, but making the generic
> pg_start/stop_backup method work seems infeasible at the moment.

Yes. So should we give up supporting the manual procedure, and extend
pg_basebackup for the standby-only backup first? I can live with that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
