Re: Online base backup from the hot-standby

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jun Ishiduka <ishizuka(dot)jun(at)po(dot)ntts(dot)co(dot)jp>, ssinger_pg(at)sympatico(dot)ca, simon(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com, cedric(dot)villemain(dot)debian(at)gmail(dot)com
Subject: Re: Online base backup from the hot-standby
Date: 2011-10-25 10:19:33
Message-ID: CABUevEyGvJ6bJo+MnUtGCefNYsYD-9rsCS=EZ0vMLFzhvcTF6A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 25, 2011 at 10:50, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Oct 25, 2011 at 3:44 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>>> +<para>
>>>>> +      Again connect to the database as a superuser, and execute
>>>>> +<function>pg_stop_backup</>. This terminates the backup mode, but does not
>>>>> +      perform a switch to the next WAL segment, create a backup history file and
>>>>> +      wait for all required WAL segments to be archived,
>>>>> +      unlike that during normal processing.
>>>>> +</para>
>>>>> +</listitem>
>>>>
>>>> How do you ensure that all the required WAL segments have been archived,
>>>> then?
>>>
>>> The patch doesn't provide any capability to ensure that, IOW assumes that's
>>> a user responsibility. If a user wants to ensure that, he/she needs to calculate
>>> the backup start and end WAL files from the result of pg_start_backup()
>>> and pg_stop_backup() respectively, and needs to wait until those files have
>>> appeared in the archive. Also if the required WAL file has not been archived
>>> yet, a user might need to execute pg_switch_xlog() in the master.
>>
>> Frankly, I think this whole thing is too fragile. The procedure is
>> superficially similar to what you do on master: run pg_start_backup(), rsync
>> data directory, run pg_stop_backup(), but is actually subtly different and
>> more complicated. If you don't know that, and don't follow the full
>> procedure, you get a corrupt backup. And the backup might look ok, and might
>> even sometimes work, which means that you won't notice in quick testing.
>> That's a *huge* foot-gun.
>>
>> I think we need to step back and find a way to make this:
>> a) less complicated, or at least
>> b) more robust, so that if you don't follow the procedure, you get an error.
>
> One idea to make this more robust is to change PostgreSQL so that
> it writes buffer pages to a temporary space instead of the database files
> during a backup. This means that there are no torn pages in the database files
> of the backup. After the backup, the data blocks are written back to the database
> files over time. When recovery starts from that backup (i.e., backup_label is
> found), it first clears the temporary space in the backup and continues recovery
> using the database files, which contain no torn pages. OTOH,
> in crash recovery (i.e., backup_label is not found), recovery is performed
> using both the database files and the temporary space. This whole approach would
> make the standby-only backup available even if FPW is disabled on the master,
> and you would not need to care about the order in which the control file is backed up.
>
> But this idea looks like overkill. It seems very complicated to implement, and
> likely to invite other bugs. I don't have any other good and simple
> idea for now.
>
>> With pg_basebackup, we have a fighting chance of getting this right, because
>> we have more control over how the backup is made. For example, we can
>> co-operate with the buffer manager to avoid torn-pages, eliminating the need
>> for full_page_writes=on, and we can include a control file with the correct
>> end-of-backup location automatically, without requiring user intervention.
>> pg_basebackup is less flexible than the pg_start/stop_backup method, and
>> unfortunately you're more likely to need the flexibility in a more
>> complicated setup with a hot standby server and all, but making the generic
>> pg_start/stop_backup method work seems infeasible at the moment.
>
> Yes, so should we give up supporting the manual procedure, and extend
> pg_basebackup for the standby-only backup first? I can live with this.

I don't think we should necessarily give up completely. But doing it
the pg_basebackup way *first* seems reasonable - because it's going to be
the easiest one to "get right", given that we have more control there.
That doesn't mean we shouldn't extend it in the future...
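
As an aside, the manual check described in the quoted text above (derive
the start and end WAL file names from the pg_start_backup() and
pg_stop_backup() results, then wait until those files show up in the
archive) could look roughly like the Python sketch below. It is only an
illustration, not part of the patch: it assumes the default 16 MB WAL
segment size, a single known timeline, and an archive that is a plain
local directory. The function names, the archive path and the example
locations are all made up.

```python
import os

# Assumption: the server uses the default 16 MB WAL segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(timeline, location):
    """Map a WAL location such as '0/2000020' to a segment file name.

    The name is three 8-digit hex fields: timeline, log id, segment.
    Simplification: a location that falls exactly on a segment
    boundary is attributed to the segment that starts there.
    """
    log_id_hex, offset_hex = location.split("/")
    log_id = int(log_id_hex, 16)
    segment = int(offset_hex, 16) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, log_id, segment)


def backup_boundary_files(timeline, start_location, stop_location):
    """Return the WAL file names holding the backup start and end."""
    return (
        wal_file_name(timeline, start_location),
        wal_file_name(timeline, stop_location),
    )


def missing_from_archive(archive_dir, file_names):
    """Return the file names that are not yet present in archive_dir."""
    return [
        name
        for name in file_names
        if not os.path.exists(os.path.join(archive_dir, name))
    ]


if __name__ == "__main__":
    # Hypothetical values as returned by pg_start_backup() and
    # pg_stop_backup(); the timeline has to be known separately.
    start_file, stop_file = backup_boundary_files(1, "0/2000020", "0/50000D8")
    missing = missing_from_archive("/path/to/archive", [start_file, stop_file])
    if missing:
        print("still waiting for:", ", ".join(missing))
    else:
        print("start and end WAL files are archived")
```

A script like this would have to be re-run (or looped with a sleep)
until nothing is missing, and if the end file never appears somebody
still has to run pg_switch_xlog() on the master. That is exactly the
kind of extra step that is easy to forget, which is the fragility
being discussed here.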

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
