Re: [HACKERS] Creating backup history files for backups taken from standbys

From: David Steele <david(at)pgmasters(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] Creating backup history files for backups taken from standbys
Date: 2018-03-05 14:01:23
Message-ID: 0eefbbe4-8b81-91bc-8a18-0237f139277a@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/5/18 1:06 AM, Michael Paquier wrote:
> On Fri, Mar 02, 2018 at 03:41:57PM -0500, David Steele wrote:
>> On 3/2/18 1:03 PM, Fujii Masao wrote:
>>> On Fri, Mar 2, 2018 at 1:07 PM, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>>> We would talk about two backups running
>>>> simultaneously on a standby, which would overlap with each other to
>>>> generate a file aimed only at being helpful for debugging purposes, and
>>>> we provide no information now for backups taken from standbys. We could
>>>> of course make that logic a bit smarter by checking if there is an
>>>> extsing file with the same name and create a new file with a different
>>>> name. But is that worth the complication? That's where I am not
>>>> convinced, and that's the reason why this patch is doing things this
>>>> way.
>
> Sorry for my late reply, I was thinking more about the whole thing.
>
>>> What about including the backup history file in the base backup instead of
>>> creating it in the standby's pg_wal and archiving it? As the good side effect
>>> of this approach, we can use the backup history file for debugging purpose
>>> even when WAL archiving is not enabled.
>
> That's not going to be much helpful for tar'ed base backups as the
> backup history file would be at the end of the stream :( For a debug
> file, that's not really helpful.
>
>> I don't think this is a good idea, even if it does solve the race
>> condition. First, it would be different from the way primary-style
>> backups generate this file. Second, these files would not be excluded
>> from future restore / backup cycles so they could build up after a while
>> and cause confusion. I guess we could teach PG to delete them or
>> pg_basebackup to ignore them, but that doesn't seem worth the effort.
>
> Something I did not consider though is that this causes archive commands
> to choke on those identical file names, so any archive command which is
> not able to handle that would cause WAL to bloat on the standby. For
> this reason I would rather drop the current approach taken. Remember
> that the docs tell to use a plain "cp" for example, so this would cause
> standbys to go silently down.

That's a good point. The pgBackRest archive command would definitely
not like this.

> Something that I would find way more portable though is the addition of
> the backup history file in the output of pg_stop_backup(), which would
> map as well with what Fujii-san's idea to add this file to the backup
> data itself, and do the same for backups taken on a primary. However
> this would mean moving the 'c' marker marking the end of the tar stream
> within do_pg_stop_backup(), and I am in no way a fan of that: the
> handling of the tar stream should be out of reach of the start and stop
> phases.

If the file is stored with the backup instead of the archive then I
don't think it adds much. Why not just add stop time and stop LSN to
backup_label and get rid of the history file entirely.

I've been working closely with backup for nearly five years now and I've
need had need to look at a .backup file. Other than writing code to
handle it in archiving, I've barely given it a thought.

> Another idea that I have here as well would be to change a bit the
> history backup file name so as the backup name is added to it, but that
> does not fly high as pg_basebackup uses its own name with spaces
> (Yeah!), or even better the PID of the backend taking the backup.

This could work, but seems complicated for little benefit. It would
also require some archive commands to be updated to understand the new
format.

> The renaming is more appealing, still not perfect. So my mood of the
> day would be to just drop the patch entirely.

I think that's the best idea for now. It doesn't seem worth using
cycles in the last CF coming up with a new approach.

Regards,
--
-David
david(at)pgmasters(dot)net

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2018-03-05 14:02:07 Re: perltidy version
Previous Message Alvaro Herrera 2018-03-05 13:53:44 Re: perltidy version