Re: [HACKERS] Creating backup history files for backups taken from standbys

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] Creating backup history files for backups taken from standbys
Date: 2018-03-05 17:23:19
Message-ID: CAHGQGwGS48YZfCe1CrtukMmRvYhJwbF5WwxcXUXrm6suhAXsuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 5, 2018 at 11:01 PM, David Steele <david(at)pgmasters(dot)net> wrote:
> On 3/5/18 1:06 AM, Michael Paquier wrote:
>> On Fri, Mar 02, 2018 at 03:41:57PM -0500, David Steele wrote:
>>> On 3/2/18 1:03 PM, Fujii Masao wrote:
>>>> On Fri, Mar 2, 2018 at 1:07 PM, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>>>> We would talk about two backups running
>>>>> simultaneously on a standby, which would overlap with each other to
>>>>> generate a file aimed only at being helpful for debugging purposes, and
>>>>> we provide no information now for backups taken from standbys. We could
>>>>> of course make that logic a bit smarter by checking if there is an
>>>>> extsing file with the same name and create a new file with a different
>>>>> name. But is that worth the complication? That's where I am not
>>>>> convinced, and that's the reason why this patch is doing things this
>>>>> way.
>>
>> Sorry for my late reply, I was thinking more about the whole thing.
>>
>>>> What about including the backup history file in the base backup instead of
>>>> creating it in the standby's pg_wal and archiving it? As the good side effect
>>>> of this approach, we can use the backup history file for debugging purpose
>>>> even when WAL archiving is not enabled.
>>
>> That's not going to be much helpful for tar'ed base backups as the
>> backup history file would be at the end of the stream :( For a debug
>> file, that's not really helpful.
>>
>>> I don't think this is a good idea, even if it does solve the race
>>> condition. First, it would be different from the way primary-style
>>> backups generate this file. Second, these files would not be excluded
>>> from future restore / backup cycles so they could build up after a while
>>> and cause confusion. I guess we could teach PG to delete them or
>>> pg_basebackup to ignore them, but that doesn't seem worth the effort.
>>
>> Something I did not consider though is that this causes archive commands
>> to choke on those identical file names, so any archive command which is
>> not able to handle that would cause WAL to bloat on the standby. For
>> this reason I would rather drop the current approach taken. Remember
>> that the docs tell to use a plain "cp" for example, so this would cause
>> standbys to go silently down.
>
> That's a good point. The pgBackRest archive command would definitely
> not like this.
>
>> Something that I would find way more portable though is the addition of
>> the backup history file in the output of pg_stop_backup(), which would
>> map as well with what Fujii-san's idea to add this file to the backup
>> data itself, and do the same for backups taken on a primary. However
>> this would mean moving the 'c' marker marking the end of the tar stream
>> within do_pg_stop_backup(), and I am in no way a fan of that: the
>> handling of the tar stream should be out of reach of the start and stop
>> phases.
>
> If the file is stored with the backup instead of the archive then I
> don't think it adds much. Why not just add stop time and stop LSN to
> backup_label and get rid of the history file entirely.
>
> I've been working closely with backup for nearly five years now and I've
> need had need to look at a .backup file. Other than writing code to
> handle it in archiving, I've barely given it a thought.
>
>> Another idea that I have here as well would be to change a bit the
>> history backup file name so as the backup name is added to it, but that
>> does not fly high as pg_basebackup uses its own name with spaces
>> (Yeah!), or even better the PID of the backend taking the backup.
>
> This could work, but seems complicated for little benefit. It would
> also require some archive commands to be updated to understand the new
> format.
>
>> The renaming is more appealing, still not perfect. So my mood of the
>> day would be to just drop the patch entirely.
>
> I think that's the best idea for now. It doesn't seem worth using
> cycles in the last CF coming up with a new approach.

I have no objection to mark the patch "returned with feedback".

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-03-05 17:25:39 Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?
Previous Message Douglas Doole 2018-03-05 17:21:23 Re: INOUT parameters in procedures