Re: post-freeze damage control

From: Stefan Fercot <stefan(dot)fercot(at)protonmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Tom Kincaid <tomjohnkincaid(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: post-freeze damage control
Date: 2024-04-16 08:47:14
Message-ID: lwXoqQdOT9Nw1tJIx_h7WuqMKrB1YMePQY99RFTZ87H7V52mgUJaSlw2WRbcOgKNUurF1yJqX3nqtZi4hJhtd3e_XlmLsLvnEtGXY-fZPoA=@protonmail.com
Lists: pgsql-hackers

Hi,

On Saturday, April 13th, 2024 at 12:18 PM, Tomas Vondra wrote:
> On 4/13/24 01:03, David Steele wrote:
> > On 4/12/24 22:12, Tomas Vondra wrote:
> > > On 4/11/24 23:48, David Steele wrote:
> > > > On 4/11/24 20:26, Tomas Vondra wrote:
> > > >
> > > > > FWIW that discussion also mentions stuff that I think the feature
> > > > > should
> > > > > not do. In particular, I don't think the ambition was (or should be) to
> > > > > make pg_basebackup into a stand-alone tool. I always saw pg_basebackup
> > > > > more as an interface to "backup steps" correctly rather than a complete
> > > > > backup solution that'd manage backup registry, retention, etc.
> > > >
> > > > Right -- this is exactly my issue. pg_basebackup was never easy to use
> > > > as a backup solution and this feature makes it significantly more
> > > > complicated. Complicated enough that it would be extremely difficult for
> > > > most users to utilize in a meaningful way.

pg_basebackup has its own use cases IMHO. It is very handy for simply taking a copy of a running cluster, thanks to its ability to carry along the needed WAL segments without the user even needing to think about archive_command/pg_receivewal. For this kind of task, the new incremental feature won't really be of interest, and it won't make things more complicated either.

I totally agree that we don't have a complete backup solution in core, though. And adding more tools to the picture (pg_basebackup, pg_combinebackup, pg_receivewal, pg_verifybackup, ...) will increase the need for orchestration on top. But that's not new. And for people who already have such orchestration, the incremental feature will help.
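To illustrate what I mean by orchestration on top, here is a minimal sketch (not a recommendation) of how a wrapper might assemble the commands for a full backup, an incremental one, and the final combine step. The helper names and directory layout are hypothetical; the sketch only builds argument lists and does not run anything. It assumes a v17 server with summarize_wal enabled and plain-format backups, so that the previous backup's backup_manifest sits in its directory.

```python
# Hypothetical helpers: they only build argument lists, nothing is executed.

def full_backup_cmd(target_dir):
    """Command line for a plain-format full backup into target_dir."""
    return ["pg_basebackup", "-D", target_dir, "--checkpoint=fast"]


def incremental_backup_cmd(target_dir, previous_backup_dir):
    """Command line for an incremental backup based on a previous backup.

    Assumes the previous backup is in plain format, so its manifest is
    available as <previous_backup_dir>/backup_manifest.
    """
    manifest = f"{previous_backup_dir}/backup_manifest"
    return ["pg_basebackup", "-D", target_dir, f"--incremental={manifest}"]


def combine_cmd(backup_chain, output_dir):
    """Command line to reconstruct a full backup from a chain.

    backup_chain must start with the full backup, followed by the
    incrementals from oldest to newest.
    """
    if not backup_chain:
        raise ValueError("backup_chain must contain at least the full backup")
    return ["pg_combinebackup", "-o", output_dir, *backup_chain]
```

Everything else (retention, knowing which backup is the parent of which, WAL archiving) is still up to the orchestration layer, which is exactly the point being discussed.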

> > > Perhaps, I agree we could/should try to do better job to do backups, no
> > > argument there. But I still don't quite see why introducing such
> > > infrastructure to "manage backups" should be up to the patch adding
> > > incremental backups. I see it as something to build on top of
> > > pg_basebackup/pg_combinebackup, not into those tools.
> >
> > I'm not saying that managing backups needs to be part of pg_basebackup,
> > but I am saying without that it is not a complete backup solution.
> > Therefore introducing advanced features that the user then has to figure
> > out how to manage puts a large burden on them. Implementing
> > pg_combinebackup inefficiently out of the gate just makes their life
> > harder.
>
> I agree with this in general, but I fail to see how it'd be the fault of
> this patch. It merely extends what pg_basebackup did before, so if it's
> not a complete solution now, it wasn't a complete solution before.

+1. We can see it as a step towards a better backup solution in core in the future, but we definitely shouldn't overlook the fact that lots of people have already developed such orchestration (home-made or relying on pgbarman, pgbackrest, wal-g, ...). IMHO, if we're trying to extend the in-core features, we should also aim at giving more information and features to those tools (like adding more fields to the backup functions, ...).

> > > Sure, I'm not against making it clearer pg_combinebackup is not a
> > > complete backup solution, and documenting the existing restrictions.
> >
> > Let's do that then. I think it would make sense to add caveats to the
> > pg_combinebackup docs including space requirements, being explicit about
> > the format required (e.g. plain), and also possible mitigation with COW
> > filesystems.
>
> OK. I'll add this as an open item, for me and Robert.

Thanks for this! It's probably not up to the core docs to state all the steps needed to develop a complete backup solution, but documenting the links between the tools and, above all, the caveats (the "don't use INCREMENTAL.* filenames", ...) will help users not be caught off guard. And as I mentioned in [1], IMO the earlier we can catch a potential issue (wrong filename, missing file, ...), the better for the user.
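As an example of catching such an issue early, a wrapper could scan the source data directory for stray files using the reserved prefix before taking a backup. This is only a sketch: the function name is hypothetical, and it is meant for the live data directory, not for an incremental backup directory, where INCREMENTAL.* files are expected.

```python
from pathlib import Path

# Prefix used for incremental files inside an incremental backup;
# user files with this prefix in the data directory are the caveat here.
RESERVED_PREFIX = "INCREMENTAL."


def find_reserved_names(data_dir):
    """Return sorted relative paths of files whose name starts with the
    reserved prefix anywhere under data_dir (hypothetical helper)."""
    root = Path(data_dir)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.name.startswith(RESERVED_PREFIX)
    )
```

An empty result means nothing suspicious was found; anything else is worth reporting to the user before the backup starts rather than at restore time.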

Thank you all for working on this.
Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

[1] https://www.postgresql.org/message-id/vJnnuiaye5rNnCPN8h3xN1Y3lyUDESIgEQnR-Urat9_ld_fozShSJbEk8JDM_3K6BVt5HXT-CatWpSfEZkYVeymlrxKO2_kfKmVZNWyCuJc%3D%40protonmail.com
