Re: pg_combinebackup fails on file named INCREMENTAL.*

From: Stefan Fercot <stefan(dot)fercot(at)protonmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: pg_combinebackup fails on file named INCREMENTAL.*
Date: 2024-04-16 07:09:55
Message-ID: vJnnuiaye5rNnCPN8h3xN1Y3lyUDESIgEQnR-Urat9_ld_fozShSJbEk8JDM_3K6BVt5HXT-CatWpSfEZkYVeymlrxKO2_kfKmVZNWyCuJc=@protonmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> I think it would be reasonable to restrict what can be put in base/ and
> global/ but users generally feel free to create whatever they want in
> the root of PGDATA, despite being strongly encouraged not to.
>
> Anyway, I think it should be fixed or documented as a caveat since it
> causes a hard failure on restore.

+1. IMHO, no matter how you'd further decide to reference the incremental stubs, the earlier we can mention the failure to the user, the better.
Tbh, I'm not very confortable seeing pg_combinebackup fail when the user would need to restore (whatever the reason). I mean, failing to take the backup is less of a problem (compared to failing to restore) because then the user might have the time to investigate and fix the issue, not being in the hurry of a down production to restore...

> > But ... I didn't really end up feeling very comfortable with it. Right
> > now, the backup manifest is something we only use to verify the
> > integrity of the backup. If we were to do this, it would become a
> > critical part of the backup.

Isn't it already the case? I mean, you need the manifest of the previous backup to take an incremental one, right?
And shouldn't we encourage to verify the backup sets before (at least) trying to combine them?
It's not because a file was only use for one specific purpose until now that we can't improve it later.
Splitting the meaningful information across multiple places would be more error-prone (for both devs and users) imo.

> > I don't think I like the idea that
> > removing the backup_manifest should be allowed to, in effect, corrupt
> > the backup. But I think I dislike even more the idea that the data
> > that is used to verify the backup gets mushed together with the backup
> > data itself. Maybe in practice it's fine, but it doesn't seem very
> > conceptually clean.

Removing pretty much any file in the backup set would probably corrupt it. I mean, why would people remove the backup manifest when they already can't remove backup_label?
A doc stating "dont touch anything" is IMHO easier than "this part is critical, not that one".

> I don't think this is a problem. The manifest can do more than store
> verification info, IMO.

+1. Adding more info to the backup manifest can be handy for other use-cases too (i.e. like avoiding to store empty files, or storing the checksums state of the cluster).

Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Quan Zongliang 2024-04-16 07:15:57 Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block
Previous Message Pavel Luzanov 2024-04-16 07:06:26 Re: Things I don't like about \du's "Attributes" column