Re: trying again to get incremental backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: trying again to get incremental backup
Date: 2023-12-11 17:08:20
Message-ID: CA+TgmoYUhrgcNin=qrY+J+S-f6kctf9DaUKoaD-e3cww_ox9vg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 8, 2023 at 5:02 AM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> While we are at it, maybe around the below in PrepareForIncrementalBackup()
>
> if (tlep[i] == NULL)
>     ereport(ERROR,
>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>              errmsg("timeline %u found in manifest, but not in this server's history",
>                     range->tli)));
>
> we could add
>
> errhint("You might need to start a new full backup instead of an incremental one")
>
> ?

I can't exactly say that such a hint would be inaccurate, but I think
the impulse to add it here is misguided. One of my design goals for
this system is to make it so that you never have to take a new
full backup "just because," not even in case of an intervening
timeline switch. So, all of the errors in this function are warning
you that you've done something that you really should not have done.
In this particular case, you've either (1) manually removed the
timeline history file, and not just any timeline history file but the
one for a timeline for a backup that you still intend to use as the
basis for taking an incremental backup or (2) tried to use a full
backup taken from one server as the basis for an incremental backup on
a completely different server that happens to share the same system
identifier, e.g. because you promoted two standbys derived from the
same original primary and then tried to use a full backup taken on one
as the basis for an incremental backup taken on the other.
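A minimal sketch of the kind of check being discussed, under stated assumptions: the `HistoryEntry` and `ManifestWalRange` structs and the `manifest_timelines_known()` function are hypothetical simplifications, not PostgreSQL's actual code in PrepareForIncrementalBackup(), which works with the server's real timeline-history data. It shows why both failure modes above surface identically: a timeline named in the manifest is simply absent from this server's history.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for PostgreSQL's TimeLineID and XLogRecPtr. */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

/* Hypothetical: one timeline known to the server via its history files. */
typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  begin;  /* LSN at which this timeline starts */
    XLogRecPtr  end;    /* LSN at which it was switched away from; 0 = current */
} HistoryEntry;

/* Hypothetical: one WAL range listed in the prior backup's manifest. */
typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  start_lsn;
    XLogRecPtr  end_lsn;
} ManifestWalRange;

/* Return the history entry for tli, or NULL if this server has no record of it. */
static const HistoryEntry *
find_timeline(const HistoryEntry *history, size_t nhistory, TimeLineID tli)
{
    for (size_t i = 0; i < nhistory; i++)
    {
        if (history[i].tli == tli)
            return &history[i];
    }
    return NULL;
}

/*
 * Refuse the incremental backup if any timeline named in the manifest is
 * missing from this server's history: either its history file was removed,
 * or the manifest came from a different server.
 */
static bool
manifest_timelines_known(const HistoryEntry *history, size_t nhistory,
                         const ManifestWalRange *ranges, size_t nranges)
{
    assert(history != NULL || nhistory == 0);
    for (size_t i = 0; i < nranges; i++)
    {
        if (find_timeline(history, nhistory, ranges[i].tli) == NULL)
        {
            fprintf(stderr,
                    "timeline %u found in manifest, but not in this server's history\n",
                    (unsigned) ranges[i].tli);
            return false;
        }
    }
    return true;
}
```

The check only sees a missing timeline ID, so it cannot tell which of the two situations the user is in; that is part of why a single generic hint is hard to word usefully.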

The scenario I was really concerned about when I wrote this test was
(2), because that could lead to a corrupt restore. This test isn't
strong enough to prevent that completely, because two unrelated
standbys can branch onto the same new timelines at the same LSNs, and
then these checks can't tell that something bad has happened. However,
they can detect a useful subset of problem cases. And the solution is
not so much "take a new full backup" as "keep straight which server is
which." Likewise, in case (1), the relevant hint would be "don't
manually remove timeline history files, and if you must, then at least
don't nuke timelines that you actually still care about."
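To illustrate why such checks catch only a subset of problem cases, here is a hypothetical extension of the same idea (all names are assumptions, not PostgreSQL's code) that also requires each manifest range to fall within the LSN span the server's history assigns to that timeline. Because it compares only timeline IDs and LSNs, a server that switched timelines at a different LSN is caught, but an unrelated standby that promoted onto the same timeline at the same LSN looks identical to the server that actually took the full backup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

/* Hypothetical history entry; end == 0 means the timeline is still current. */
typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  begin;
    XLogRecPtr  end;
} HistoryEntry;

/* Hypothetical WAL range recorded in a backup manifest. */
typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  start_lsn;
    XLogRecPtr  end_lsn;
} ManifestWalRange;

/*
 * True if every manifest range names a known timeline and lies inside the
 * LSN span this server's history gives that timeline.  Only (tli, LSN)
 * values are compared, so nothing here identifies *which* server wrote
 * the WAL.
 */
static bool
manifest_consistent_with_history(const HistoryEntry *history, size_t nhistory,
                                 const ManifestWalRange *ranges, size_t nranges)
{
    assert(history != NULL || nhistory == 0);
    for (size_t i = 0; i < nranges; i++)
    {
        const HistoryEntry *h = NULL;

        for (size_t j = 0; j < nhistory; j++)
        {
            if (history[j].tli == ranges[i].tli)
            {
                h = &history[j];
                break;
            }
        }
        if (h == NULL)
            return false;       /* timeline unknown to this server */
        if (ranges[i].start_lsn < h->begin)
            return false;       /* range starts before this server's timeline does */
        if (h->end != 0 && ranges[i].end_lsn > h->end)
            return false;       /* range runs past this server's switch point */
    }
    return true;
}
```

For example, if standbys A and B both promote to timeline 2 at 0/2000000 and standby C promotes at 0/3000000, a manifest range on timeline 2 from 0/2400000 to 0/2800000 taken on A passes on A and on B, and is rejected only on C.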

> > I have a fix for this locally, but I'm going to hold off on publishing
> > a new version until either there's a few more things I can address all
> > at once, or until Thomas commits the ubsan fix.
> >
>
> Great, I cannot get it to fail again today, it had to be some dirty
> state of the testing env. BTW: Thomas has pushed that ubsan fix.

Huzzah, the cfbot likes the patch set now. Here's a new version with
the promised fix for your non-reproducible issue. Let's see whether
you and cfbot still like this version.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v14-0004-Add-new-pg_walsummary-tool.patch application/octet-stream 17.8 KB
v14-0001-Move-src-bin-pg_verifybackup-parse_manifest.c-in.patch application/octet-stream 4.3 KB
v14-0005-Test-patch-Enable-summarize_wal-by-default.patch application/octet-stream 4.7 KB
v14-0002-Add-a-new-WAL-summarizer-process.patch application/octet-stream 135.9 KB
v14-0003-Add-support-for-incremental-backup.patch application/octet-stream 222.7 KB
