Re: trying again to get incremental backup

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: trying again to get incremental backup
Date: 2023-12-13 10:39:22
Message-ID: CAKZiRmygUFo5Fa4y9cUwuhZNEJpP6Ng8AwbQDkYGuRAmK23zvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,

On Mon, Dec 11, 2023 at 6:08 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Dec 8, 2023 at 5:02 AM Jakub Wartak
> <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> > While we are at it, maybe around the below in PrepareForIncrementalBackup()
> >
> > if (tlep[i] == NULL)
> > ereport(ERROR,
> >
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > errmsg("timeline %u found in
> > manifest, but not in this server's history",
> > range->tli)));
> >
> > we could add
> >
> > errhint("You might need to start a new full backup instead of
> > incremental one")
> >
> > ?
>
> I can't exactly say that such a hint would be inaccurate, but I think
> the impulse to add it here is misguided. One of my design goals for
> this system is to make it so that you never have to take a new
> incremental backup "just because,"

Did you mean take a new full backup here?

> not even in case of an intervening
> timeline switch. So, all of the errors in this function are warning
> you that you've done something that you really should not have done.
> In this particular case, you've either (1) manually removed the
> timeline history file, and not just any timeline history file but the
> one for a timeline for a backup that you still intend to use as the
> basis for taking an incremental backup or (2) tried to use a full
> backup taken from one server as the basis for an incremental backup on
> a completely different server that happens to share the same system
> identifier, e.g. because you promoted two standbys derived from the
> same original primary and then tried to use a full backup taken on one
> as the basis for an incremental backup taken on the other.
>

Okay, but please consider two other possibilities:

(3) I had a corrupted DB where I've fixed it by running pg_resetwal
and some cronjob just a day later attempted to take incremental and
failed with that error.

(4) I had pg_upgraded (which calls pg_resetwal on fresh initdb
directory) the DB where I had cronjob that just failed with this error

I bet that (4) is going to happen more often than (1), (2) , which
might trigger users to complain on forums, support tickets.

> > > I have a fix for this locally, but I'm going to hold off on publishing
> > > a new version until either there's a few more things I can address all
> > > at once, or until Thomas commits the ubsan fix.
> > >
> >
> > Great, I cannot get it to fail again today, it had to be some dirty
> > state of the testing env. BTW: Thomas has pushed that ubsan fix.
>
> Huzzah, the cfbot likes the patch set now. Here's a new version with
> the promised fix for your non-reproducible issue. Let's see whether
> you and cfbot still like this version.

LGTM, all quick tests work from my end too. BTW: I have also scheduled
the long/large pgbench -s 14000 (~200GB?) - multiple day incremental
test. I'll let you know how it went.

-J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2023-12-13 10:51:30 Re: Report planning memory in EXPLAIN ANALYZE
Previous Message Corey Huinker 2023-12-13 10:26:04 Re: Statistics Import and Export