Re: trying again to get incremental backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: trying again to get incremental backup
Date: 2023-10-30 16:01:22
Message-ID: CA+TgmoaR8o+PBeWc_2Ge0XVgoM7xWKNyDmqXoTov=S6_J1gecQ@mail.gmail.com
Lists: pgsql-hackers

While reviewing this thread today, I realized that I never responded
to this email. That was inadvertent; my apologies.

On Wed, Jun 14, 2023 at 4:34 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> Nice, I like this idea.

Cool.

> Skimming through the 7th patch, I see claims that FSM is not fully
> WAL-logged and thus shouldn't be tracked, and so it indeed doesn't
> track those changes.
> I disagree with that decision: we now have support for custom resource
> managers, which may use the various forks for other purposes than
> those used in PostgreSQL right now. It would be a shame if data is
> lost because of the backup tool ignoring forks because the PostgreSQL
> project itself doesn't have post-recovery consistency guarantees in
> that fork. So, unless we document that WAL-logged changes in the FSM
> fork are actually not recoverable from backup, regardless of the type
> of contents, we should still keep track of the changes in the FSM fork
> and include the fork in our backups or only exclude those FSM updates
> that we know are safe to ignore.

I'm not sure what to do about this problem. I don't think any data
would be *lost* in the scenario that you mention; what I think would
happen is that the FSM forks would be backed up in their entirety even
if they were owned by some other table AM or index AM that was
WAL-logging all changes to whatever it was storing in that fork. So I
think that there is not a correctness issue here but rather an
efficiency issue.

It would still be nice to fix that somehow, but I don't see how to do
it. It would be easy to make the WAL summarizer stop treating the FSM
as a special case, but there's no way for basebackup_incremental.c to
know whether a particular relation fork is for the heap AM or some
other AM that handles WAL-logging differently. It can't, for example,
examine pg_class; it's not connected to any database, let alone every
database. So we have to either trust that the WAL for the FSM is
correct and complete in all cases, or assume that it isn't in any
case. And the former doesn't seem like a safe or wise assumption given
how the heap AM works.
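
To make that all-or-nothing choice concrete, here is a minimal sketch
in Python. It is not the logic in basebackup_incremental.c; the names
plan_fork_backup, summarized_blocks, and trust_fsm_wal are hypothetical
and exist only for illustration. The point it shows is that, with no
way to learn which AM owns a fork, the fork type is the only input the
decision can depend on.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical fork labels for illustration only; the real server
# identifies forks with C-level fork numbers.
MAIN = "main"
FSM = "fsm"

# (relation file name, fork label); a stand-in for a real fork identifier.
ForkKey = Tuple[str, str]


def plan_fork_backup(
    key: ForkKey,
    summarized_blocks: Dict[ForkKey, Set[int]],
    trust_fsm_wal: bool = False,
) -> Optional[Set[int]]:
    """Decide how one relation fork goes into an incremental backup.

    Returns None to mean "copy the whole fork", or the set of block
    numbers that the (assumed) WAL summary reports as modified.
    """
    _relfile, fork = key

    # The planner cannot consult pg_class, so it cannot ask which AM
    # owns this fork. The policy therefore applies to every FSM fork.
    if fork == FSM and not trust_fsm_wal:
        return None  # always back up in full: correct, but not efficient

    # For every other fork, rely on the summarized WAL. A fork with no
    # entry has no modified blocks to copy.
    return summarized_blocks.get(key, set())
```

Under this sketch, an out-of-core AM that fully WAL-logs its FSM fork
still gets that fork copied in full unless trust_fsm_wal is set, and
setting it would also apply to heap relations, where the WAL for the
FSM is not complete.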

I think the reality here is unfortunately that we're missing a lot of
important infrastructure to really enable a multi-table-AM world. The
heap AM, and every other table AM, should include a metapage so we can
tell what we're looking at just by examining the disk files. Relation
forks don't scale and should be replaced with some better system that
does. We should have at least two table AMs in core that are fully
supported and do truly useful things. Until some of that stuff (and
probably a bunch of other things) gets sorted out, out-of-core AMs are
going to have to remain second-class citizens to some degree.

--
Robert Haas
EDB: http://www.enterprisedb.com
