Re: trying again to get incremental backup

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: trying again to get incremental backup
Date: 2023-08-31 22:50:29
Message-ID: f97e3241-a3ac-a487-0c59-0d2675bc6b07@pgmasters.net
Lists: pgsql-hackers

Hi Robert,

On 8/30/23 10:49, Robert Haas wrote:
> In the limited time that I've had to work on this project lately, I've
> been trying to come up with a test case for this feature -- and since
> I've gotten completely stuck, I thought it might be time to post and
> see if anyone else has a better idea. I thought a reasonable test case
> would be: Do a full backup. Change some stuff. Do an incremental
> backup. Restore both backups and perform replay to the same LSN. Then
> compare the files on disk. But I cannot make this work. The first
> problem I ran into was that replay of the full backup does a
> restartpoint, while the replay of the incremental backup does not.
> That results in, for example, pg_subtrans having different contents.

pg_subtrans, at least, can be ignored since it is excluded from the
backup and not required for recovery.

> I'm not sure whether it can also result in data files having different
> contents: are changes that we replayed following the last restartpoint
> guaranteed to end up on disk when the server is shut down? It wasn't
> clear to me that this is the case. I thought maybe I could get both
> servers to perform a restartpoint at the same location by shutting
> down the primary and then replaying through the shutdown checkpoint,
> but that doesn't work because the primary doesn't finish archiving
> before shutting down. After some more fiddling I settled (at least for
> research purposes) on having the restored backups PITR and promote,
> instead of PITR and pause, so that we're guaranteed a checkpoint. But
> that just caused me to run into a far worse problem: replay on the
> standby doesn't actually create a state that is byte-for-byte
> identical to the one that exists on the primary. I quickly discovered
> that in my test case, I was ending up with different contents in the
> "hole" of a block wherein a tuple got updated. Replay doesn't think
> it's important to make the hole end up with the same contents on all
> machines that replay the WAL, so I end up with one server that has
> more junk in there than the other one and the tests fail.

This is pretty much what I discovered when investigating backup from
standby back in 2016. My (ultimately unsuccessful) efforts to find a
clean delta resulted in [1], as I systematically excluded directories
that are not required for recovery and are not synced between a primary
and standby.

FWIW, Heikki made similar attempts at finding a clean delta before me
(I found the thread back then, but I doubt I could find it again) and
arrived at similar results. We discussed it in person and realized we
had come to more or less the same conclusion. Welcome to the club!

> Unless someone has a brilliant idea that I lack, this suggests to me
> that this whole line of testing is a dead end. I can, of course, write
> tests that compare clusters *logically* -- do the correct relations
> exist, are they accessible, do they have the right contents? But I
> feel like it would be easy to have bugs that escape detection in such
> a test but would be detected by a physical comparison of the clusters.

Agreed, though a matching logical result is still very compelling.
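
As a strawman, a logical comparison could be as simple as dumping both
restored clusters and diffing the output. A minimal sketch, with the
ports, database name, and paths standing in for however the test brings
the two servers up:

    # Dump both restored clusters with pg_dump and diff the results.
    import subprocess

    def dump(port, out):
        subprocess.run(['pg_dump', '-p', str(port), '-f', out, 'postgres'],
                       check=True)

    dump(5433, '/tmp/full.sql')   # restored from the full backup
    dump(5434, '/tmp/incr.sql')   # restored from full + incremental
    subprocess.run(['diff', '-u', '/tmp/full.sql', '/tmp/incr.sql'],
                   check=True)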

> However, such a comparison can only be conducted if either (a) there's
> some way to set up the test so that byte-for-byte identical clusters
> can be expected or (b) there's some way to perform the comparison that
> can distinguish between expected, harmless differences and unexpected,
> problematic differences. And at the moment my conclusion is that
> neither (a) nor (b) exists. Does anyone think otherwise?

I do not. My conclusion back then was that validating a physical
comparison would be nearly impossible without changes to Postgres to
make the primary and standby match byte-for-byte via replication.
Which, by the way, I still think would be a great idea, at least in
principle. Replay is already a major bottleneck, and anything that
makes it slower is unlikely to be popular.
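
That said, the specific hole-contents difference you hit could probably
be papered over by masking each page's hole before comparing, much like
the masking that wal_consistency_checking relies on. A rough Python
sketch, assuming 8kB pages, the standard page header layout (pd_lower
and pd_upper at offsets 12 and 14), and a little-endian platform:

    # Zero the gap between pd_lower and pd_upper so leftover junk in the
    # hole doesn't produce spurious differences in a page-level diff.
    import struct

    BLCKSZ = 8192

    def mask_hole(page):
        pd_lower, pd_upper = struct.unpack_from('<HH', page, 12)
        if 24 <= pd_lower <= pd_upper <= len(page):
            page = (page[:pd_lower] +
                    b'\x00' * (pd_upper - pd_lower) +
                    page[pd_upper:])
        return page

    def relation_files_equal(path_a, path_b):
        with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
            while True:
                pa, pb = fa.read(BLCKSZ), fb.read(BLCKSZ)
                if len(pa) != len(pb):
                    return False
                if not pa:
                    return True
                if mask_hole(pa) != mask_hole(pb):
                    return False

Of course, the hole is only one source of benign differences, which is
why I don't think this generalizes into a practical physical comparison.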

This would also be great for WAL. The last time I tested, the same WAL
segment could differ between the primary and standby because the unused
(recycled) portion at the end is not zeroed on the standby the way it is
on the primary, even though the segments match logically. I would be
very happy if somebody told me my information is out of date and this
has been fixed, but when I looked at the code, doing so seemed
incredibly tricky because of how WAL is replicated.
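
That doesn't fix the mismatch itself, but for testing purposes it can
be sidestepped by comparing decoded records rather than raw segment
bytes, e.g. by diffing pg_waldump output for the same segment on both
sides. A rough sketch with placeholder paths:

    # Compare decoded WAL records so differences in the unused tail of a
    # recycled segment are ignored.  pg_waldump must be on PATH.
    import subprocess

    def decode(segment_path):
        # pg_waldump exits non-zero when it runs off the end of valid
        # WAL, so don't use check=True; just compare what it decoded.
        out = subprocess.run(['pg_waldump', segment_path],
                             capture_output=True, text=True)
        return out.stdout

    primary = '/path/to/primary/pg_wal/000000010000000000000003'
    standby = '/path/to/standby/pg_wal/000000010000000000000003'
    assert decode(primary) == decode(standby)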

> Meanwhile, here's a rebased set of patches. The somewhat-primitive
> attempts at writing tests are in 0009, but they don't work, for the
> reasons explained above. I think I'd probably like to go ahead and
> commit 0001 and 0002 soon if there are no objections, since I think
> those are good refactorings independently of the rest of this.

No objections to 0001/0002.

Regards,
-David

[1]
http://git.postgresql.org/pg/commitdiff/6ad8ac6026287e3ccbc4d606b6ab6116ccc0eec8
