Re: trying again to get incremental backup

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: trying again to get incremental backup
Date:	2023-10-23 19:34:28
Message-ID:	CA+Tgmoad7igbt46K+JHG6UEZ56Y5SVBYq5e7OjcKkgD3SStNmg@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Oct 20, 2023 at 9:20 AM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> Okay, so another good news - related to the patch version #4.
> Not-so-tiny stress test consisting of pgbench run for 24h straight
> (with incremental backups every 2h, with base of initial full backup),
> followed by two PITRs (one not using incremental backup and one using
> it, to illustrate the performance point - and potentially spot any
> errors in between). In both cases it worked fine.

This is great testing, thanks. What might be even better is to test
whether the resulting backups are correct, somehow.

> I've just noticed one thing when recovery is in progress: is
> summarization working during recovery - in the background - an
> expected behaviour? I'm wondering about that, because after freshly
> restored and recovered DB, one would need to create a *new* full
> backup and only from that point new summaries would have any use?

Actually, I think you could take an incremental backup relative to a
full backup from a previous timeline.

But the questions of what summarization ought to do (or not do) during
recovery, whether it ought to be enabled by default, and what the
retention policy ought to be are very much live ones. Right now, it's
enabled by default and keeps summaries for a week, assuming you don't
reset your local clock and that it advances at the same speed as the
universe's own clock. But that's all debatable. Any views?
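
As a rough illustration of why the clock caveat matters, here is a minimal
sketch of a wall-clock retention check. It is not taken from the patch: the
helper name summary_is_expired, the SUMMARY_RETENTION_SECS constant, and the
choice to compare against a file's modification time are all assumptions made
for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical retention window: one week, expressed in seconds. */
#define SUMMARY_RETENTION_SECS (7L * 24 * 60 * 60)

/*
 * Hypothetical helper, not from the patch: decide whether a summary file
 * last modified at file_mtime should be removed at wall-clock time now.
 */
bool
summary_is_expired(time_t file_mtime, time_t now, long retention_secs)
{
    assert(retention_secs >= 0);

    /*
     * If the clock was set back, the file appears to come from the
     * future. This sketch keeps such files rather than guessing.
     */
    if (file_mtime > now)
        return false;

    /* Expired only when strictly older than the retention window. */
    return difftime(now, file_mtime) > (double) retention_secs;
}
```

Under these assumptions, a clock that jumps forward by more than a week makes
every existing summary look expired at once, while a clock that is set back
postpones removal; both follow directly from comparing against wall-clock
time.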

Meanwhile, here's a new patch set. I went ahead and committed the
first two preparatory patches, as I said earlier that I intended to
do. And here I've adjusted the main patch, which is now 0003, for the
addition of XLOG_CHECKPOINT_REDO, which permitted me to simplify a few
things. wal_summarize_mb now feels like a bit of a silly GUC --
presumably you'd never care, unless you had an absolutely gigantic
inter-checkpoint WAL distance. And if you have that, maybe you should
also have enough memory to summarize all that WAL. Or maybe not:
perhaps it's better to write WAL summaries more than once per
checkpoint when checkpoints are really big. But I'm worried that the
GUC will become a source of needless confusion for users. For most
people, it seems like emitting one summary per checkpoint should be
totally fine, and they might prefer a simple Boolean GUC,
summarize_wal = true | false, over this. I'm just not quite sure about
the corner cases.
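
To make the trade-off concrete, here is a minimal sketch of the two policies
side by side. It is not the patch's code: the function should_emit_summary and
its parameters are hypothetical, and it assumes that wal_summarize_mb acts as
a cap on how much WAL is accumulated before a summary is written, which is
only inferred from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical decision function, not from the patch.
 *
 * wal_bytes_pending:  WAL bytes read since the last summary was written.
 * reached_checkpoint: true if the summarizer has reached a checkpoint
 *                     boundary in the WAL.
 * limit_mb:           size cap in megabytes; a value <= 0 means "no cap",
 *                     which models the simple Boolean-style behavior of
 *                     one summary per checkpoint.
 */
bool
should_emit_summary(uint64_t wal_bytes_pending, bool reached_checkpoint,
                    int limit_mb)
{
    /* Both policies write a summary at every checkpoint boundary. */
    if (reached_checkpoint)
        return true;

    /* Without a cap, nothing is written between checkpoints. */
    if (limit_mb <= 0)
        return false;

    /* With a cap, write early once enough WAL has accumulated. */
    return wal_bytes_pending >= (uint64_t) limit_mb * 1024 * 1024;
}
```

In this sketch the two policies differ only when the inter-checkpoint WAL
distance exceeds the cap, which is the corner case in question.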

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
v6-0004-Add-new-pg_walsummary-tool.patch	application/octet-stream	11.4 KB
v6-0002-Move-src-bin-pg_verifybackup-parse_manifest.c-int.patch	application/octet-stream	4.3 KB
v6-0001-Change-how-a-base-backup-decides-which-files-have.patch	application/octet-stream	11.4 KB
v6-0003-Prototype-patch-for-incremental-backup.patch	application/octet-stream	333.5 KB
