Re: trying again to get incremental backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: trying again to get incremental backup
Date: 2023-10-23 19:34:28
Message-ID: CA+Tgmoad7igbt46K+JHG6UEZ56Y5SVBYq5e7OjcKkgD3SStNmg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 20, 2023 at 9:20 AM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> Okay, so another good news - related to the patch version #4.
> Not-so-tiny stress test consisting of pgbench run for 24h straight
> (with incremental backups every 2h, with base of initial full backup),
> followed by two PITRs (one not using incremental backup and one using
> it, to illustrate the performance point - and potentially spot any
> errors in between). In both cases it worked fine.

This is great testing, thanks. What might be even better is to test
whether the resulting backups are correct, somehow.

> I've just noticed one thing when recovery is in progress: is
> summarization working during recovery - in the background - an
> expected behaviour? I'm wondering about that, because after freshly
> restored and recovered DB, one would need to create a *new* full
> backup and only from that point new summaries would have any use?

Actually, I think you could take an incremental backup relative to a
full backup from a previous timeline.

But the question of what summarization ought to do (or not do) during
recovery, and whether it ought to be enabled by default, and what the
retention policy ought to be are very much live ones. Right now, it's
enabled by default and keeps summaries for a week, assuming you don't
reset your local clock and that it advances at the same speed as the
universe's own clock. But that's all debatable. Any views?
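
As a rough illustration of why a time-based retention policy depends on
the local clock, here is a minimal sketch. It is not taken from the
patch: SummaryFile, expired_summaries, and the use of file modification
times are all assumptions made for the example.

```python
from dataclasses import dataclass

WEEK_SECONDS = 7 * 24 * 60 * 60  # the one-week retention mentioned above


@dataclass(frozen=True)
class SummaryFile:
    name: str     # hypothetical summary file name
    mtime: float  # last-modified time, in seconds since the epoch


def expired_summaries(files, now, keep_seconds=WEEK_SECONDS):
    """Return the names of summaries older than keep_seconds.

    Age is measured against `now`, i.e. the local clock, so moving the
    clock backward delays removal and moving it forward hastens it.
    """
    cutoff = now - keep_seconds
    return [f.name for f in files if f.mtime < cutoff]
```

With one summary written at time 0 and another five days later, a clock
reading of eight days expires only the first; a clock set back to six
days expires neither; a clock that jumps ahead to thirty days expires
both.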

Meanwhile, here's a new patch set. I went ahead and committed the
first two preparatory patches, as I said earlier that I intended to
do. And here I've adjusted the main patch, which is now 0003, for the
addition of XLOG_CHECKPOINT_REDO, which permitted me to simplify a few
things. wal_summarize_mb now feels like a bit of a silly GUC --
presumably you'd never care, unless you had an absolutely gigantic
inter-checkpoint WAL distance. And if you have that, maybe you should
also have enough memory to summarize all that WAL. Or maybe not:
perhaps it's better to write WAL summaries more than once per
checkpoint when checkpoints are really big. But I'm worried that the
GUC will become a source of needless confusion for users. For most
people, it seems like emitting one summary per checkpoint should be
totally fine, and they might prefer a simple Boolean GUC,
summarize_wal = true | false, over this. I'm just not quite sure about
the corner cases.
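
To make the two options concrete, here is a minimal sketch of how the
summary boundaries would differ. It is an illustration under stated
assumptions, not the patch's logic: summary_boundaries, the (end_lsn,
is_redo_point) record shape, and LSNs as plain byte offsets are all
hypothetical.

```python
def summary_boundaries(records, limit_bytes=None, start_lsn=0):
    """Return the LSNs at which a WAL summary would be closed.

    records: iterable of (end_lsn, is_redo_point) pairs in WAL order.
    limit_bytes=None models a simple on/off setting: one summary per
    checkpoint. A numeric limit models a size-based setting: a summary
    is also closed once that much WAL has accumulated since the last one.
    """
    boundaries = []
    start = start_lsn
    for end_lsn, is_redo_point in records:
        too_big = limit_bytes is not None and end_lsn - start >= limit_bytes
        if is_redo_point or too_big:
            boundaries.append(end_lsn)
            start = end_lsn
    return boundaries
```

With a small inter-checkpoint distance the size limit never triggers and
both settings give the same result; they only diverge when the WAL
between two checkpoints exceeds the limit.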

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment                                                        Content-Type               Size
v6-0004-Add-new-pg_walsummary-tool.patch                          application/octet-stream   11.4 KB
v6-0002-Move-src-bin-pg_verifybackup-parse_manifest.c-int.patch   application/octet-stream   4.3 KB
v6-0001-Change-how-a-base-backup-decides-which-files-have.patch   application/octet-stream   11.4 KB
v6-0003-Prototype-patch-for-incremental-backup.patch              application/octet-stream   333.5 KB
