Re: trying again to get incremental backup

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: trying again to get incremental backup
Date: 2023-10-20 13:20:10
Message-ID: CAKZiRmybH_v9C4WcnzAYtJwjCvmFsuFcMq1Tfx2s+RB9hOUyNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,

On Wed, Oct 4, 2023 at 10:09 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Oct 3, 2023 at 2:21 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Here's a new patch set, also addressing Jakub's observation that
> > MINIMUM_VERSION_FOR_WAL_SUMMARIES needed updating.
>
> Here's yet another new version.[..]

Okay, so another good news - related to the patch version #4.
Not-so-tiny stress test consisting of pgbench run for 24h straight
(with incremental backups every 2h, with base of initial full backup),
followed by two PITRs (one not using incremental backup and one using
to to illustrate the performance point - and potentially spot any
errors in between). In both cases it worked fine. Pgbench has this
behaviour that it doesn't cause space growth over time - it produces
lots of WAL instead. Some stats:

START DBSIZE: ~3.3GB (pgbench -i -s 200 --partitions=8)
END DBSIZE: ~4.3GB
RUN DURATION: 24h (pgbench -P 1 -R 100 -T 86400)
WALARCHIVES-24h: 77GB
FULL-DB-BACKUP-SIZE: 3.4GB
INCREMENTAL-BACKUP-11-SIZE: 3.5GB
Env: Azure VM D4s (4VCPU), Debian 11, gcc 10.2, normal build (asserts
and debug disabled)
The increments were taken every 2h just to see if they would fail for
any reason - they did not.

PITR RTO RESULTS (copy/pg_combinebackup time + recovery time):
1. time to restore from fullbackup (+ recovery of 24h WAL[77GB]): 53s
+ 4640s =~ 78min
2. time to restore from fullbackup+incremental backup from 2h ago (+
recovery of 2h WAL [5.4GB]): 68s + 190s =~ 4min18s

I could probably pre populate the DB with 1TB cold data (not touched
to be touched pgbench at all), just for the sake of argument, and that
would probably could be demonstrated how space efficient the
incremental backup can be, but most of time would be time wasted on
copying the 1TB here...

> - I would like some feedback on the generation of WAL summary files.
> Right now, I have it enabled by default, and summaries are kept for a
> week. That means that, with no additional setup, you can take an
> incremental backup as long as the reference backup was taken in the
> last week.

I've just noticed one thing when recovery is progress: is
summarization working during recovery - in the background - an
expected behaviour? I'm wondering about that, because after freshly
restored and recovered DB, one would need to create a *new* full
backup and only from that point new summaries would have any use?
Sample log:

2023-10-20 11:10:02.288 UTC [64434] LOG: restored log file
"000000010000000200000022" from archive
2023-10-20 11:10:02.599 UTC [64434] LOG: restored log file
"000000010000000200000023" from archive
2023-10-20 11:10:02.769 UTC [64446] LOG: summarized WAL on TLI 1 from
2/139B1130 to 2/239B1518
2023-10-20 11:10:02.923 UTC [64434] LOG: restored log file
"000000010000000200000024" from archive
2023-10-20 11:10:03.193 UTC [64434] LOG: restored log file
"000000010000000200000025" from archive
2023-10-20 11:10:03.345 UTC [64432] LOG: restartpoint starting: wal
2023-10-20 11:10:03.407 UTC [64446] LOG: summarized WAL on TLI 1 from
2/239B1518 to 2/25B609D0
2023-10-20 11:10:03.521 UTC [64434] LOG: restored log file
"000000010000000200000026" from archive
2023-10-20 11:10:04.429 UTC [64434] LOG: restored log file
"000000010000000200000027" from archive

>
> - On a related note, I haven't yet tested this on a standby, which is
> a thing that I definitely need to do. I don't know of a reason why it
> shouldn't be possible for all of this machinery to work on a standby
> just as it does on a primary, but then we need the WAL summarizer to
> run there too, which could end up being a waste if nobody ever tries
> to take an incremental backup. I wonder how that should be reflected
> in the configuration. We could do something like what we've done for
> archive_mode, where on means "only on if this is a primary" and you
> have to say always if you want it to run on standbys as well ... but
> I'm not sure if that's a design pattern that we really want to
> replicate into more places. I'd be somewhat inclined to just make
> whatever configuration parameters we need to configure this thing on
> the primary also work on standbys, and you can set each server up as
> you please. But I'm open to other suggestions.

I'll try to play with some standby restores in future, stay tuned.

Regards,
-J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Wheeler 2023-10-20 13:49:35 Re: Patch: Improve Boolean Predicate JSON Path Docs
Previous Message jian he 2023-10-20 12:45:05 Re: SQL:2011 application time