Re: trying again to get incremental backup

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: trying again to get incremental backup
Date: 2023-10-24 14:53:44
Message-ID: d84c0f06-a1e4-46cc-97d6-b0d87c15c268@eisentraut.org
Lists: pgsql-hackers

On 04.10.23 22:08, Robert Haas wrote:
> - I would like some feedback on the generation of WAL summary files.
> Right now, I have it enabled by default, and summaries are kept for a
> week. That means that, with no additional setup, you can take an
> incremental backup as long as the reference backup was taken in the
> last week. File removal is governed by mtimes, so if you change the
> mtimes of your summary files or whack your system clock around, weird
> things might happen. But obviously this might be inconvenient. Some
> people might not want WAL summary files to be generated at all because
> they don't care about incremental backup, and other people might want
> them retained for longer, and still other people might want them to be
> not removed automatically or removed automatically based on some
> criteria other than mtime. I don't really know what's best here. I
> don't think the default policy that the patches implement is
> especially terrible, but it's just something that I made up and I
> don't have any real confidence that it's wonderful.

The easiest answer is to have it off by default. Let people figure out
what works for them. There are various factors like storage, network,
server performance, RTO that will determine what combination of full
backup, incremental backup, and WAL replay will satisfy someone's
requirements. I suppose tests could be set up to determine this to some
degree. But we could also start slow and let people figure it out
themselves. When pg_basebackup was added, it was also disabled by default.

If we think that 7d is a good setting, then I would suggest considering
something like 10d instead. Otherwise, if you do a weekly incremental
backup and you have a time change or a hiccup of some kind one day, you
lose your backup sequence.
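
To illustrate the margin argument, here is a minimal sketch. It assumes
a simplified retention rule (a summary is usable as long as its age does
not exceed the retention period) and a hypothetical helper name,
summaries_cover; the patch itself removes files based on their mtimes,
so the real behavior may differ in the details.

```python
from datetime import datetime, timedelta


def summaries_cover(reference_backup_time, now, retention):
    """Return True if summaries written at the time of the reference
    backup are still inside the retention window (simplified model,
    not the patch's actual mtime-based removal logic)."""
    return now - reference_backup_time <= retention


# Hypothetical schedule: reference backup at 02:00, next backup a week later.
reference = datetime(2023, 10, 1, 2, 0)
delayed = reference + timedelta(days=7, hours=3)  # backup runs 3 hours late

for retention_days in (7, 10):
    retention = timedelta(days=retention_days)
    usable = summaries_cover(reference, delayed, retention)
    print(f"retention={retention_days}d delayed weekly backup possible: {usable}")
```

With a 7d window, a weekly backup that runs even slightly late falls
outside the window in this model, while a 10d window leaves about three
days of slack.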

Another possible answer is, like, 400 days? Because why not? What is a
reasonable upper limit for this?

> - It's regrettable that we don't have incremental JSON parsing; I
> think that means anyone who has a backup manifest that is bigger than
> 1GB can't use this feature. However, that's also a problem for the
> existing backup manifest feature, and as far as I can see, we have no
> complaints about it. So maybe people just don't have databases with
> enough relations for that to be much of a live issue yet. I'm inclined
> to treat this as a non-blocker,

It looks like each file entry in the manifest takes about 150 bytes, so
1 GB would allow for about 1024**3/150 = 7158278 files. That seems fine
for now?
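
The estimate can be reproduced with a short calculation. The 150 bytes
per entry is the rough figure given above, not a measured constant, and
the function name max_manifest_entries is made up for illustration.

```python
GIB = 1024 ** 3          # the 1 GB limit discussed in the quoted text
BYTES_PER_ENTRY = 150    # rough per-file estimate; real entries vary in size


def max_manifest_entries(limit_bytes=GIB, bytes_per_entry=BYTES_PER_ENTRY):
    """Whole number of file entries that fit under the size limit."""
    return limit_bytes // bytes_per_entry


print(max_manifest_entries())
```

Passing a different bytes_per_entry shows how sensitive the ceiling is
to the per-entry size, for example with longer file paths.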

> - Right now, I have a hard-coded 60 second timeout for WAL
> summarization. If you try to take an incremental backup and the WAL
> summaries you need don't show up within 60 seconds, the backup times
> out. I think that's a reasonable default, but should it be
> configurable? If yes, should that be a GUC or, perhaps better, a
> pg_basebackup option?

The current user experience of pg_basebackup is that it waits possibly a
long time for a checkpoint, and there is an option to make it go faster,
but there is no timeout AFAICT. Is this substantially different? Could
we just let it wait forever?

Also, does waiting for checkpoint and WAL summarization happen in
parallel? If so, what if it starts a checkpoint that might take 15 min
to complete, and then after 60 seconds it kicks you off because the WAL
summarization isn't ready? That might be wasteful.

> - I'm curious what people think about the pg_walsummary tool that is
> included in 0006. I think it's going to be fairly important for
> debugging, but it does feel a little bit bad to add a new binary for
> something pretty niche.

This seems fine.

Is the WAL summary file format documented anywhere in your patch set
yet? My only thought was, maybe the file format could be human-readable
(more like backup_label) to avoid this. But maybe not.
