Re: trying again to get incremental backup

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: trying again to get incremental backup
Date: 2023-11-13 16:25:21
Message-ID: 202311131625.o7hzq3oukuyd@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Great stuff you got here. I'm doing a first pass trying to grok the
whole thing for more substantive comments, but in the meantime here are
some cosmetic ones.

I got the following warnings, both valid:

../../../../pgsql/source/master/src/common/blkreftable.c: In function 'WriteBlockRefTable':
../../../../pgsql/source/master/src/common/blkreftable.c:520:45: warning: declaration of 'brtentry' shadows a previous local [-Wshadow=compatible-local]
520 | BlockRefTableEntry *brtentry;
| ^~~~~~~~
../../../../pgsql/source/master/src/common/blkreftable.c:492:37: note: shadowed declaration is here
492 | BlockRefTableEntry *brtentry;
| ^~~~~~~~

../../../../../pgsql/source/master/src/backend/postmaster/walsummarizer.c: In function 'SummarizeWAL':
../../../../../pgsql/source/master/src/backend/postmaster/walsummarizer.c:833:57: warning: declaration of 'private_data' shadows a previous local [-Wshadow=compatible-local]
833 | SummarizerReadLocalXLogPrivate *private_data;
| ^~~~~~~~~~~~
../../../../../pgsql/source/master/src/backend/postmaster/walsummarizer.c:709:41: note: shadowed declaration is here
709 | SummarizerReadLocalXLogPrivate *private_data;
| ^~~~~~~~~~~~

In blkreftable.c, I think the definition of SH_EQUAL should have an
outer layer of parentheses. Also, it would be good to provide and use a
function to initialize a BlockRefTableKey from the RelFileNode and
forknum components, and ensure that any padding bytes are zeroed.
Otherwise it's not going to be a great hash key. On my platform there
aren't any (padding bytes), but I think it's unwise to rely on that.

I don't think SummarizerReadLocalXLogPrivate->waited is used for
anything. Could be removed AFAICS, unless you're foreseen adding
something that uses it.

These forward struct declarations are not buying you anything, I'd
remove them:

diff --git a/src/include/common/blkreftable.h b/src/include/common/blkreftable.h
index 70d6c072d7..316e67122c 100644
--- a/src/include/common/blkreftable.h
+++ b/src/include/common/blkreftable.h
@@ -29,10 +29,7 @@
/* Magic number for serialization file format. */
#define BLOCKREFTABLE_MAGIC 0x652b137b

-struct BlockRefTable;
-struct BlockRefTableEntry;
-struct BlockRefTableReader;
-struct BlockRefTableWriter;
+/* Struct definitions appear in blkreftable.c */
typedef struct BlockRefTable BlockRefTable;
typedef struct BlockRefTableEntry BlockRefTableEntry;
typedef struct BlockRefTableReader BlockRefTableReader;

and backup_label.h doesn't know about TimeLineID, so it needs this:

diff --git a/src/bin/pg_combinebackup/backup_label.h b/src/bin/pg_combinebackup/backup_label.h
index 08d6ed67a9..3af7ea274c 100644
--- a/src/bin/pg_combinebackup/backup_label.h
+++ b/src/bin/pg_combinebackup/backup_label.h
@@ -12,6 +12,7 @@
#ifndef BACKUP_LABEL_H
#define BACKUP_LABEL_H

+#include "access/xlogdefs.h"
#include "common/checksum_helper.h"
#include "lib/stringinfo.h"

I don't much like the way header files in src/bin/pg_combinebackup files
are structured. Particularly, causing a simplehash to be "instantiated"
just because load_manifest.h is included seems poised to cause pain. I
think there should be a file with the basic struct declarations (no
simplehash); and then maybe since both pg_basebackup and
pg_combinebackup seem to need the same simplehash, create a separate
header file containing just that.. But, did you notice that anything
that includes reconstruct.h will instantiate the simplehash stuff,
because it includes load_manifest.h? It may be unwise to have the
simplehash in a header file. Maybe just declare it in each .c file that
needs it. The duplicity is not that large.

I'll see if I can understand the way all these headers are needed to
propose some other arrangement.

I see this idea of having "struct FooBar;" immediately followed by
"typedef struct FooBar FooBar;" which I mentioned from blkreftable.h
occurs in other places as well (JsonManifestParseContext in
parse_manifest.h, maybe others?). Was this pattern cargo-culted from
somewhere? Perhaps we have other places to clean up.

Why leave unnamed arguments in function declarations? For example, in

static void manifest_process_file(JsonManifestParseContext *,
char *pathname,
size_t size,
pg_checksum_type checksum_type,
int checksum_length,
uint8 *checksum_payload);
the first argument lacks a name. Is this just an oversight, I hope?

In GetFileBackupMethod(), which arguments are in and which are out?
The comment doesn't say, and it's not obvious why we pass both the file
path as well as the individual constituent pieces for it.

DO_NOT_BACKUP_FILE appears not to be set anywhere. Do you expect to use
this later? If not, maybe remove it.

There are two functions named record_manifest_details_for_file() in
different programs. I think this sort of arrangement is not great, as
it is confusing confusing to follow. It would be better if those two
routines were called something like, say, verifybackup_perfile_cb and
combinebackup_perfile_cb instead; then in the function comment say
something like
/*
* JsonManifestParseContext->perfile_cb implementation for pg_combinebackup.
*
* Record details extracted from the backup manifest for one file,
* because we like to keep things tracked or whatever.
*/
so it's easy to track down what does what and why. Same with
perwalrange_cb. "perfile" looks bothersome to me as a name entity. Why
not per_file_cb? and per_walrange_cb?

In walsummarizer.c, HandleWalSummarizerInterrupts is called in
summarizer_read_local_xlog_page but SummarizeWAL() doesn't do that.
Maybe it should?

I think this path is not going to be very human-likeable.
snprintf(final_path, MAXPGPATH,
XLOGDIR "/summaries/%08X%08X%08X%08X%08X.summary",
tli,
LSN_FORMAT_ARGS(summary_start_lsn),
LSN_FORMAT_ARGS(summary_end_lsn));
Why not add a dash between the TLI and between both LSNs, or something
like that? (Also, are we really printing TLIs as 8-byte hexs?)

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2023-11-13 16:49:59 Re: Why do indexes and sorts use the database collation?
Previous Message Pavel Stehule 2023-11-13 16:07:29 Re: proposal: possibility to read dumped table's name from file