| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <mbanck(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Changing the state of data checksums in a running cluster |
| Date: | 2026-03-27 23:30:56 |
| Message-ID: | c92b5d8b-bc03-47bc-b209-2e4a719eee32@iki.fi |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 28/03/2026 00:03, Daniel Gustafsson wrote:
> The attached rebase contains lots more polish, mostly renaming variable names
> for clarity, tidying up comments and documentation and some smaller bits of
> cleanup like moving more code out of xlog.c.
>
> This version runs all the tests in a normal test-run, with a few of them pared
> down with larger runs gated by PG_TEST_EXTRA. I thinkt the tests are still too
> expensive in the event of getting committed, but it's helpful to have them
> during dev and test. Executing pgbench sometimes fails in CI but I've been
> unable to reproduce that so not entirely sure what is going on there.
>
> Heikki, Andres and Tomas; as you have been reviewing this patchset, what do you
> feel is left for considering this for commit? (Apart from figuring out the CI
> test thing mentioned above which I think is a buildsystem issue.) I think 0001
> could be considered independently of 0002 and is cleanup in it's own right.
+1 for committing 0001 right away.
Some leftover stuff remains, related to the change that it no longer
rebuilds the list of databases:
> + * The DataChecksumsWorker will compile a list of all databases at the start,
> + * and once all are processed will regenerate the list and start over
> + * processing any new entries. Once there are no new entries on the list,
> + * processing will end. The regenerated list is required since databases can
> + * be created concurrently with data checksum processing, using a template
> + * database which has yet to be processed. All databases MUST BE successfully
> + * processed in order for data checksums to be enabled, the only exception are
> + * databases which are dropped before having been processed.
and here:
> +/*
> + * Test to remove an entry from the Databaselist to force re-processing since
> + * not all databases could be processed in the first iteration of the loop.
> + */
> +void
> +dc_dblist(const char *name, const void *private_data, void *arg)
The above talks about re-processing, but that doesn't happen anymore. I
suspect the test that uses this is now obsolete or broken.
> + while (true)
> + {
> + int processed_databases = 0;
> +
> + foreach_ptr(DataChecksumsWorkerDatabase, db, DatabaseList)
> + {
> + DataChecksumsWorkerResult result;
> + DataChecksumsWorkerResultEntry *entry;
> + bool found;
> +
> + /*
> + * Check if this database has been processed already, and if so
> + * whether it should be retried or skipped.
> + */
> + entry = (DataChecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
> + HASH_FIND, NULL);
> +
I guess this works, but wouldn't it be simpler to remove entries from
DatabaseList when they're successfully processed?
> --- a/src/backend/commands/dbcommands.c
> +++ b/src/backend/commands/dbcommands.c
> @@ -1044,7 +1044,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
> if (pg_strcasecmp(strategy, "wal_log") == 0)
> dbstrategy = CREATEDB_WAL_LOG;
> else if (pg_strcasecmp(strategy, "file_copy") == 0)
> + {
> + if (DataChecksumsInProgress())
> + ereport(ERROR,
> + errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("create database strategy \"%s\" not allowed when data checksums are being enabled",
> + strategy));
> dbstrategy = CREATEDB_FILE_COPY;
> + }
> else
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
Is this enough? Is it possible that you start CREATE DATABASE in
file_copy mode, and while it's already running but hasn't copied
everything yet, you turn checksums on?
- Heikki
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2026-03-28 00:01:49 | Re: Don't synchronously wait for already-in-progress IO in read stream |
| Previous Message | Jacob Champion | 2026-03-27 23:20:53 | Re: unclear OAuth error message |