Re: Changing the state of data checksums in a running cluster

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <mbanck(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Changing the state of data checksums in a running cluster
Date: 2026-03-27 23:30:56
Message-ID: c92b5d8b-bc03-47bc-b209-2e4a719eee32@iki.fi
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28/03/2026 00:03, Daniel Gustafsson wrote:
> The attached rebase contains lots more polish, mostly renaming variable names
> for clarity, tidying up comments and documentation and some smaller bits of
> cleanup like moving more code out of xlog.c.
>
> This version runs all the tests in a normal test-run, with a few of them pared
> down with larger runs gated by PG_TEST_EXTRA. I thinkt the tests are still too
> expensive in the event of getting committed, but it's helpful to have them
> during dev and test. Executing pgbench sometimes fails in CI but I've been
> unable to reproduce that so not entirely sure what is going on there.
>
> Heikki, Andres and Tomas; as you have been reviewing this patchset, what do you
> feel is left for considering this for commit? (Apart from figuring out the CI
> test thing mentioned above which I think is a buildsystem issue.) I think 0001
> could be considered independently of 0002 and is cleanup in it's own right.

+1 for committing 0001 right away.

Some leftover stuff remains, related to the change that it no longer
rebuilds the list of databases:

> + * The DataChecksumsWorker will compile a list of all databases at the start,
> + * and once all are processed will regenerate the list and start over
> + * processing any new entries. Once there are no new entries on the list,
> + * processing will end. The regenerated list is required since databases can
> + * be created concurrently with data checksum processing, using a template
> + * database which has yet to be processed. All databases MUST BE successfully
> + * processed in order for data checksums to be enabled, the only exception are
> + * databases which are dropped before having been processed.

and here:

> +/*
> + * Test to remove an entry from the Databaselist to force re-processing since
> + * not all databases could be processed in the first iteration of the loop.
> + */
> +void
> +dc_dblist(const char *name, const void *private_data, void *arg)

The above talks about re-processing, but that doesn't happen anymore. I
suspect the test that uses this is now obsolete or broken.

> + while (true)
> + {
> + int processed_databases = 0;
> +
> + foreach_ptr(DataChecksumsWorkerDatabase, db, DatabaseList)
> + {
> + DataChecksumsWorkerResult result;
> + DataChecksumsWorkerResultEntry *entry;
> + bool found;
> +
> + /*
> + * Check if this database has been processed already, and if so
> + * whether it should be retried or skipped.
> + */
> + entry = (DataChecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
> + HASH_FIND, NULL);
> +

I guess this works, but wouldn't it be simpler to remove entries from
DatabaseList when they're successfully processed?

> --- a/src/backend/commands/dbcommands.c
> +++ b/src/backend/commands/dbcommands.c
> @@ -1044,7 +1044,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
> if (pg_strcasecmp(strategy, "wal_log") == 0)
> dbstrategy = CREATEDB_WAL_LOG;
> else if (pg_strcasecmp(strategy, "file_copy") == 0)
> + {
> + if (DataChecksumsInProgress())
> + ereport(ERROR,
> + errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("create database strategy \"%s\" not allowed when data checksums are being enabled",
> + strategy));
> dbstrategy = CREATEDB_FILE_COPY;
> + }
> else
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),

Is this enough? Is it possible that you start CREATE DATABASE in
file_copy mode, and while it's already running but hasn't copied
everything yet, you turn checksums on?

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2026-03-28 00:01:49 Re: Don't synchronously wait for already-in-progress IO in read stream
Previous Message Jacob Champion 2026-03-27 23:20:53 Re: unclear OAuth error message