| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <mbanck(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Changing the state of data checksums in a running cluster |
| Date: | 2026-04-02 09:27:51 |
| Message-ID: | d4265f6d-ae6a-4a52-999a-092a883d9f46@iki.fi |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
It'd be good to print a LOG message when the checksums state changes, to
have a trail in the log of when checksums were enabled/disabled.
Something like:
LOG: enabling checksums was requested, starting checksum calculation
...
LOG: checksums calculation finished, checksums are now enabled
On 02/04/2026 02:01, Daniel Gustafsson wrote:
> + if (result == DATACHECKSUMSWORKER_FAILED)
> + {
> + /*
> + * Disable checksums on cluster, because we failed one of the
> + * databases and this is an all or nothing process.
> + */
> + SetDataChecksumsOff();
> + ereport(ERROR,
> + errcode(ERRCODE_INSUFFICIENT_RESOURCES),
> + errmsg("data checksums failed to get enabled in all databases, aborting"),
> + errhint("The server log might have more information on the cause of the error."));
> + }
This got me thinking, what happens if the the data checksums launcher
encounters some other error, for example if you SIGTERM it? The system
is left in 'inprogress-on' state, but because the launcher is gone it
will never finish and 'pg_stat_progress_data_checksums' will be empty.
Perhaps launcher_exit() should call SetDataChecksumsOff()?
> /*
> * TODO: how to really handle the worker still running when the
> * launcher exits?
> */
> if (DataChecksumsWorkerShmem->worker_running)
> ereport(LOG,
> errmsg("data checksums launcher exiting while worker is still running"));
That TODO should be addressed somehow.
> + /*
> + * As of now we only update the block counter for main forks in order
> + * to not cause too frequent calls. TODO: investigate whether we
> + * should do it more frequent?
> + */
> + if (forkNum == MAIN_FORKNUM)
> + pgstat_progress_update_param(PROGRESS_DATACHECKSUMS_BLOCKS_DONE,
> + (blknum + 1));
We're updating it for every block in the main fork, but not at all for
other forks? What a bizarre way of avoiding too frequent calls :-). I
think you could just call this on every page,
pgstat_progress_update_param() is supposed to be very fast. For
comparison, e.g. index build calls it for every tuple.
> +/*
> + * Configuration of conditions which must match when absorbing a procsignal
> + * barrier during data checksum enable/disable operations. A single function
> + * is used for absorbing all barriers, and the set of conditions to use is
> + * looked up in the checksum_barriers struct. The struct member for the target
> + * state defines which state the backend must currently be in, and which it
> + * must not be in.
> + *
> + * The reason for this explicit checking is to ensure that processing cannot
> + * be started such that it breaks the assumptions of the state machine.
> + *
> + * MAX_BARRIER_CONDITIONS must match largest number of sets in barrier_eq and
> + * barrier_ne in the below checksum_barriers definition.
> + */
> +#define MAX_BARRIER_CONDITIONS 2
> +typedef struct ChecksumBarrierCondition
> +{
> + /* The target state of the barrier */
> + int target;
> + /* A set of states in which at least one MUST match the current state */
> + int barrier_eq[MAX_BARRIER_CONDITIONS];
> + /* The number of elements in the barrier_eq set */
> + int barrier_eq_sz;
> + /* A set of states which all MUST NOT match the current state */
> + int barrier_ne[MAX_BARRIER_CONDITIONS];
> + /* The number of elements in the barrier_ne set */
> + int barrier_ne_sz;
> +} ChecksumBarrierCondition;
> +
> +static const ChecksumBarrierCondition checksum_barriers[4] =
> +{
> + /*
> + * When disabling checksums, either inprogress state is Ok but checksums
> + * must not be in the enabled state.
> + */
> + {
> + .target = PG_DATA_CHECKSUM_OFF,
> + .barrier_eq = {PG_DATA_CHECKSUM_INPROGRESS_ON, PG_DATA_CHECKSUM_INPROGRESS_OFF},
> + .barrier_eq_sz = 2,
> + .barrier_ne = {PG_DATA_CHECKSUM_VERSION},
> + .barrier_ne_sz = 1
> + },
> + /* When enabling the current state must be inprogress-on */
> + {
> + .target = PG_DATA_CHECKSUM_VERSION,
> + .barrier_eq = {PG_DATA_CHECKSUM_INPROGRESS_ON},
> + .barrier_eq_sz = 1,
> + {0}, 0
> + },
> +
> + /*
> + * When moving to inprogress-on the current state cannot enabled, but when
> + * moving to inprogress-off the current state must be enabled.
> + */
> + {
> + .target = PG_DATA_CHECKSUM_INPROGRESS_ON,
> + {0}, 0,
> + .barrier_ne = {PG_DATA_CHECKSUM_VERSION},
> + .barrier_ne_sz = 1
> + },
> + {
> + .target = PG_DATA_CHECKSUM_INPROGRESS_OFF,
> + .barrier_eq = {PG_DATA_CHECKSUM_VERSION},
> + .barrier_eq_sz = 1,
> + {0}, 0
> + },
> +};
I find this to still be a pretty complicated and unclear way of
representing the allowed transitions. There are only 16 possible
transitions, and only 6 of them are allowed. How about listing the
allowed ones directly:
/* Allowed transitions: from, to */
{
/*
* Disabling checksums: If checksums are currently enabled,
* disabling must go through the 'inprogress-off' state.
*/
{PG_DATA_CHECKSUM_VERSION, PG_DATA_CHECKSUM_INPROGRESS_OFF},
{PG_DATA_CHECKSUM_INPROGRESS_OFF, PG_DATA_CHECKSUM_OFF},
/*
* If checksums are in the process of being enabled, but are
* not yet being verified, we can abort by going back to 'off'
* state.
*/
{PG_DATA_CHECKSUM_INPROGRESS_ON, PG_DATA_CHECKSUM_OFF},
/*
* Enabling checksums must normally go through the 'inprogress-on'
* state.
*/
{PG_DATA_CHECKSUM_OFF, PG_DATA_CHECKSUM_INPROGRESS_ON},
{PG_DATA_CHECKSUM_INPROGRESS_ON, PG_DATA_CHECKSUM_VERSION},
/*
* If checksums are being disabled but all backends are still
* computing checksums, we can go straight back to 'on'
*/
{PG_DATA_CHECKSUM_INPROGRESS_OFF, PG_DATA_CHECKSUM_VERSION},
}
> +/*
> + * Signaling between backends calling pg_enable/disable_data_checksums, the
> + * checksums launcher process, and the checksums worker process.
> + *
> + * This struct is protected by DataChecksumsWorkerLock
> + */
> +typedef struct DataChecksumsWorkerShmemStruct
> +{
> + /*
> + * These are set by pg_{enable|disable|verify}_data_checksums, to tell the
> + * launcher what the target state is.
> + */
> + DataChecksumsWorkerOperation launch_operation;
> + int launch_cost_delay;
> + int launch_cost_limit;
The naming feels a little weird with this struct. It's called
"DataChecksumsWorkerShmemStruct", but it's also accessed by the backends
calling pg_enable/disable_data_checksums(). And
"DataChecksumsWorkerOperation" is not accessed by workers at all. Or I
guess the "operation" global variable is used in
DataChecksumsWorkerMain(), but it's always set to ENABLE_DATACHECKSUMS
in the worker. Do you need the "operation" global variable at all?
> +void
> +SetDataChecksumsOn(void)
> +{
> + uint64 barrier;
> +
> + Assert(ControlFile != NULL);
> +
> + SpinLockAcquire(&XLogCtl->info_lck);
> +
> + /*
> + * The only allowed state transition to "on" is from "inprogress-on" since
> + * that state ensures that all pages will have data checksums written.
> + */
> + if (XLogCtl->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON)
> + {
> + SpinLockRelease(&XLogCtl->info_lck);
> + elog(PANIC, "checksums not in \"inprogress-on\" mode");
> + }
> +
> + SpinLockRelease(&XLogCtl->info_lck);
The PANIC seems a little harsh, you haven't done anything destructive
here. It's unexpected for this to be called in any other state, so this
is a "can't happen" scenario, but I don't think we usually PANIC on those.
> + <para>
> + If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
> + specified, the process is throttled using the same principles as
> + <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
> + </para>
Ugh, yet another place where we expose the "cost delay/limit" throttling
mechanism. I agree it's good to be consistent here and use the same
method we use for vacuum, I just wish we had something more user-friendly..
Grammar / spelling:
> + * state will also be set of "off".
> + * When moving to inprogress-on the current state cannot enabled, but when
> + * If a worker process currently running? This is set by the worker
> + * These are set by pg_{enable|disable|verify}_data_checksums, to tell the
there is no "pg_verify_data_checksums" function.
"calcuated" in commit message
- Heikki
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Imran Zaheer | 2026-04-02 09:49:51 | Silence -Wmaybe-uninitialized warnings |
| Previous Message | Masahiko Sawada | 2026-04-02 09:22:17 | Re: POC: Parallel processing of indexes in autovacuum |