Re: Online checksums patch - once again

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Online checksums patch - once again
Date: 2021-01-05 20:29:31
Message-ID: 5ff4cc2c.1c69fb81.6e306.046b@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Jan 05, 2021 at 12:18:07AM +0100, Daniel Gustafsson wrote:
> Attached is a rebase of the patch on top of current HEAD.
>
> cheers ./daniel

Some more comments/questions:

> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 53e997cd55..06c001f8ff 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -7284,7 +7284,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
> * and dirtied.
> *
> * If checksums are enabled, we also generate a full-page image of
> - * heap_buffer, if necessary.
> + * heap_buffer.

That sounds like it has nothing to do with online (de)activation of
checksums?

> */
> XLogRecPtr
> log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
> @@ -7305,11 +7305,13 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
> XLogRegisterBuffer(0, vm_buffer, 0);
>
> flags = REGBUF_STANDARD;
> + HOLD_INTERRUPTS();
> if (!XLogHintBitIsNeeded())
> flags |= REGBUF_NO_IMAGE;
> XLogRegisterBuffer(1, heap_buffer, flags);
>
> recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
> + RESUME_INTERRUPTS();

This could maybe do with a comment on why the HOLD/RESUME_INTERRUPTS()
is required here, similar as is done in bufpage.c further down.

> diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
> index 92cc7ea073..fa074c6046 100644
> --- a/src/backend/access/rmgrdesc/xlogdesc.c
> +++ b/src/backend/access/rmgrdesc/xlogdesc.c
> @@ -18,6 +18,7 @@
> #include "access/xlog.h"
> #include "access/xlog_internal.h"
> #include "catalog/pg_control.h"
> +#include "storage/bufpage.h"
> #include "utils/guc.h"
> #include "utils/timestamp.h"
>
> @@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
> xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
> timestamptz_to_str(xlrec.end_time));
> }
> + else if (info == XLOG_CHECKSUMS)
> + {
> + xl_checksum_state xlrec;
> +
> + memcpy(&xlrec, rec, sizeof(xl_checksum_state));
> + if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
> + appendStringInfo(buf, "on");
> + else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
> + appendStringInfo(buf, "inprogress-off");
> + else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
> + appendStringInfo(buf, "inprogress-on");
> + else
> + appendStringInfo(buf, "off");

We probably discussed this earlier, but what was the conclusion?
PG_DATA_CHECKSUM_VERSION = 1 sounds like somebody thought it a good idea
to not just have a bool here but they probably rather thought about
different checkumming/hashing algorithms/implentations than about
internal state of the checksumming machinery.

If we decide on v2 of data page checksums, how would that look like?

PG_DATA_CHECKSUM_VERSION_V2 = 4?

If we think we're done with versions, did we consider removing the
VERSION here, because it is really confusing in
PG_DATA_CHECKSUM_INPROGRESS_ON/OFF_VERSION, like
PG_DATA_CHECKSUM_STATE_ON/INPROGRESS_ON/OFF? Or would removing "version"
also from pg_controldata be a backwards-incompatible change we don't
want to do?

Sorry again, I think we discussed it earlier, but maybe at least some
comments about what VERSION is supposed to be/mean in bufpage.h would be
in order:

> --- a/src/include/storage/bufpage.h
> +++ b/src/include/storage/bufpage.h
> @@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
> */
> #define PG_PAGE_LAYOUT_VERSION 4
> #define PG_DATA_CHECKSUM_VERSION 1
> +#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION 2
> +#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION 3
> +

> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index ede93ad7fd..01eca900ac 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -1089,7 +1102,7 @@ XLogInsertRecord(XLogRecData *rdata,
> Assert(RedoRecPtr < Insert->RedoRecPtr);
> RedoRecPtr = Insert->RedoRecPtr;
> }
> - doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
> + doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());

Indent, but I guess this will be indented by pg_indent after the fact
andyway? Or is this how it's supposed to look?

> @@ -4938,13 +4949,299 @@ GetMockAuthenticationNonce(void)
> }
>
> /*
> - * Are checksums enabled for data pages?
> + * DataChecksumsNeedWrite
> + * Returns whether data checksums must be written or not
> + *
> + * Are checksums enabled, or in the process of being enabled, for data pages?

The second "," looks odd and could be ommitted?.

> + * DataChecksumsNeedVerify
> + * Returns whether data checksums must be verified or not
> + *
> + * Data checksums are only verified if they are fully enabled in the cluster.
> + * During the "inprogress-on" and "inprogress-off" states they are only
> + * updated, not verified.
> + *
> + * This function is intended for callsites which have read data and are about
> + * to perform checksum validation based on the result of this. To avoid the
> + * the risk of the checksum state changing between reading and performing the
> + * validation (or not), interrupts must be held off. This implies that calling
> + * this function must be performed as close to the validation call as possible
> + * to keep the critical section short. This is in order to protect against
> + * TOCTOU situations around checksum validation.

I had to google "TOCTOU" and this acronym doesn't appear elsewhere in
the source tree, so I suggest to spell it out at least here (there's one
more occurance of it in this patch)

> + * DataChecksumsOnInProgress
> + * Returns whether data checksums are being enabled
> + *
> + * Most operations don't need to worry about the "inprogress" states, and
> + * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
> + * "inprogress" state for enabling checksums is used when the checksum worker
> + * is setting checksums on all pages, it can thus be used to check for aborted
> + * checksum processing which need to be restarted.

The first "inprogress" looks legit as it talks about both states, but
the second one should be "inprogress-on" I think and ...

> + * DataChecksumsOffInProgress
> + * Returns whether data checksums are being disabled
> + *
> + * The "inprogress" state for disabling checksums is used for when the worker
> + * resets the catalog state. Operations should use DataChecksumsNeedVerify()
> + * or DataChecksumsNeedWrite() for deciding whether to read/write checksums.

... "inprogress-off" here.

> +void
> +AbsorbChecksumsOnInProgressBarrier(void)
> +{
> + Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
> + LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
> +}

Bikeshed alert: maybe alo those Absorb*Barrier functions could be lumped
together after the SetDataChecksums*() functions. If not, a function
comment would be in order.

> +void
> +SetDataChecksumsOnInProgress(void)
> +{
> + uint64 barrier;
> +
> + Assert(ControlFile != NULL);
> +
> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> +
> + if (ControlFile->data_checksum_version != 0)
> + {
> + LWLockRelease(ControlFileLock);
> + return;
> + }
> + LWLockRelease(ControlFileLock);
> +
> + MyProc->delayChkpt = true;
> + START_CRIT_SECTION();
> +
> + XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
> +
> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> + ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
> + UpdateControlFile();
> + LWLockRelease(ControlFileLock);
> +
> + barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
> +
> + END_CRIT_SECTION();
> + MyProc->delayChkpt = false;
> +
> + WaitForProcSignalBarrier(barrier);
> +}

This function doesn't have the customary function header comment nor any
comments in the body and it looks like it's doing some pretty important
stuff, so I think some comments would be in order, e.g.
explaining that "data_checksum_version != 0" means we've already got
checksums enabled or are in the process of enabling/disabling them.

The corresponding SetDataChecksumsOff() function has comments that could
be duplicated here. If anything, it could be moved below
SetDataChecksumsOff() so that the reader potentially already went
through the comments in the other similar functions.

> +/*
> + * SetDataChecksumsOn
> + * Enables data checksums cluster-wide
> + *
> + * Enabling data checksums is performed using two barriers, the first one
> + * sets the checksums state to "inprogress-on" and the second one to "on".
> + * During "inprogress-on", checksums are written but not verified. When all
> + * existing pages are guaranteed to have checksums, and all new pages will be
> + * initiated with checksums, the state can be changed to "on".

This should proabably go above SetDataChecksumsOnInProgress() because
even though I've just reviewed this, I looked at the following function
body and wondered where the second barrier went...

> + */
> +void
> +SetDataChecksumsOn(void)
> {
> + uint64 barrier;
> +
> Assert(ControlFile != NULL);
> - return (ControlFile->data_checksum_version > 0);
> +
> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> +
> + if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
> + {
> + LWLockRelease(ControlFileLock);
> + elog(ERROR, "checksums not in \"inprogress-on\" mode");
> + }
> +
> + LWLockRelease(ControlFileLock);
> +
> + MyProc->delayChkpt = true;
> + START_CRIT_SECTION();
> +
> + XlogChecksums(PG_DATA_CHECKSUM_VERSION);
> +
> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> + ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
> + UpdateControlFile();
> + LWLockRelease(ControlFileLock);
> +
> + barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
> +
> + END_CRIT_SECTION();
> + MyProc->delayChkpt = false;
> +
> + WaitForProcSignalBarrier(barrier);
> +}
> +
> + * SetDataChecksumsOff
> + * Disables data checksums cluster-wide
> + *
> + * Disabling data checksums must be performed with two sets of barriers, each
> + * carrying a different state. The state is first set to "inprogress-off"
> + * during which checksums are still written but not verified. This ensures that
> + * backends which have yet to observe the state change from "on" won't get
> + * validation errors on concurrently modified pages. Once all backends have
> + * changed to "inprogress-off", the barrier for moving to "off" can be
> + * emitted.
> + */
> +void
> +SetDataChecksumsOff(void)
> +{
> + uint64 barrier;
> +
> + Assert(ControlFile);
> +
> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> +
> + /* If data checksums are already disabled there is nothing to do */
> + if (ControlFile->data_checksum_version == 0)
> + {
> + LWLockRelease(ControlFileLock);
> + return;
> + }
> +
> + /*
> + * If data checksums are currently enabled we first transition to the
> + * inprogress-off state during which backends continue to write checksums

The inprogress-off is in " " everywhere else.

> + * without verifying them. When all backends are in "inprogress-off" the
> + * next transition to "off" can be performed, after which all data checksum
> + * processing is disabled.
> + */
> + if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
> + {
> + LWLockRelease(ControlFileLock);
> +
> + MyProc->delayChkpt = true;
> + START_CRIT_SECTION();
> +
> + XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
> +
> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> + ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
> + UpdateControlFile();
> + LWLockRelease(ControlFileLock);
> +
> + barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
> +
> + END_CRIT_SECTION();
> + MyProc->delayChkpt = false;
> +
> + /*
> + * Update local state in all backends to ensure that any backend in
> + * "on" state is changed to "inprogress-off".
> + */
> + WaitForProcSignalBarrier(barrier);
> +
> + /*
> + * At this point we know that no backends are verifying data checksums
> + * during reading. Next, we can safely move to state "off" to also
> + * stop writing checksums.
> + */
> + }
> + else
> + {
> + /*
> + * Ending up here implies that the checksums state is "inprogress-on"
> + * and we can transition directly to "off" from there.

Can you explain that a bit more? Is "inprogress-on" a typo for
"inprogress-off", or do you really mean that we can just switch off
checksums during "inprogress-on"? If so, the rationale should be
explained a bit more.

> @@ -7929,6 +8226,32 @@ StartupXLOG(void)
> */
> CompleteCommitTsInitialization();
>
> + /*
> + * If we reach this point with checksums in progress state (either being
> + * enabled or being disabled), we notify the user that they need to
> + * manually restart the process to enable checksums. This is because we

I think this could rephrased to "If we reach this point with checksums
being enabled, we notify..." because the disable case is different and
handled in the following block.

> + * cannot launch a dynamic background worker directly from here, it has to
> + * be launched from a regular backend.
> + */
> + if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
> + ereport(WARNING,
> + (errmsg("data checksums are being enabled, but no worker is running"),
> + errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
> +
> + /*
> + * If data checksums were being disabled when the cluster was shutdown, we
> + * know that we have a state where all backends have stopped validating
> + * checksums and we can move to off instead.
> + */
> + if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
> + {
> + XlogChecksums(0);
> +
> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> + ControlFile->data_checksum_version = 0;
> + LWLockRelease(ControlFileLock);
> + }
> +
> /*
> * All done with end-of-recovery actions.
> *

> index 21f2240ade..a5e715f19c 100644
> --- a/src/backend/catalog/heap.c
> +++ b/src/backend/catalog/heap.c
> @@ -965,10 +965,13 @@ InsertPgClassTuple(Relation pg_class_desc,
> /* relpartbound is set by updating this tuple, if necessary */
> nulls[Anum_pg_class_relpartbound - 1] = true;
>
> + HOLD_INTERRUPTS();
> + values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
> tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
>
> /* finally insert the new tuple, update the indexes, and clean up */
> CatalogTupleInsert(pg_class_desc, tup);
> + RESUME_INTERRUPTS();
>
> heap_freetuple(tup);

Maybe add a comment here why we now HOLD/RESUME_INTERRUPTS.

> diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
> new file mode 100644
> index 0000000000..5d94db95f9
> --- /dev/null
> +++ b/src/backend/postmaster/datachecksumsworker.c
> @@ -0,0 +1,1527 @@
> +/*-------------------------------------------------------------------------
> + *
> + * datachecksumsworker.c
> + * Background worker for enabling or disabling data checksums online
> + *
> + * When enabling data checksums on a database at initdb time or with
> + * pg_checksums, no extra process is required as each page is checksummed, and
> + * verified, when accessed. When enabling checksums on an already running
> + * cluster, which does not run with checksums enabled, this worker will ensure
> + * that all pages are checksummed before verification of the checksums is
> + * turned on. In the case of disabling checksums, the state transition is
> + * recorded in the catalog and control file, and no changes are performed
> + * on the data pages or in the catalog.
> + *
> + * Checksums can be either enabled or disabled cluster-wide, with on/off being
> + * the end state for data_checksums.
> + *
> + * Enabling checksums
> + * ------------------
> + * When enabling checksums in an online cluster, data_checksums will be set to
> + * "inprogress-on" which signals that write operations MUST compute and write
> + * the checksum on the data page, but during reading the checksum SHALL NOT be
> + * verified. This ensures that all objects created during checksumming will
> + * have checksums set, but no reads will fail due to incorrect checksum. The
> + * DataChecksumsWorker will compile a list of databases which exist at the
> + * start of checksumming, and all of these which haven't been dropped during
> + * the processing MUST have been processed successfully in order for checksums
> + * to be enabled. Any new relation created during processing will see the
> + * in-progress state and will automatically be checksummed as well as have its
> + * state recorded in the catalog to avoid the datachecksumsworker having to
> + * process it when already checksummed.
> + *
> + * For each database, all relations which have storage are read and every data
> + * page is marked dirty to force a write with the checksum. This will generate
> + * a lot of WAL as the entire database is read and written. Once all datapages

It's "data page" with a space in between everywhere else.

> + * Synchronization and Correctness
> + * -------------------------------
> + * The processes involved in enabling, or disabling, data checksums in an
> + * online cluster must be properly synchronized with the normal backends
> + * serving concurrent queries to ensure correctness. Correctness is defined
> + * as the following:
> + *
> + * - Backends SHALL NOT violate local datachecksum state
> + * - Data checksums SHALL NOT be considered enabled cluster-wide until all

Linewrap.

> + * currently connected backends have the local state "enabled"

> + * Synchronizing the state change is done with procsignal barriers, where the
> + * backend updating the global state in the controlfile will wait for all other
> + * backends to absorb the barrier before WAL logging. Barrier absorption will
> + * happen during interrupt processing, which means that connected backends will
> + * change state at different times.
> + *
> + * When Enabling Data Checksums
> + * ----------------------------

There's something off with the indentation of either the title or the
line seperator here.

> + * A process which fails to observe data checksums being enabled can induce
> + * two types of errors: failing to write the checksum when modifying the page
> + * and failing to validate the data checksum on the page when reading it.
> + *
> + * When the DataChecksumsWorker has finished writing checksums on all pages
> + * and enable data checksums cluster-wide, there are three sets of backends:

"enables"

> + * Bg: Backend updating the global state and emitting the procsignalbarrier
> + * Bd: Backends on "off" state

s/on/in/

Also, given that "When the DataChecksumsWorker has finished writing
checksums on all pages and enable[s] data checksums cluster-wide",
shouldn't that mean that all other backends are either in "on" or
"inprogress-on" state, because the Bd -> Bi transition happened during a
previous barrier? Maybe that should be first explained?

> + * Be: Backends in "on" state
> + * Bi: Backends in "inprogress-on" state
> + *
> + * Backends transition from the Bd state to Be like so: Bd -> Bi -> Be
> + *
> + * Backends in Bi and Be will write checksums when modifying a page, but only
> + * backends in Be will verify the checksum during reading. The Bg backend is
> + * blocked waiting for all backends in Bi to process interrupts and move to
> + * Be. Any backend starting will observe the global state being "on" and will

"Any backend starting while Bg is waiting for the barrier" right?

> + * All sets are compatible while still operating based on
> + * their local state.

Whoa, you lost me there.

> + * When Disabling Data Checksums
> + * -----------------------------
> + * A process which fails to observe data checksums being disabled can induce
> + * two types of errors: writing the checksum when modifying the page and

Can you rephrase what you mean with "being disabled"? If you mean we're
in the "inprogress-off" state, then why is that an error? Do you mean
"writing *no* checksum" because AIUI we should still write checksums at
this point? Or are you talking about a different state?

> + * validating a data checksum which is no longer correct due to modifications
> + * to the page.
> + *
> + * Bg: Backend updating the global state and emitting the procsignalbarrier
> + * Bd: Backands in "off" state

s/Backands/Backends/

> + * Be: Backends in "on" state
> + * Bi: Backends in "inprogress-off" state

I suggest using a different symbol here for "inprogress-off" in order
not to confuse the two (different right?) Bi.

> + * Backends transition from the Be state to Bd like so: Be -> Bi -> Bd
> + *
> + * The goal is to transition all backends to Bd making the others empty sets.
> + * Backends in Bi writes data checksums, but don't validate them, such that

s/writes/write/

> + * backends still in Be can continue to validate pages until the barrier has
> + * been absorbed such that they are in Bi. Once all backends are in Bi, the
> + * barrier to transition to "off" can be raised and all backends can safely
> + * stop writing data checksums as no backend is enforcing data checksum
> + * validation.

... "anymore" maybe.

> + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group

Copyright year bump, there's two other occurances.

> + * IDENTIFICATION
> + * src/backend/postmaster/datachecksumsworker.c
> + *
> + *-------------------------------------------------------------------------
> + */
> +#include "postgres.h"
> +
> +#include "access/genam.h"
> +#include "access/heapam.h"
> +#include "access/htup_details.h"
> +#include "access/xact.h"
> +#include "catalog/indexing.h"
> +#include "catalog/pg_class.h"
> +#include "catalog/pg_database.h"
> +#include "commands/vacuum.h"
> +#include "common/relpath.h"
> +#include "miscadmin.h"
> +#include "pgstat.h"
> +#include "postmaster/bgworker.h"
> +#include "postmaster/bgwriter.h"
> +#include "postmaster/datachecksumsworker.h"
> +#include "storage/bufmgr.h"
> +#include "storage/checksum.h"
> +#include "storage/lmgr.h"
> +#include "storage/ipc.h"
> +#include "storage/procarray.h"
> +#include "storage/smgr.h"
> +#include "tcop/tcopprot.h"
> +#include "utils/fmgroids.h"
> +#include "utils/lsyscache.h"
> +#include "utils/ps_status.h"
> +#include "utils/syscache.h"
> +
> +#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
> +
> +#define MAX_OPS 4
> +
> +typedef enum DataChecksumOperation
> +{
> + ENABLE_CHECKSUMS = 1,
> + DISABLE_CHECKSUMS,
> + RESET_STATE,
> + SET_INPROGRESS_ON,
> + SET_CHECKSUMS_ON
> +} DataChecksumOperation;
> +
> +typedef enum
> +{
> + DATACHECKSUMSWORKER_SUCCESSFUL = 0,
> + DATACHECKSUMSWORKER_ABORTED,
> + DATACHECKSUMSWORKER_FAILED,
> + DATACHECKSUMSWORKER_RETRYDB,
> +} DatachecksumsWorkerResult;
> +
> +typedef struct DatachecksumsWorkerShmemStruct
> +{
> + /*
> + * Access to launcher_started and abort must be protected by
> + * DatachecksumsWorkerLock.
> + */
> + bool launcher_started;
> + bool abort;
> +
> + /*
> + * Variables for the worker to signal the launcher, or subsequent workers
> + * in other databases. As there is only a single worker, and the launcher
> + * won't read these until the worker exits, they can be accessed without
> + * the need for a lock. If multiple workers are supported then this will
> + * have to be revisited.
> + */
> + DatachecksumsWorkerResult success;
> + bool process_shared_catalogs;
> +
> + /*
> + * The below members are set when the launcher starts, and are only
> + * accessed read-only by the single worker. Thus, we can access these
> + * without a lock. If multiple workers, or dynamic cost parameters, are
> + * supported at some point then this would need to be revisited.
> + */
> + int cost_delay;
> + int cost_limit;
> + int operations[MAX_OPS];
> + bool target;

This "target" bool isn't documented very well (at least here). AIUI,
it's true if we enable checksums and false if we disable them?

> +} DatachecksumsWorkerShmemStruct;
> +
> +/* Shared memory segment for datachecksumsworker */
> +static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
> +
> +/* Bookkeeping for work to do */
> +typedef struct DatachecksumsWorkerDatabase
> +{
> + Oid dboid;
> + char *dbname;
> +} DatachecksumsWorkerDatabase;
> +
> +typedef struct DatachecksumsWorkerResultEntry
> +{
> + Oid dboid;
> + DatachecksumsWorkerResult result;
> + int retries;
> +} DatachecksumsWorkerResultEntry;
> +
> +
> +/* Prototypes */
> +static List *BuildDatabaseList(void);
> +static List *BuildRelationList(bool temp_relations, bool include_shared);
> +static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
> +static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
> +static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
> +static void launcher_cancel_handler(SIGNAL_ARGS);
> +static void SetRelHasChecksums(Oid relOid);
> +static void WaitForAllTransactionsToFinish(void);
> +
> +/*
> + * DataChecksumsWorkerStarted
> + * Informational function to query the state of the worker
> + */
> +bool
> +DataChecksumsWorkerStarted(void)
> +{
> + bool started;
> +
> + LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
> + started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
> + LWLockRelease(DatachecksumsWorkerLock);
> +
> + return started;
> +}
> +
> +
> +/*
> + * StartDataChecksumsWorkerLauncher
> + * Main entry point for datachecksumsworker launcher process
> + *
> + * The main entrypoint for starting data checksums processing for enabling as
> + * well as disabling.
> + */
> +void
> +StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)

After reading through the function I find this 'bool enable_checksums'
a bit confusing, I think something like 'int operation' and then
comparing it to either ENABLE or DISABLE or whatever would make the code
more readable, but it's a minor nitpick.

> +{
> + BackgroundWorker bgw;
> + BackgroundWorkerHandle *bgw_handle;
> +
> + /*
> + * Given that any backend can initiate a data checksum operation, the
> + * launcher can at this point be in one of the below distinct states:
> + *
> + * A: Started and performing an operation; B: Started and in the
> + * process of aborting; C: Not started
> + *
> + * If the launcher is in state A, and the requested target state is equal
> + * to the currently performed operation then we can return immediately.
> + * This can happen if two users enable checksums simultaneously. If the
> + * requested target is to disable checksums while they are being enabled,
> + * we must abort the current processing. This can happen if a user
> + * enables data checksums and then, before checksumming is done, disables
> + * data checksums again.
> + *
> + * If the launcher is in state B, we need to wait for processing to end
> + * and the abort flag be cleared before we can restart with the requested
> + * operation. Here we will exit immediately and leave it to the user to
> + * restart processing at a later time.
> + *
> + * If the launcher is in state C we can start performing the requested
> + * operation immediately.
> + */
> +
> + LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
> +
> + /*
> + * If the launcher is already started, the only operation we can perform
> + * is to cancel it iff the user requested for checksums to be disabled.
> + * That doesn't however mean that all other cases yield an error, as some
> + * might be perfectly benevolent.
> + */
> + if (DatachecksumsWorkerShmem->launcher_started)
> + {
> + if (DatachecksumsWorkerShmem->abort)
> + {
> + ereport(NOTICE,
> + (errmsg("data checksum processing is concurrently being aborted, please retry")));
> +
> + LWLockRelease(DatachecksumsWorkerLock);
> + return;
> + }
> +
> + /*
> + * If the launcher is started data checksums cannot be on or off, but
> + * it may be in an inprogress state. Since the state transition may
> + * not have happened yet (in case of rapidly initiated checksum enable
> + * calls for example) we inspect the target state of the currently
> + * running launcher.
> + */
> +

extra newline

> + if (enable_checksums)
> + {
> + /*
> + * If we are asked to enable checksums when they are already being
> + * enabled, there is nothing to do so exit.
> + */
> + if (DatachecksumsWorkerShmem->target)

Would "if (DatachecksumsWorkerShmem->target == enable_checksums)" maybe
be better, or does that change the meaning?

> + }
> +
> + /*
> + * The launcher is currently not running, so we need to query the system
> + * data checksum state to determine how to proceed based on the requested
> + * target state.
> + */
> + else
> + {

That's a slightly weird comment placement, maybe put it below the '{' so
that that the '} else {' is kept intact?

> + memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
> + DatachecksumsWorkerShmem->target = enable_checksums;

This one is especially confusing as it looks like we're unconditionally
enabling checksums but instead we're just setting the target based on
the bool (see above). It might be our standard notation though.

[...]

> +/*
> + * ProcessDatabase
> + * Enable data checksums in a single database.
> + *
> + * We do this by launching a dynamic background worker into this database, and
> + * waiting for it to finish. We have to do this in a separate worker, since
> + * each process can only be connected to one database during its lifetime.
> + */
> +static DatachecksumsWorkerResult
> +ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)

> + /*
> + * If the postmaster crashed we cannot end up with a processed database so
> + * we have no alternative other than exiting. When enabling checksums we
> + * won't at this time have changed the pg_control version to enabled so
> + * when the cluster comes back up processing will have to be resumed. When
> + * disabling, the pg_control version will be set to off before this so
> + * when the cluster comes up checksums will be off as expected. In the
> + * latter case we might have stale relhaschecksums flags in pg_class which
> + * need to be handled in some way. TODO

Any idea on how? Or is that for a future feature and can just be documented
for now? In any case, one can just run pg_disable_checksums() again as
mentioned elsewhere so maybe just rephrase the comment to say the admin
needs to do that?

(skipped the rest of this file for now)

> diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
> index 59b3f4b135..ce933a9d08 100644
> --- a/src/backend/utils/init/postinit.c
> +++ b/src/backend/utils/init/postinit.c
> @@ -605,6 +605,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> if (MyBackendId > MaxBackends || MyBackendId <= 0)
> elog(FATAL, "bad backend ID: %d", MyBackendId);
>
> + /*
> + * Set up local cache of Controldata values.
> + */
> + InitLocalControldata();

This just sets LocalDataChecksumVersion for now, is it expected to cache
other ControlData values in the future? Maybe clarifying the current
state in the comment would be in order.

> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index 2779da8a69..d583a052e2 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -498,6 +500,17 @@ static struct config_enum_entry shared_memory_options[] = {
> {NULL, 0, false}
> };
>
> +/*
> + * Options for data_checksums enum.
> + */
> +static const struct config_enum_entry data_checksum_options[] = {
> + {"on", DATA_CHECKSUMS_ON, true},
> + {"off", DATA_CHECKSUMS_OFF, true},
> + {"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
> + {"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
> + {NULL, 0, false}
> +};
> +
> /*
> * Options for enum values stored in other modules
> */
> @@ -607,7 +620,7 @@ static int max_identifier_length;
> static int block_size;
> static int segment_size;
> static int wal_block_size;
> -static bool data_checksums;
> +static int data_checksums_tmp;

Why the _tmp, is that required for enum GUCs?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-01-05 20:34:42 Re: set_config() documentation clarification
Previous Message easteregg 2021-01-05 20:12:57 plpgsql variable assignment with union is broken