Re: Online checksums patch - once again

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Online checksums patch - once again
Date: 2020-11-25 13:20:42
Lists: pgsql-hackers

> On 23 Nov 2020, at 18:36, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 17/11/2020 10:56, Daniel Gustafsson wrote:
>>> On 5 Oct 2020, at 13:36, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>> 2. The signaling between enable_data_checksums() and the launcher process looks funny to me. The general idea seems to be that enable_data_checksums() just starts the launcher process, and the launcher process figures out what it need to do and makes all the changes to the global state. But then there's this violation of the idea: enable_data_checksums() checks DataChecksumsOnInProgress(), and tells the launcher process whether it should continue a previously crashed operation or start from scratch. I think it would be much cleaner if the launcher process figured that out itself, and enable_data_checksums() would just tell the launcher what the target state is.
>>> enable_data_checksums() and disable_data_checksums() seem prone to race conditions. If you call enable_data_checksums() in two backends concurrently, depending on the timing, there are two possible outcomes:
>>> a) One call returns true, and launches the background process. The other call returns false.
>>> b) Both calls return true, but one of them emits a "NOTICE: data checksums worker is already running".
>>> In disable_data_checksum() imagine what happens if another backend calls enable_data_checksums() in between the ShutdownDatachecksumsWorkerIfRunning() and SetDataChecksumsOff() calls.
>> I've reworked this in the attached such that the enable_ and disable_ functions
>> merely call into the launcher with the desired outcome, and the launcher is
>> responsible for figuring out the rest. The datachecksumworker is now the sole
>> place which initiates a state transfer.
> Well, you still fill the DatachecksumsWorkerShmem->operations array in the backend process that launches the datacheckumworker, not in the worker process. I find that still a bit surprising, but I believe it works.

I'm open to changing it in case there are strong opinions, it just seemed the
most natural to me.
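For illustration, the division of labour under discussion could be sketched roughly as below. All names and values here are hypothetical stand-ins, not the patch's actual identifiers: the backend only hands over the target state, and the launcher derives the operation from the control-file state on its own.

```c
#include <stdbool.h>

/* Illustrative states; the real versions live in storage/bufpage.h. */
enum checksum_state
{
	CHECKSUMS_OFF,
	CHECKSUMS_ON,
	CHECKSUMS_INPROGRESS_ON,
	CHECKSUMS_INPROGRESS_OFF
};

enum launcher_op
{
	OP_NOOP,					/* already at the target state */
	OP_ENABLE_FROM_SCRATCH,		/* checksum every page from the start */
	OP_RESUME_ENABLE,			/* continue a previously crashed run */
	OP_DISABLE
};

/*
 * The backend communicates only the desired target state; the launcher
 * figures out whether to start over, resume, or do nothing.
 */
static enum launcher_op
launcher_decide(enum checksum_state current, bool target_on)
{
	if (target_on)
	{
		if (current == CHECKSUMS_ON)
			return OP_NOOP;
		if (current == CHECKSUMS_INPROGRESS_ON)
			return OP_RESUME_ENABLE;
		return OP_ENABLE_FROM_SCRATCH;
	}
	if (current == CHECKSUMS_OFF)
		return OP_NOOP;
	return OP_DISABLE;
}
```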

> pg_enable/disable_data_checksums() should perform a superuser-check. I don't think we want to expose them to any users.


>> +/*
>> + * Local state for Controlfile data_checksum_version. After initialization,
>> + * this is only updated when absorbing a procsignal barrier during interrupt
>> + * processing. Thus, it can be read by backends without the need for a lock.
>> + * Possible values are the checksum versions defined in storage/bufpage.h and
>> + * zero for when checksums are disabled.
>> + */
>> +static uint32 LocalDataChecksumVersion = 0;
> The comment is a bit confusing: "Thus, it can be read by backends without the need for a lock". Since it's a variable in backend-private memory, it can only be read by the same backend, not "backends". And that's also why you don't need a lock, not because it's updated during interrupt processing. I understand how this works, but maybe rephrase the comment a bit.


>> +/*
>> + * DataChecksumsOnInProgress
>> + * Returns whether data checksums are being enabled
>> + *
>> + * Most callsites shouldn't need to worry about the "inprogress" states, since
>> + * they should check the requirement for verification or writing. Some low-
>> + * level callsites dealing with page writes need however to know. It's also
>> + * used to check for aborted checksum processing which need to be restarted.
>> */
>> bool
>> -DataChecksumsEnabled(void)
>> +DataChecksumsOnInProgress(void)
>> +{
>> + return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
>> +}
> s/need/needs/. The whole paragraph feels a bit awkwardly worded in general. I'd suggest something like: "Most operations don't need to worry about the "inprogress" states, and should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite()". DataChecksumsOffInProgress() is called from StartDatachecksumsWorkerLauncher(), which I wouldn't call a "low-level callsite".

Fixed, and a few related surrounding comments expanded and tweaked.
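As a rough sketch of the resulting predicate family (the version constants below are made-up placeholders for the ones in storage/bufpage.h): checksums must be written as soon as enabling starts, but may only be verified once the fully-enabled state is reached.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values, not the real bufpage.h definitions. */
#define PG_DATA_CHECKSUM_VERSION				1
#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION	2

static uint32_t LocalDataChecksumVersion = 0;

/* Writing is required in every non-off state, including "inprogress". */
static inline bool
DataChecksumsNeedWrite(void)
{
	return LocalDataChecksumVersion > 0;
}

/* Verifying is only safe once every page is known to carry a checksum. */
static inline bool
DataChecksumsNeedVerify(void)
{
	return LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION;
}

/* Only the few callsites dealing with the transition need this one. */
static inline bool
DataChecksumsOnInProgress(void)
{
	return LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
}
```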

>> @@ -1079,7 +1091,7 @@ XLogInsertRecord(XLogRecData *rdata,
>> Assert(RedoRecPtr < Insert->RedoRecPtr);
>> RedoRecPtr = Insert->RedoRecPtr;
>> }
>> - doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
>> + doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
>> if (doPageWrites &&
>> (!prevDoPageWrites ||
> The comment above this deserves to be updated.


> Also, this is a very hot codepath; will this slow down WAL-logging, when full-page writes are disabled? Could we inline DataChecksumsOnInProgress() or set Insert->forcePageWrites when checksums are being computed or something?

Wouldn't setting forcePageWrites risk causing other side effects that we don't
want here? I've changed DataChecksumsOnInProgress() to be inline as a start.
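As a sketch of what the inlined check buys in the hot path (placeholder names and values, not the patch's code): with the predicate defined as static inline over a backend-local variable, the doPageWrites test compiles down to a plain comparison rather than a function call, while leaving forcePageWrites alone.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION 2	/* illustrative value */

static uint32_t LocalDataChecksumVersion = 0;

static inline bool
DataChecksumsOnInProgress(void)
{
	return LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
}

/*
 * Simplified stand-in for the doPageWrites computation in
 * XLogInsertRecord(): full-page writes are forced while checksums are
 * being enabled, without mutating the shared forcePageWrites flag.
 */
static bool
compute_do_page_writes(bool fullPageWrites, bool forcePageWrites)
{
	return fullPageWrites || forcePageWrites || DataChecksumsOnInProgress();
}
```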

> In StartupXLOG():
>> + /*
>> + * If data checksums were being disabled when the cluster was shutdown, we
>> + * know that we have a state where all backends have stopped validating
>> + * checksums and we can move to off instead.
>> + */
>> + if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
>> + {
>> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
>> + ControlFile->data_checksum_version = 0;
>> + LWLockRelease(ControlFileLock);
>> + }
>> +
> Should this be WAL-logged, like in SetDataChecksumsOff()?

My initial thinking was that we wouldn't need to, as all nodes would process
the controlfile equally. The more I think about it, though, the more I believe
we do need an XLOG record of it. Added.

It would be good to stress this in a TAP test, but I haven't been able to write
one yet.

> In SetDataChecksumsOff():
>> + ControlFile->data_checksum_version = 0;
>> + UpdateControlFile();
>> + LWLockRelease(ControlFileLock);
>> +
>> + XlogChecksums(0);
>> + WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
>> +}
> What happens is if you crash between UpdateControlFile() and XlogChecksum()?

Good point, that would not get the cluster to a consistent state. The
XlogChecksums() call should be performed before the controlfile is updated.
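The ordering problem can be modelled in a few lines; everything below is a simplified stand-in for the real WAL and controlfile machinery. With the WAL record written first, a crash between the two steps is repaired at recovery by replaying the record; with the controlfile written first, a crash leaves a state change that was never WAL-logged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two durable facts, written in some order with a possible crash between. */
struct cluster
{
	uint32_t	control_version;	/* on-disk pg_control value */
	bool		wal_has_record;		/* XlogChecksums(0) reached the WAL */
};

/* WAL-first ordering, as suggested above. */
static void
disable_wal_first(struct cluster *c, bool crash_between)
{
	c->wal_has_record = true;		/* XlogChecksums(0) */
	if (crash_between)
		return;						/* simulated crash */
	c->control_version = 0;			/* UpdateControlFile() */
}

/* Recovery replays the checksum record, re-applying the change. */
static void
recover(struct cluster *c)
{
	if (c->wal_has_record)
		c->control_version = 0;
}
```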

The attached patch contains these fixes as well as a rebase on top of today's

cheers ./daniel

Attachment Content-Type Size
online_checksums24.patch application/octet-stream 124.3 KB
