Re: Online checksums patch - once again

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online checksums patch - once again
Date: 2021-01-27 15:37:52
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Revisiting an issue we discussed earlier:

On 25/11/2020 15:20, Daniel Gustafsson wrote:
> On 23 Nov 2020, at 18:36, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> On 17/11/2020 10:56, Daniel Gustafsson wrote:
>>> I've reworked this in the attached such that the enable_ and
>>> disable_ functions merely call into the launcher with the desired
>>> outcome, and the launcher is responsible for figuring out the
>>> rest. The datachecksumworker is now the sole place which
>>> initiates a state transfer.
>> Well, you still fill the DatachecksumsWorkerShmem->operations array
>> in the backend process that launches the datacheckumworker, not in
>> the worker process. I find that still a bit surprising, but I
>> believe it works.
> I'm open to changing it in case there are strong opinions, it just
> seemed the most natural to me.

This kept bothering me, so I spent a while hacking this to my liking.
The attached patch moves the code to fill in 'operations' from the
backend to the launcher, so that the pg_enable/disable_checksums() call
now truly just stores the desired state, checksum on or off, in shared
memory, and launches the launcher process. The launcher process figures
out how to get to the desired state. This removes the couple of corner
cases that previously emitted a NOTICE about the processing being
concurrently disabled or aborted. What do you think? I haven't done much
testing, so if you adopt this approach, please check if I broke
something in the process.

This changes the way the abort works. If the
pg_enable/disable_checksums() is called, while the launcher is already
busy changing the state, pg_enable/disable_checksums() will just set the
new desired state in shared memory anyway. The launcher process will
notice that the target state changed some time later, and restart from

A couple of other issues came up while doing that:

- AbortProcessing() has two callers, one in code that runs in the
launcher process, and another one in code that runs in the worker
process. Is it really safe to use from the worker process? Calling
ProcessAllDatabases() in the worker seems sketchy. (This is moot in this
new patch version, as I removed AbortProcessing() altogether)

- Is it possible that the worker keeps running after the launcher has
already exited, e.g. because of an ERROR or SIGTERM? If you then quickly
call pg_enable_checksums() again, can you end up with two workers
running at the same time? Is that bad?

On 26/01/2021 23:00, Daniel Gustafsson wrote:
>> On 22 Jan 2021, at 12:55, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>> @@ -3567,6 +3571,27 @@ RelationBuildLocalRelation(const char *relname,
>>> relkind == RELKIND_MATVIEW)
>>> RelationInitTableAccessMethod(rel);
>>> + /*
>>> + * Set the data checksum state. Since the data checksum state can change at
>>> + * any time, the fetched value might be out of date by the time the
>>> + * relation is built. DataChecksumsNeedWrite returns true when data
>>> + * checksums are: enabled; are in the process of being enabled (state:
>>> + * "inprogress-on"); are in the process of being disabled (state:
>>> + * "inprogress-off"). Since relhaschecksums is only used to track progress
>>> + * when data checksums are being enabled, and going from disabled to
>>> + * enabled will clear relhaschecksums before starting, it is safe to use
>>> + * this value for a concurrent state transition to off.
>>> + *
>>> + * If DataChecksumsNeedWrite returns false, and is concurrently changed to
>>> + * true then that implies that checksums are being enabled. Worst case,
>>> + * this will lead to the relation being processed for checksums even though
>>> + * each page written will have them already. Performing this last shortens
>>> + * the window, but doesn't avoid it.
>>> + */
>>> + rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
>>> +
>>> /*
>>> * Okay to insert into the relcache hash table.
>>> *
>> I grepped for relhashcheckums, and concluded that the value in the
>> relcache isn't actually used for anything. Not so! In
>> heap_create_with_catalog(), the actual pg_class row is constructed
>> from the relcache entry, so the value set in
>> RelationBuildLocalRelation() finds its way to pg_class. Perhaps it
>> would be more clear to pass relhachecksums directly as an argument
>> to AddNewRelationTuple(). That way, the value in the relcache would
>> be truly never used.
> I might be thick (or undercaffeinated) but I'm not sure I follow.
> AddNewRelationTuple calls InsertPgClassTuple which in turn avoids the
> relcache entry.

Ah, you're right, I misread AddNewRelationTuple. That means that the
relhaschecksums field in the relcache is never used? That's a clear
rule. The changes to formrdesc() and RelationBuildLocalRelation() seem
unnecessary then, we can always initialize relhaschecksums to false in
the relcache.

- Heikki

Attachment Content-Type Size
v34-0001-Support-checksum-enable-disable-in-a-running-clu.patch text/x-patch 136.1 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-01-27 16:22:14 Inconsistent function definitions in ECPG's informix.c
Previous Message Amit Khandekar 2021-01-27 15:06:28 Re: Speeding up GIST index creation for tsvectors