Re: Online checksums patch - once again

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online checksums patch - once again
Date: 2020-06-22 12:27:47
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 23 Jan 2020, at 18:23, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> ..That's probably easier and more efficient if it's just value
> in pg_class than if they have to go poking around in another catalog.
> So I am tentatively inclined to think that just putting it in pg_class
> makes more sense.

..which is what I did, but more on that later.

Attached is a new version of the online checksums patch which, I hope, address
most of the concerns raised in previous reviews. There has been a fair amount
of fiddling done, so below is a summary of what has been done.

Error handling and synchronization around pg_control have been overhauled as
well as the absorption of the process barriers. The comment there suggested
that the absorbing function shouldn't reside with the procsignal code, input is
gladly received on where it makes the most sense since I can see merit to quite
a few places.

The checksumhelper is renamed datachecksumsworker, since checksumhelper is now
used since c12e43a2e0d45a6b59f2. I think there is room for better casing here
and there on this.

Restartability is implemented by keeping state in pg_class. I opted for a bool
which is cleared as the first step of checksum enable, since it offers fewer
synchronization cornercases I think. The field is only useful during
processing, and is not guaranteed to reflect reality outside of processing.
The current name I came up with does not convey that, better suggestions are
more than welcome. For now, the process must be restarted manually by running
pg_enable_data_checksums() again, which I sort of like but that might just be
Stockholm syndrome from having enabled/disabled checksums locally a gazillion

Testing has been extended to cover basics but also restartability. Testing a
resumed restart is a tad tricky while still avoiding timing related tests, so
I've (possibly ab-)used an interactive psql session to act as a blocker to keep
processing from finishing. I did extend poll_query_until to take a non-default
timeout to make sure these tests finish in a reasonable time during hacking,
but thats left out of this version.

There are a few TODO markers left where I'd appreciate input from reviewers,
for example what to do for disabling already disabled checksums (is it a LOG,
NOTICE, ERROR or silent return?)

This is an executive summary of hacking done, and I have most likely forgotten
to mention something important, but I hope it covers most things. Few, if any,
changes are made to the interface of this, changes are contained under the
hood. I will stick this patch in the upcoming commitfest.

cheers ./daniel

Attachment Content-Type Size
online_checksums18.patch application/octet-stream 98.2 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-06-22 13:02:51 Re: min_safe_lsn column in pg_replication_slots view
Previous Message Tomas Vondra 2020-06-22 12:14:45 Re: [PATCH] Initial progress reporting for COPY command