Re: Online enabling of checksums

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Online enabling of checksums
Date: 2018-03-19 06:27:16
Message-ID: 6f8675d8-c4db-1fae-6de4-4a0857421c9b@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

The patch looks good to me at a high level. Some notes below. I didn't
read through the whole thread, so sorry if some of these have been
discussed already.

> +void
> +ShutdownChecksumHelperIfRunning(void)
> +{
> + if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
> + {
> + /* Launcher not started, so nothing to shut down */
> + return;
> + }
> +
> + ereport(ERROR,
> + (errmsg("checksum helper is currently running, cannot disable checksums"),
> + errhint("Restart the cluster or wait for the worker to finish.")));
> +}

Is there no way to stop the checksum helper once it's started? That
seems rather user-unfriendly. I can imagine it being a pretty common
mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to
realize that you forgot to set the cost limit, and that it's hurting
queries too much. At that point, you want to abort.

> + * This is intended to create the worklist for the workers to go through, and
> + * as we are only concerned with already existing databases we need to ever
> + * rebuild this list, which simplifies the coding.

I can't parse this sentence.

> +extern bool DataChecksumsEnabledOrInProgress(void);
> +extern bool DataChecksumsInProgress(void);
> +extern bool DataChecksumsDisabled(void);

I find the name of the DataChecksumsEnabledOrInProgress() function a bit
long. And doing this in PageIsVerified looks a bit weird:

> if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()

> + <literal>template0</literal> is by default not accepting connections, to
> + enable checksums you'll need to temporarily make it accept connections.

This was already discussed, and I agree with the other comments that
it's bad.

> +CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
> + cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
> + RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
> + PARALLEL RESTRICTED;

pg_[enable|disable]_checksums() functions return a bool. It's not clear
to me when they would return what. I'd suggest marking them as 'void'
instead.

> --- a/src/include/storage/bufpage.h
> +++ b/src/include/storage/bufpage.h
> @@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
> */
> #define PG_PAGE_LAYOUT_VERSION 4
> #define PG_DATA_CHECKSUM_VERSION 1
> +#define PG_DATA_CHECKSUM_INPROGRESS_VERSION 2
>

This seems like a weird place for these PG_DATA_CHECKSUM_* constants.
They're not actually stored in the page header, as you might think. I
think the idea was to keep PG_DATA_CHECKSUM_VERSION close to
PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk
format. I think it was a bit weird even before this patch, but now it's
worse. At least a better comment would be in order, or maybe move these
elsewhere.

Feature request: it'd be nice to update the 'ps status' with
set_ps_display, to show a rudimentary progress indicator. Like the name
and block number of the relation being processed. It won't tell you how
much is left to go, but at least it will give a warm fuzzy feeling to
the DBA that something is happening.

I didn't review the offline tool, pg_verify_checksums.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2018-03-19 06:55:06 Re: Hash join in SELECT target list expression keeps consuming memory
Previous Message amul sul 2018-03-19 06:18:36 Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key