Re: Online checksums patch - once again

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Online checksums patch - once again
Date: 2020-09-23 12:34:36
Message-ID: 91DC9350-7269-4B0F-AF87-AE62243D53AD@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 19 Sep 2020, at 04:18, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

Thanks for reviewing!

> + * changed to "inprogress-off", the barrier for mvoving to "off" can be
> moving

Fixed.

> + * When disabling checksums, data_checksums will be set of "inprogress-off"
> set *to*

Fixed.

> + get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
>
> I think this palloc()s a new copy of the namespace every 100 blocks.
> Better do it outside the loop.

Good point, fixed.

> + {"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
> + {"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
>
> enabling / disabling ?

Perhaps, but it doesn't match the grammatical tense of others though?

> +typedef enum ChecksumType
> +{
> + DATA_CHECKSUMS_OFF = 0,
> + DATA_CHECKSUMS_ON,
> + DATA_CHECKSUMS_INPROGRESS_ON,
> + DATA_CHECKSUMS_INPROGRESS_OFF
> +} ChecksumType;
>
> Should this be an bitmask, maybe
> DATA_CHECKSUMS_WRITE = 1
> DATA_CHECKSUMS_VERIFY = 2,

That's an option, not sure if it would improve readability though. Anyone else
have opinions on that?

> It occured to me that you could rephrase this patch as "Make checksum state
> per-relation rather than cluster granularity". That's currently an
> implementation detail, but could be exposed as a feature.

Thats not entirely correct. The patch tracks checksum status *during
inprogress-on* with per-relation granularity, but as it stands it doesn't
support per-relation during state "on" in any way.

A per-relation checksum mode where every relation at any point can enable or
disable checksums would require a very different synchronization mechanism from
the all-or-nothing one (which while simpler, IMO is complicated enough). My
hope is that this patch brings solid infrastructure for anyone interested in
persuing per-relation checksums, but IMHO we should focus on getting
per-cluster rock-solid first.

> That could be a
> preliminary 0001 patch. Half the existing patch would be 0002 "Allow
> online enabling checksums for a given relation/database/cluster". You might
> save some of the existing effort of synchronize the cluster-wide checksum
> state, since it doesn't need to be synchronized.

I don't follow, how would a finer-grain resolution remove the need for
synchronization?

> The "data_checksums" GUC
> might be removed, or changed to an enum: on/off/per_relation. Moving from
> "per_relation" to "on" would be an optional metadata-only change, allowed only
> when all rels in all dbs are checksumed.

How would you guarantee that such a state change isn't happening concurrently
with a user doing ALTER TABLE .. checksums=off;? It would still require
synchronization along the lines of what this patch does unless I'm missing
something.

> I'm not sure if you'd even care about
> temp tables, since "relhaschecksum" would be authoritative, rather than a
> secondary bit only used during processing.
>
> XLogHintBitIsNeeded() and DataChecksumsEnabled() would need to check
> relhaschecksum, which tentatively seems possible.

While possible, that's a pretty hot codepath so any additional checking will
need proper benchmarking.

relhaschecksum isn't guaranteed to be correct at any point other than during
checksum enabling, it's only used for tracking progress in case of a cluster
restart during processing. To better convey this, I asked for suggestions for
better name upthread since relhaschecksum carries the risk of overpromise/
underdeliver. Perhaps relchecksumprocessed would be better?

A real per-relation relhaschecksum in pg_class would also need to solve how to
keep it accurate for offline enable/disable via pg_checksums.

> I'm not sure if it's possible, but maybe pg_checksums would be able to skip
> rels which had already been checksummed "online" (with an option to force
> reprocessing).

pg_checksums can't read the catalog state of the relations, so is has no
knowledge on where an online processing left off. That should probably be made
clearer in the docs though, so added a note on that.

> Maybe some people would want (no) checksums on specific tables, and that could
> eventually be implemented as 0003: "ALTER TABLE SET checksums=". I'm thinking
> of that being used immediately after an CREATE, but I suppose ON would cause
> the backend to rewrite the table with checksums synchronously (not in the BGW),
> either with AEL or by calling ProcessSingleRelationByOid().

Holding an AEL while changing state would be easier as it skips the need for
the complex synchronization, but is that really what users would expect?
Especially if cluster-wide enable is done transparent to the user.

More importantly though, what is the use-case for per-relation that we'd be
looking at solving? Discussing the implementation without framing it in an
actual use-case runs the risk of performing successful surgery where the
patient still dies. Performing ETL ingestion without the overhead of writing
checksums? Ephemeral data? Maybe unlogged tables can handle some situation
where checksums would be appealing?

While in the patch I realized that the relationlist saved the relkind but the
code wasn't actually using it, so I've gone ahead and removed that with a lot
fewer palloc calls as a result. The attached v22 fixes that and the above.

cheers ./daniel

Attachment Content-Type Size
online_checksums22.patch application/octet-stream 109.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2020-09-23 13:12:46 Re: problem with RETURNING and update row movement
Previous Message Amit Kapila 2020-09-23 12:22:07 Re: [Patch] Optimize dropping of relation buffers using dlist