Re: Online checksums patch - once again

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <jrouhaud(at)vmware(dot)com>
Subject: Re: Online checksums patch - once again
Date: 2020-07-28 02:33:46
Message-ID: 20200728023346.GA20393@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 25, 2020 at 11:43:00AM +0200, Daniel Gustafsson wrote:
> The attached v19 fixes a few doc issues I had missed.

+ They can also be enabled or disabled at a later timne, either as an offline
=> time

+ * awaiting shutdown, but we can continue turning off checksums anyway
=> a waiting

+ * We are starting a checksumming process scratch, and need to start by
=> FROM scratch

+ * to inprogress new relations will set relhaschecksums in pg_class so it
=> inprogress COMMA

+ * Relation no longer exist. We don't consider this an error since
=> exists

+ * so when the cluster comes back up processing will habe to be resumed.
=> have

+ "completed one pass over all databases for checksum enabling, %i databases processed",
=> I think this will be confusing to be hardcoded "one". It'll say "one" over
and over.

+ * still exist.
=> exists

In many places, you refer to "datachecksumsworker" (sums) but in nine places
you refer to datachecksumworker (sum).

+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+ BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);

=> I think looping over numblocks is safe since new blocks are intended to be
written with checksum, right? Maybe it's good to say that here.

+ BlockNumber b;

blknum will be easier to grep for

+ (errmsg("background worker \"datachecksumsworker\" starting for database oid %d",
=> Should be %u or similar (several of these)

Some questions:

It looks like you rewrite every page, even if it already has correct checksum,
to handle replicas. I wonder if it's possible/reasonable/good to skip pages
with correct checksum when wal_level=minimal ?

It looks like it's not possible to change the checksum delay while a checksum
worker is already running. That may be important to allow: 1) decreased delay
during slow periods; 2) increased delay if the process is significantly done,
but needs to be throttled to avoid disrupting production environment.

Have you collaborated with Julien about this one? His patch adds new GUCs:
https://www.postgresql.org/message-id/20200714090808.GA20780@nol
checksum_cost_delay
checksum_cost_page
checksum_cost_limit

Maybe you'd say that Julien's pg_check_relation() should accept parameters
instead of adding GUCs. I think you should be in agreement on that. It'd be
silly if the verification function added three GUCs and allowed adjusting
throttle midcourse, but the checksum writer process didn't use them.

If you used something like that, I guess you'd also want to distinguish
checksum_cost_page_read vs write. Possibly, the GUCs part should be a
preliminary shared patch 0001 that you both used.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-07-28 03:01:45 Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.
Previous Message Fujii Masao 2020-07-28 02:25:54 Re: max_slot_wal_keep_size and wal_keep_segments