Re: Changing the state of data checksums in a running cluster

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <mbanck(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Changing the state of data checksums in a running cluster
Date: 2025-08-27 08:30:18
Message-ID: 02DCCE05-537F-4BDE-9C0B-D4935021BBBF@yesql.se
Lists: pgsql-hackers

> On 26 Aug 2025, at 01:06, Tomas Vondra <tomas(at)vondra(dot)me> wrote:

> I think this TAP looks very nice, but there's a couple issues with it.
> See the attached patch fixing those.

Thanks, I have incorporated (most of) your patch in the attached. I did keep
the PG_TEST_EXTRA check for injection points though, which I assume was removed
by mistake.

> With these changes it runs for me, and I even saw some
>
> LOG: page verification failed
>
> in tmp_check/log/006_concurrent_pgbench_standby_1.log. But it takes a
> while - a couple minutes, maybe? I think I saw it at
>
> t/006_concurrent_pgbench.pl .. 427/?

That's very interesting, I have been running it to timeout several times in a
row without hitting any verification failures. Will keep running.

> or something like that. I think the bash version did a couple things
> differently, which might make the failures more frequent (but it's just
> a wild guess).
>
> In particular, I think the script restarts the two nodes independently,
> while the TAP always stops both primary and standby, in this order. I
> think it'd be useful to restart one or both.

Done in the attached, it will now randomly stop one, both, or neither of the
nodes. If a node is stopped, I've added an offline pg_checksums step to
validate the data files as a why-not test.
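
For readers without the patch at hand, a rough sketch of that selection logic
follows. It is written in Python rather than the Perl of the actual TAP test,
and the function names and data directory mapping are hypothetical, not taken
from the patch. It only builds the commands and assumes pg_checksums is invoked
with its documented --check and -D options against a cleanly stopped node.

```python
import random

# Node names are illustrative; the real test uses its own cluster objects.
NODES = ("primary", "standby")


def pick_nodes_to_stop(rng):
    """Return a random subset of NODES: none, one, or both."""
    return [node for node in NODES if rng.random() < 0.5]


def offline_check_command(datadir):
    """Build the offline verification command for a stopped node.

    pg_checksums --check requires a cleanly shut down cluster and reports
    an error if data checksums are not enabled, so the caller would need
    to account for the current checksum state.
    """
    return ["pg_checksums", "--check", "-D", datadir]


def plan_iteration(rng, datadirs):
    """Pick nodes to stop and pair each with its verification command."""
    stopped = pick_nodes_to_stop(rng)
    return [(node, offline_check_command(datadirs[node])) for node in stopped]


if __name__ == "__main__":
    rng = random.Random()
    dirs = {"primary": "/path/to/primary", "standby": "/path/to/standby"}
    for node, cmd in plan_iteration(rng, dirs):
        print(node, " ".join(cmd))
```

Drawing each node independently gives all four outcomes (neither, primary
only, standby only, both) roughly equal weight.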

> The other thing is the bash script added some random delays/sleep, which
> increases the test duration, but it also means generating somewhat
> random amounts of data, etc. It also randomized some other stuff (scale,
> client count, ...). But that can wait.

Added as well in a few places, maybe more can be sprinkled in.

--
Daniel Gustafsson

Attachment: v20250827-0001-Online-enabling-and-disabling-of-data-chec.patch (application/octet-stream, 179.8 KB)
