Re: Changing the state of data checksums in a running cluster

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Changing the state of data checksums in a running cluster
Date: 2026-05-28 11:51:14
Message-ID: 538e820b-db2a-4f53-ba24-c354c72fc1a9@vondra.me
Lists: pgsql-hackers

On 5/28/26 13:28, Daniel Gustafsson wrote:
>> On 26 May 2026, at 20:12, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
>> I suppose this means we should not be updating the checksum state
>> without emitting the barrier? I think all other places do that.
>
> Good catch, it's indeed a bug, any state change must emit a procsignalbarrier
> to maintain cluster consistency. I ended up writing a test for this very case
> as well.
>

Good.

>> I'm still not sure if it really is an issue or just an annoyance,
>> because I've not been able to find a case where it'd lead to checksum
>> failures (or obviously incorrect final state after recovery).
>
> I've tried to get it to reach an incorrect end state but failed, but I do agree
> that maybe we need an improved locking protocol around state updates. Need to
> spend some more time thinking about this.
>

OK

>> I still don't understand why this needs DELAY_CHKPT_START ...
>
> Having stared at this for some time, and going over old threads, I think this
> is a mistake. AFAICT though it cannot cause any error, so I'd lean towards
> erring on the safe side by leaving as is and looking at removing in 20. What
> do you think?
>

I'd probably try to fix this for 19, otherwise it may confuse people
looking at the code in the future. We're still months away from 19
being released. Ofc, maybe I'm underestimating the risk.

>> I also noticed a couple minor comment issues, per attached patch (this
>> may need pgindent).
>
> I ended up splitting this into two, one for the comment fixes and one for the
> data type change.
>
> I propose applying the three patches below to v19 to fix the promotion issue
> before we wrap beta1.
>

WFM

> --
> Daniel Gustafsson
>

--
Tomas Vondra
