| From: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
|---|---|
| To: | Tomas Vondra <tomas(at)vondra(dot)me> |
| Cc: | SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <mbanck(at)gmx(dot)net> |
| Subject: | Re: Changing the state of data checksums in a running cluster |
| Date: | 2026-05-28 11:28:49 |
| Message-ID: | FAE6FC0E-AA0B-4CF4-B49B-BA6C2FC55FB8@yesql.se |
| Lists: | pgsql-hackers |
> On 26 May 2026, at 20:12, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> I suppose this means we should not be updating the checksum state
> without emitting the barrier? I think all other places do that.
Good catch, it is indeed a bug: any state change must emit a procsignalbarrier
to maintain cluster consistency. I ended up writing a test for this very case
as well.
> I'm still not sure if it really is an issue or just an annoyance,
> because I've not been able to find a case where it'd lead to checksum
> failures (or obviously incorrect final state after recovery).
I've tried to get it to reach an incorrect end state and failed, but I do agree
that we may need an improved locking protocol around state updates. I need to
spend some more time thinking about this.
> I still don't understand why this needs DELAY_CHKPT_START ...
Having stared at this for some time, and having gone over old threads, I think
this is a mistake. AFAICT it cannot cause any error though, so I'd lean towards
erring on the safe side by leaving it as is and looking at removing it in v20.
What do you think?
> I also noticed a couple minor comment issues, per attached patch (this
> may need pgindent).
I ended up splitting this into two, one for the comment fixes and one for the
data type change.
I propose applying the three patches below to v19 to fix the promotion issue
before we wrap beta1.
--
Daniel Gustafsson
| Attachment | Content-Type | Size |
|---|---|---|
| 0003-Use-correct-datatype-for-PID.patch | application/octet-stream | 1.2 KB |
| 0002-Improve-comments-in-online-checksums-code.patch | application/octet-stream | 7.1 KB |
| 0001-Fix-checksum-state-transition-during-promotion.patch | application/octet-stream | 5.7 KB |