From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
Cc: | Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <mbanck(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Changing the state of data checksums in a running cluster |
Date: | 2025-08-29 14:38:22 |
Message-ID: | d9ea8a27-ed46-476f-8a6e-600147b13ff2@vondra.me |
Lists: | pgsql-hackers |
On 8/29/25 16:26, Tomas Vondra wrote:
> ...
>
> I've seen these failures after changing checksums in both directions,
> both after enabling and disabling. But I've only ever seen this after
> immediate shutdown, never after fast shutdown. (It's interesting the
> pg_checksums failed only after fast shutdowns ...).
>
Of course, right after I sent that message, it failed after a fast shutdown,
contradicting my observation ...
> Could it be that the redo happens to start from an older position, but
> using the new checksum version?
>
... but it also provided more data supporting this hypothesis. I added
logging of pg_current_wal_lsn() before / after changing checksums on the
primary, and I see this:
1) LSN before: 14/2B0F26A8
2) enable checksums
3) LSN after: 14/EE335D60
4) standby waits for 14/F4E786E8 (higher, likely thanks to pgbench)
5) standby restarts with -m fast
6) redo starts at 14/230043B0, which is *before* enabling checksums
I guess this is the root cause. A somewhat more detailed log is attached.
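To make the ordering of those LSNs easier to check, here is a minimal sketch that converts each value to a 64-bit integer and compares them. It assumes only the usual textual LSN format (two hexadecimal numbers separated by a slash, the high 32 bits first); the helper name parse_lsn and the variable names are made up for this illustration and are not part of the patch or the test scripts.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '14/2B0F26A8' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# LSNs copied from the numbered steps above.
lsn_before = parse_lsn("14/2B0F26A8")    # primary, before enabling checksums
lsn_after = parse_lsn("14/EE335D60")     # primary, after enabling checksums
standby_wait = parse_lsn("14/F4E786E8")  # LSN the standby waits for
redo_start = parse_lsn("14/230043B0")    # where redo starts after the restart

# Redo begins before the checksum change was even started on the primary.
redo_predates_change = redo_start < lsn_before

# Amount of WAL (in bytes) replayed before reaching the "LSN before" point.
gap_bytes = lsn_before - redo_start

print("redo starts before checksums were enabled:", redo_predates_change)
print("bytes of WAL between redo start and 'LSN before':", gap_bytes)
```

The same comparison can be done in SQL with the pg_lsn type; the Python version is only meant to make the arithmetic explicit.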
regards
--
Tomas Vondra
Attachment | Content-Type | Size |
---|---|---|
failure2.log | text/x-log | 3.6 KB |