| From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
|---|---|
| To: | "Maksim(dot)Melnikov" <m(dot)melnikov(at)postgrespro(dot)ru> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Incorrect checksum in control file with pg_rewind test |
| Date: | 2026-04-21 12:12:58 |
| Message-ID: | CAPpHfdsXkEWUeLUG4zh9q=hjpsOCMgsbN_XZh-6JL0z1NaNMqQ@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi, Maksim!
On Fri, Nov 7, 2025 at 5:19 PM Maksim.Melnikov
<m(dot)melnikov(at)postgrespro(dot)ru> wrote:
> just to clarify, it isn't pg_rewind related issue and can fire
> spontaneously.
> I don't have any strong scenario how to reproduce it, tests sometimes
> fired on our local CI, but as you can see on thread [1],
> where the same issue for frontends was discussed, it is very hard to
> reproduce and there wasn't scenario how to do it too.
>
> Some dirty hacks to reproduce it was described here [2], and I've tried
> it on master branch:
> First of all I applied patch
> 0001-XXX-Dirty-hack-to-clobber-control-file-for-testing.patch from [2],
> then compile app with
> -DEXEC_BACKEND and exec command in psql
> do $$ begin loop perform pg_update_control_file(); end loop; end; $$;
> Also I've run pgbench command
> for run in {1..5000}; do pgbench -c50 -t100 -j6 -S postgres ; done
> And eventually got error
>
> 2025-11-07 17:58:33.139 MSK [2472504] FATAL: incorrect checksum in
> control file
> 2025-11-07 17:58:33.141 MSK [2472501] LOG: could not receive data from
> client: Connection reset by peer
> 2025-11-07 17:58:33.143 MSK [2472505] LOG: could not send data to
> client: Broken pipe
> 2025-11-07 17:58:33.143 MSK [2472505] FATAL: connection to client lost
Thank you for spotting this issue and proposing a patch. Fork-based
builds don't have this problem, because fork() replicates the contents
of LocalControlFile into the new process, and the postmaster has a
consistent snapshot of the control file, as there is no concurrent
process which could write it at that moment. With EXEC_BACKEND,
however, even with your patch, different processes may end up with
different contents of LocalControlFile. I don't see how that could
cause a material bug right now, but I see it as an undesirable
divergence between fork and EXEC_BACKEND behavior. I propose an
alternative approach: copy the contents of the control file to the new
process via BackendParameters. This approach solves two problems at
once: no torn reads, and no divergence between fork and EXEC_BACKEND.
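
To illustrate the idea, here is a minimal standalone sketch. It is not
the attached patch and does not use PostgreSQL's real definitions: the
Demo* types, the demo_* functions, and the toy FNV-1a checksum are all
hypothetical stand-ins (the real control file uses CRC-32C). It only
shows the shape of the approach: the postmaster copies its consistent
in-memory snapshot into the parameter block, and the child adopts that
copy instead of re-reading the file from disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ControlFileData: payload, then checksum. */
typedef struct DemoControlFile
{
	uint64_t	system_identifier;
	uint32_t	state;
	uint32_t	crc;			/* last field: covers all bytes before it */
} DemoControlFile;

/* Hypothetical stand-in for BackendParameters. */
typedef struct DemoBackendParameters
{
	DemoControlFile control_snapshot;
} DemoBackendParameters;

/* Toy FNV-1a checksum over the payload (an assumption, not CRC-32C). */
static uint32_t
demo_checksum(const DemoControlFile *cf)
{
	const unsigned char *p = (const unsigned char *) cf;
	uint32_t	h = 2166136261u;

	for (size_t i = 0; i < offsetof(DemoControlFile, crc); i++)
	{
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Postmaster side: pass its own consistent snapshot to the child. */
static void
demo_save_control_file(DemoBackendParameters *param,
					   const DemoControlFile *postmaster_copy)
{
	/* The postmaster's copy is expected to be internally consistent. */
	assert(demo_checksum(postmaster_copy) == postmaster_copy->crc);
	param->control_snapshot = *postmaster_copy;
}

/* Child side: adopt the inherited snapshot; no read from disk happens. */
static bool
demo_restore_control_file(const DemoBackendParameters *param,
						  DemoControlFile *local)
{
	*local = param->control_snapshot;
	return demo_checksum(local) == local->crc;
}
```

Because the child never opens the file in this sketch, a concurrent
writer cannot produce a torn read, and every child starts from the
same bytes the postmaster holds, as it would after fork().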
------
Regards,
Alexander Korotkov
Supabase
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Inherit-pg_control-snapshot-into-EXEC_BACKEND-sub.patch | application/octet-stream | 11.1 KB |