| From: | Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | [BUG] Race in online checksums launcher_exit() |
| Date: | 2026-04-19 20:09:51 |
| Message-ID: | CAJTYsWWg6tFrdMhWs5PkwESTNeeUUsMuY17O4UmPPh771c3stA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi hackers,
While using the pg_enable_data_checksums() feature, I found a likely bug, a
race condition in datachecksum_state.c's launcher_exit().
When pg_enable_data_checksums() is called twice before the first launcher
starts, two bg workers are registered (the code expects this). The
redundant launcher exits early, but it's launcher_exit() callback
unconditionally clears the shared launcher_running flag and may call
SetDataChecksumsOff() -- even though it never owned the flag.
This allows a third pg_enable_data_checksums() call to launch another
launcher concurrently with the first (duplicate work, doubled I/O, spurious
warnings). Worse, if the redundant launcher initialized after the winner
transitioned to inprogress-on, its exit handler calls
SetDataChecksumsOff(), silently aborting the enable operation. (I have
not triggered the SetDataChecksumsOff part though calling out ad it can be
a likely scenario based on timing of workers)
Reproduced by firing three calls in quick succession:
psql -c "SELECT pg_enable_data_checksums();" &
psql -c "SELECT pg_enable_data_checksums();" &
sleep 0.5
psql -c "SELECT pg_enable_data_checksums();" &
Log shows two launchers processing databases concurrently:
[2093292] LOG: enabling data checksums requested
[2093293] LOG: already running, exiting
[2093299] LOG: enabling data checksums requested -- third launcher
admitted
[2093292] LOG: processing database "postgres"
[2093299] LOG: processing database "postgres" -- same DB,
concurrently
[2093299] WARNING: cannot set data checksums to "on", current state is
not "inprogress-on"
I think the process-local launcher_running flag exists for this purpose and
is already used for the worker-kill block, but the flag-clear and
state-revert blocks do not use it.
The attached patch returns early from launcher_exit() when the local flag
is false. Thoughts?
Regards,
Ayush
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Fix-race-in-online-checksums-launcher_exit.patch | application/octet-stream | 2.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Paul A Jungwirth | 2026-04-19 20:10:06 | Re: FOR PORTION OF does not recompute GENERATED STORED columns that depend on the range column |
| Previous Message | Andres Freund | 2026-04-19 19:25:42 | Re: First draft of PG 19 release notes |