[BUG] Race in online checksums launcher_exit()

From: Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [BUG] Race in online checksums launcher_exit()
Date: 2026-04-19 20:09:51
Message-ID: CAJTYsWWg6tFrdMhWs5PkwESTNeeUUsMuY17O4UmPPh771c3stA@mail.gmail.com
Lists: pgsql-hackers

Hi hackers,

While using the pg_enable_data_checksums() feature, I found what looks like
a bug: a race condition in launcher_exit() in datachecksum_state.c.

When pg_enable_data_checksums() is called twice before the first launcher
starts, two bg workers are registered (the code expects this). The
redundant launcher exits early, but its launcher_exit() callback
unconditionally clears the shared launcher_running flag and may call
SetDataChecksumsOff() -- even though it never owned the flag.

This allows a third pg_enable_data_checksums() call to start another
launcher concurrently with the first (duplicate work, doubled I/O, spurious
warnings). Worse, if the redundant launcher initialized after the winner
transitioned to inprogress-on, its exit handler calls
SetDataChecksumsOff(), silently aborting the enable operation. (I have
not triggered the SetDataChecksumsOff() part, but I am calling it out as a
likely scenario depending on worker timing.)

Reproduced by firing three calls in quick succession:

psql -c "SELECT pg_enable_data_checksums();" &
psql -c "SELECT pg_enable_data_checksums();" &
sleep 0.5
psql -c "SELECT pg_enable_data_checksums();" &

Log shows two launchers processing databases concurrently:

[2093292] LOG: enabling data checksums requested
[2093293] LOG: already running, exiting
[2093299] LOG: enabling data checksums requested -- third launcher admitted
[2093292] LOG: processing database "postgres"
[2093299] LOG: processing database "postgres" -- same DB, concurrently
[2093299] WARNING: cannot set data checksums to "on", current state is not "inprogress-on"

I think the process-local launcher_running flag exists for this purpose and
is already used for the worker-kill block, but the flag-clear and
state-revert blocks do not use it.

The attached patch returns early from launcher_exit() when the local flag
is false. Thoughts?
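
To illustrate the idea outside the server, here is a minimal standalone
sketch of the ownership guard. It is a model only, not the attached patch
and not the actual datachecksum_state.c code: the struct names, the
owns_launcher field, and the helper functions are all hypothetical, and the
state-revert path (SetDataChecksumsOff()) is only mentioned in a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared-memory state (illustrative only). */
typedef struct
{
    bool launcher_running;      /* shared flag: some launcher is active */
} SharedState;

/* Hypothetical per-process state of one launcher. */
typedef struct
{
    bool owns_launcher;         /* process-local: we set the shared flag */
} Launcher;

/* Returns true if this launcher took ownership, false if one is running. */
static bool
launcher_start(SharedState *shared, Launcher *self)
{
    self->owns_launcher = false;
    if (shared->launcher_running)
        return false;           /* "already running, exiting" */
    shared->launcher_running = true;
    self->owns_launcher = true;
    return true;
}

/* Models the reported behaviour: clears the shared flag unconditionally. */
static void
launcher_exit_unguarded(SharedState *shared, Launcher *self)
{
    (void) self;
    shared->launcher_running = false;
}

/* Models the proposed behaviour: only the owner touches shared state. */
static void
launcher_exit_guarded(SharedState *shared, Launcher *self)
{
    if (!self->owns_launcher)
        return;                 /* redundant launcher: nothing to undo */
    assert(shared->launcher_running);
    shared->launcher_running = false;
    /* any state revert would also sit behind this guard */
    self->owns_launcher = false;
}

/*
 * Replays the three-call scenario: first launcher starts, second is
 * redundant and exits, third tries to start while the first is still
 * active. Returns true if the third launcher was (wrongly) admitted.
 */
static bool
third_launcher_admitted(bool guarded)
{
    SharedState shared = {false};
    Launcher    first, second, third;

    launcher_start(&shared, &first);
    launcher_start(&shared, &second);
    if (guarded)
        launcher_exit_guarded(&shared, &second);
    else
        launcher_exit_unguarded(&shared, &second);
    return launcher_start(&shared, &third);
}
```

In this model, third_launcher_admitted(false) returns true (the redundant
launcher's exit cleared the flag it never owned), while
third_launcher_admitted(true) returns false.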

Regards,
Ayush

Attachment Content-Type Size
0001-Fix-race-in-online-checksums-launcher_exit.patch application/octet-stream 2.5 KB
