Race conditions in logical decoding

From: Antonin Houska <ah(at)cybertec(dot)at>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Race conditions in logical decoding
Date: 2026-01-19 16:29:25
Message-ID: 85833.1768840165@localhost
Lists: pgsql-hackers

A stress test [1] for the REPACK patch [2] revealed data
corruption. Eventually I found out that the problem is in PostgreSQL core. In
particular, it can happen that a COMMIT record is decoded and, before the
commit has been recorded in CLOG, a snapshot that takes the commit into
account is created and even used. Visibility checks then give wrong results
until the CLOG gets updated.

In logical replication, the consequences are not only wrong data on the
subscriber, but also a corrupted table on the publisher, caused by incorrectly
set commit hint bits.
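
The interleaving described above can be sketched with a small toy model. This
is plain Python, not PostgreSQL code: the names Tuple, Snapshot, is_visible and
run_race, as well as the xid value 100, are hypothetical, and the visibility
logic is only a loose simplification of how MVCC checks cache transaction
status in hint bits. It assumes that a transaction which is neither "in
progress" according to the snapshot nor "committed" according to the CLOG is
treated as aborted.

```python
from dataclasses import dataclass, field
from enum import Enum


class XidStatus(Enum):
    IN_PROGRESS = "in progress"
    COMMITTED = "committed"


class Hint(Enum):
    NONE = "none"
    XMIN_COMMITTED = "xmin committed"
    XMIN_INVALID = "xmin invalid"


@dataclass
class Tuple:
    xmin: int                  # xid of the inserting transaction
    hint: Hint = Hint.NONE     # cached result of an earlier CLOG lookup


@dataclass
class Snapshot:
    # xids whose effects this snapshot must not see
    in_progress: set = field(default_factory=set)


def is_visible(tup, snap, clog):
    """Toy visibility check (assumption: hints, then snapshot, then CLOG)."""
    if tup.hint is Hint.XMIN_INVALID:
        return False
    if tup.xmin in snap.in_progress:
        return False
    if tup.hint is Hint.XMIN_COMMITTED:
        return True
    if clog.get(tup.xmin) is XidStatus.COMMITTED:
        tup.hint = Hint.XMIN_COMMITTED
        return True
    # Not running per the snapshot, not committed per the CLOG:
    # conclude "aborted" and cache that conclusion on the tuple.
    tup.hint = Hint.XMIN_INVALID
    return False


def run_race():
    xid = 100
    clog = {xid: XidStatus.IN_PROGRESS}
    tup = Tuple(xmin=xid)

    # 1. The backend has written its COMMIT record, but not yet updated the CLOG.
    wal = [("COMMIT", xid)]

    # 2. The decoder reads the record and builds a snapshot that
    #    already considers the transaction finished.
    snap = Snapshot(in_progress={xid})
    for kind, committed_xid in wal:
        if kind == "COMMIT":
            snap.in_progress.discard(committed_xid)

    # 3. The snapshot is used before the CLOG update happens.
    seen_during_race = is_visible(tup, snap, clog)

    # 4. The backend finally records the commit in the CLOG.
    clog[xid] = XidStatus.COMMITTED
    seen_after_clog_update = is_visible(tup, snap, clog)

    return seen_during_race, seen_after_clog_update, tup.hint
```

In this model, run_race() returns (False, False, Hint.XMIN_INVALID): the tuple
is wrongly invisible during the race, and because the wrong conclusion was
cached on the tuple, it stays invisible even after the CLOG has been updated.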

Attached is a spec file that demonstrates the issue. I did not add it to
the Makefile because I don't expect the current version to be merged (see the
commit message for details).

I'm not sure yet how to fix the problem. I tried to call XactLockTableWait()
from SnapBuildAddCommittedTxn() (as is done in SnapBuildWaitSnapshot()),
but that caused at least one regression test (subscription/t/010_truncate.pl)
to get stuck, probably due to a deadlock. I can spend more time on it, but
maybe someone can come up with a good idea sooner than me.

[1] https://www.postgresql.org/message-id/CADzfLwU78as45To9a%3D-Qkr5jEg3tMxc5rUtdKy2MTv4r_SDGng%40mail.gmail.com
[2] https://commitfest.postgresql.org/patch/5117/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment: 0001-Demonstrate-possible-race-conditions-in-logical-decoding.patch (text/x-diff, 11.6 KB)
