Re: Fix logical decoding not track transaction during SNAPBUILD_BUILDING_SNAPSHOT

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: ocean_li_996 <ocean_li_996(at)163(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, cca5507(at)qq(dot)com
Subject: Re: Fix logical decoding not track transaction during SNAPBUILD_BUILDING_SNAPSHOT
Date: 2026-01-28 03:32:41
Message-ID: CAFPTHDagn1PProB2RGM-0tOt2D4BYjpxsoRVO0sn-bLAvXg+mQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 22, 2025 at 8:28 PM ocean_li_996 <ocean_li_996(at)163(dot)com> wrote:
>
> Hi all,
>
> I would like to share a logical replication bug and some possible fixes. It seems that this bug has existed since
> logical replication was first introduced, so it has been around for quite some time. In fact, the previously
> reported issues [1], [2], [3] were all caused by this bug.
>
> # Problem description
>
> When in the BUILDING_SNAPSHOT state, the snapshot builder does not track the status of any
> transaction. It can lead to missing transaction states when:
> -- The transaction commits before the builder reaches FULL_SNAPSHOT state, and
> -- The transaction's xid is greater than or equal to builder->xmin when the builder reaches
> FULL_SNAPSHOT state.
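
For readers following along, the two conditions quoted above can be written
as a small predicate. This is only a toy sketch and not code from the patch
set: the type Xid, the helper xid_follows_or_equals(), and the function
txn_state_may_be_missed() are hypothetical names made up for illustration,
and the comparison simply assumes the usual modulo-2^32 ordering for normal
xids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction id. */
typedef uint32_t Xid;

/* Wraparound-aware "a >= b" for normal xids (modulo-2^32 ordering). */
static bool
xid_follows_or_equals(Xid a, Xid b)
{
    return (int32_t) (a - b) >= 0;
}

/*
 * Returns true when both quoted conditions hold:
 *   1. the transaction committed before the builder reached FULL_SNAPSHOT;
 *   2. its xid is >= the builder's xmin at the moment FULL_SNAPSHOT is reached.
 * Illustrative only; this is not how snapbuild.c is structured.
 */
static bool
txn_state_may_be_missed(bool committed_before_full_snapshot,
                        Xid xid, Xid xmin_at_full_snapshot)
{
    return committed_before_full_snapshot &&
        xid_follows_or_equals(xid, xmin_at_full_snapshot);
}

/* Small self-check covering both conditions and the boundary case. */
static void
check_examples(void)
{
    assert(txn_state_may_be_missed(true, 105, 100));   /* both conditions hold */
    assert(txn_state_may_be_missed(true, 100, 100));   /* xid equal to xmin */
    assert(!txn_state_may_be_missed(true, 99, 100));   /* xid below xmin */
    assert(!txn_state_may_be_missed(false, 105, 100)); /* committed later */
}
```

A transaction for which this returns true committed while nothing was being
tracked, yet its xid is still in the range the builder has to account for
once it reaches FULL_SNAPSHOT.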

> 2) Based on v6-0001, I have provided a minimal fix in v6-0003 (not yet reviewed). AFAICS, it resolves
> the problem, though it records additional useless information in the reorder buffer during BUILDING_SNAPSHOT
> state (which is discarded later). This increases memory usage and slightly impacts performance. But since
> snapshot building is infrequent, I consider this acceptable.
>
> 3) I have also prepared a cleaner and more efficient fix in v6-0004 than v6-0003, albeit more complex
> (similar to v6-0001). Provided as an alternative reference.

Hello Haiyang,

I agree with your analysis and approach, but when I tried out the
patches (applying patch 0002 for the tests and patch 0004), I saw the
tests in contrib/test_decoding fail.
Similarly, applying patches 0002 and 0003 also results in the tests
failing, so I am not sure how your minimal fix resolves the problem. Am
I doing something wrong?
Do patches 0003 and 0004 have to be applied on top of 0001? That
doesn't seem to be the case, as both make the same code change as 0001
and don't apply cleanly on top of it.

regards,
Ajin Cherian
Fujitsu Australia
