RE: Fix logical decoding not track transaction during SNAPBUILD_BUILDING_SNAPSHOT

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: ocean_li_996 <ocean_li_996(at)163(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: "cca5507(at)qq(dot)com" <cca5507(at)qq(dot)com>
Subject: RE: Fix logical decoding not track transaction during SNAPBUILD_BUILDING_SNAPSHOT
Date: 2026-01-27 02:39:11
Message-ID: TY4PR01MB16907DEEC21EA1D8CE5C280809490A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Saturday, November 22, 2025 5:28 PM ocean_li_996 <ocean_li_996(at)163(dot)com> wrote:
> # Problem description
>
> When in the BUILDING_SNAPSHOT state, the snapshot builder does not track the
> status of any transaction. It can lead to missing transaction states when: --
> The transaction commits before the builder reaches FULL_SNAPSHOT state, and --
> The transaction's xid is greater than or equal to builder->xmin when the builder
> reaches FULL_SNAPSHOT state.
>
> Once in FULL_SNAPSHOT state, the builder constructs a base snapshot using
> incomplete transaction state information. This results in an incorrect base
> snapshot, which can cause unpredictable behavior during subsequent decoding. The
> case provided in v6-0002 attachment reproduces the issue (provided by ChangAo
> Chen).

I agree that this is clearly a bug and should be fixed.

>
> # Code-level analysis
>
> SnapBuildCommitTxn does consider transaction processing during the
> BUILDING_SNAPSHOT state. However, it is only called from xact_decode ->
> DecodeCommit. xact_decode does not process any xact record when snapshot builder
> have not yet reached the FULL_SNAPSHOT state, meaning those commits are ignored.
> Similarly, other functions marking transaction having catalog changes (e.g.,
> heap2_decode) also do not handle records before reaching the FULL_SNAPSHOT
> state.

The analysis looks correct to me.

> # Possible fixes
>
> 1. Replace snapshot at the time we reach CONSISTENT state.
>
> Ajin Cherian in [4] and my initial thought was that although the snapshot at
> FULL_SNAPSHOT state might be wrong, the snapshot at CONSISTENT state is
> guaranteed to be correct. Since decoding always starts after reaching
> CONSISTENT state, we could update both the reorder buffer and the builder
> snapshot with the one captured at CONSISTENT state. However, IMUC, this would
> still cause changes generated before CONSISTENT to carry a wrong snapshot (see
> SnapBuildDistributeSnapshotAndInval).
>
> 2. Track transactions during BUILDING_SNAPSHOT state for snapshot builder If the
> builder does not track transactions in BUILDING_SNAPSHOT state, then we make it
> track them.
>
> 1) ChangAo Chen in v6-0001 attachment provided a fix, already reviewed by
> several people (including me). Bertrand Drouvot in [5] considered the logic a
> bit messy. And I prefer we should make the behavior of snapshot building similar
> in both BUILDING_SNAPSHOT and FULL_SNAPSHOT states, except in cases where a base
> snapshot is needed.
>
> 2) Based on v6-0001, I have provided a minimal fix in v6-0003 (not yet
> reviewed). AFAICS, it resolves the problem, though it records additional useless
> information in the reorder buffer during BUILDING_SNAPSHOT state (which is
> discarded later). This increases memory usage and slightly impacts performance.
> But since snapshot building is infrequent, I consider this acceptable.
>
> 3) I have also prepared a cleaner and more efficient fix in v6-0004 than
> v6-0003, albeit more complex (similar to v6-0001). Provided as an alternative
> reference.

I also thought about how to fix this issue, and my draft idea is similar to what
your v6-0004 implements, e.g., track transactions during BUILDING state but also
avoid unnecessary cost during decoding. So, I think it's on the right track.

Best Regard,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2026-01-27 02:58:01 Re: Is abort() still needed in WalSndShutdown()?
Previous Message Amit Langote 2026-01-27 02:39:00 Re: unnecessary executor overheads around seqscans