Re: Re: Fix logical decoding not track transaction during SNAPBUILD_BUILDING_SNAPSHOT

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: ocean_li_996 <ocean_li_996(at)163(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Re: Fix logical decoding not track transaction during SNAPBUILD_BUILDING_SNAPSHOT
Date: 2026-01-29 03:33:59
Message-ID: CAFPTHDZeqasZ7_LV539=4P3ap2eR8S1DONS3p7wZ6jtgVqja5Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 29, 2026 at 6:07 AM ocean_li_996 <ocean_li_996(at)163(dot)com> wrote:
>
> Hi Ajin,
>
> At 2026-01-28 11:32:41, "Ajin Cherian" <itsajin(at)gmail(dot)com> wrote:
> >I agree with your analysis and approach, but when I tried out the
> >patch (applying patch 0002 for the tests and patch 0004), I see the
> >tests in contrib/test_decoding failing.
> >Similarly, applying patches 0002 and 0003 also results in the tests
> >failing, so I am not sure how your minimal fix solves the problem. Am
> >I doing something wrong?
> >Do patches 0003 and 0004 have to be applied on top of 0001? That
> >doesn't seem to be the case, as both make the same code change and
> >don't apply cleanly.
>
> Patch 0002 is only a test case, and 0001, 0003 and 0004 are independent fix patches.
>
> I applied 0002 + 0003 and 0002 + 0004 separately on master, and in both cases the tests in
> contrib/test_decoding passed. Can you provide more details about the failing
> tests (such as which tests failed and the diff between the expected and actual results)?

Hi Haiyang,

I tested patches 0002 + 0004 on HEAD, and the test added by patch
0002 fails as follows:

not ok 15 - snapshot_build 289 ms
# (test process exited with exit code 1)

The postgres server crashed, and the core file shows the following
stack trace:
Core was generated by `/home/ajin/postgresql/postgres3/postgres/tmp_install/home/ajin/install-oss/bin/postgres '' '' '' '' '' '' '''.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007436092c7e9c in __pthread_kill_implementation () from /lib64/libc.so.6
Missing rpms, try: dnf --enablerepo='*debug*' install zlib-ng-compat-debuginfo-2.2.3-2.el10.x86_64 glibc-debuginfo-2.39-38.el10.x86_64 libicu-debuginfo-74.2-5.el10.x86_64 libstdc++-debuginfo-14.3.1-2.1.el10.x86_64 libgcc-debuginfo-14.3.1-2.1.el10.x86_64
(gdb) bt
#0 0x00007436092c7e9c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x0000743609271a96 in raise () from /lib64/libc.so.6
#2 0x00007436092598fa in abort () from /lib64/libc.so.6
#3 0x0000000000a00a93 in ExceptionalCondition (conditionName=conditionName@entry=0xb1e039 "txn->ninvalidations == 0", fileName=fileName@entry=0xb1dcd8 "reorderbuffer.c", lineNumber=lineNumber@entry=3207) at assert.c:65
#4 0x0000000000836c94 in ReorderBufferForget (rb=0x3cf96a40, xid=xid@entry=1136, lsn=34819568) at reorderbuffer.c:3207
#5 0x0000000000823aea in DecodeCommit (ctx=ctx@entry=0x3cf869d0, buf=buf@entry=0x7ffd069389a0, parsed=parsed@entry=0x7ffd069387d0, xid=xid@entry=1136, two_phase=false) at decode.c:707
#6 0x000000000082497c in xact_decode (ctx=ctx@entry=0x3cf869d0, buf=buf@entry=0x7ffd069389a0) at decode.c:237
#7 0x00000000008246eb in LogicalDecodingProcessRecord (ctx=ctx@entry=0x3cf869d0, record=0x3cf86da8) at decode.c:116
#8 0x0000000000829602 in DecodingContextFindStartpoint (ctx=ctx@entry=0x3cf869d0) at logical.c:647
#9 0x000000000084e8ea in create_logical_replication_slot (name=name@entry=0x3ce8fb70 "isolation_slot", plugin=plugin@entry=0x3ce8fc10 "test_decoding", temporary=temporary@entry=false, two_phase=two_phase@entry=false, failover=failover@entry=false, restart_lsn=restart_lsn@entry=0, find_startpoint=true) at slotfuncs.c:177
#10 0x000000000084f300 in pg_create_logical_replication_slot (fcinfo=<optimized out>) at slotfuncs.c:207
#11 0x00000000006c77c4 in ExecMakeTableFunctionResult (setexpr=0x3cf61b18, econtext=0x3cf61968, argContext=<optimized out>, expectedDesc=0x3cf79f48, randomAccess=false) at execSRF.c:234
#12 0x00000000006da839 in FunctionNext (node=node@entry=0x3cf61758) at nodeFunctionscan.c:94

(gdb) f 4
#4 0x0000000000836c94 in ReorderBufferForget (rb=0x3cf96a40, xid=xid@entry=1136, lsn=34819568) at reorderbuffer.c:3207
3207 Assert(txn->ninvalidations == 0);
(gdb) p *txn
$1 = {txn_flags = 1, xid = 1136, toplevel_xid = 0, gid = 0x0,
  first_lsn = 34814328, final_lsn = 34819568, end_lsn = 0, toptxn = 0x0,
  restart_decoding_lsn = 0, origin_id = 0, origin_lsn = 0,
  {commit_time = 0, prepare_time = 0, abort_time = 0},
  base_snapshot = 0x0, base_snapshot_lsn = 0,
  base_snapshot_node = {prev = 0x0, next = 0x0},
  snapshot_now = 0x0, command_id = 4294967295,
  nentries = 14, nentries_mem = 14,
  changes = {head = {prev = 0x3cfb5510, next = 0x3cfb4ae8}},
  tuplecids = {head = {prev = 0x3cfb5440, next = 0x3cfb4a80}},
  ntuplecids = 13, tuplecid_hash = 0x0, toast_hash = 0x0,
  subtxns = {head = {prev = 0x3cfb2c68, next = 0x3cfb2c68}}, nsubtxns = 0,
  ninvalidations = 22, invalidations = 0x3cf96c50,
  ninvalidations_distributed = 0, invalidations_distributed = 0x0,
  node = {prev = 0x3cfb2b30, next = 0x3cf96a48},
  catchange_node = {prev = 0x3cf96a68, next = 0x3cf96a68},
  txn_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0},
  size = 1472, total_size = 1472, output_plugin_private = 0x0}

It looks like the assertion in ReorderBufferForget() failed because
txn->ninvalidations is 22 rather than 0: the transaction being
forgotten still has invalidation messages attached.

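Your tests probably passed because Assert() is compiled out in builds
without assertions. You need to build with asserts enabled
(--enable-cassert) to see this failure.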
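For reference, a minimal sketch of an assertion-enabled build and test run. It assumes an autoconf-based build from the top of the PostgreSQL source tree; a meson build would pass -Dcassert=true to meson setup instead. --enable-debug is optional but gives usable gdb backtraces like the one above.

```shell
# Configure with assertions (USE_ASSERT_CHECKING) and debug symbols.
./configure --enable-cassert --enable-debug
make -j"$(nproc)"

# Run the contrib/test_decoding regression and isolation tests against a
# temporary install (snapshot_build is one of its isolation specs).
make -C contrib/test_decoding check

# On a running server, confirm that assertions are compiled in;
# debug_assertions reports "on" for an --enable-cassert build.
psql -c 'SHOW debug_assertions;'
```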

regards,
Ajin Cherian
Fujitsu Australia
