Re:BUG #18369: logical decoding core on AssertTXNLsnOrder()

From: ocean_li_996 <ocean_li_996(at)163(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder()
Date: 2024-02-28 07:57:37
Message-ID: 6d0e80d6.c1fc.18deeb8120a.Coremail.ocean_li_996@163.com
Lists: pgsql-bugs

At 2024-02-28 15:53:30, "PG Bug reporting form" <noreply(at)postgresql(dot)org> wrote:

>1) The WAL records from restart_lsn to the corresponding lsn when the issue
>occurred,
>2) personal analysis of the problem,
>3) the steps to reproduce the issue,
>4) personal proposed solution
>will be posted later under this thread.

1) The WAL records from restart_lsn to the lsn at which the issue occurred are provided in attachment 1.

2) As the records in 1) show, some invalidation messages are generated in top-level xact 19933. After decoding restarts, these invalidation messages cause xact 19933 and its subtransaction(s) to be marked as containing catalog changes while its commit record is processed (see SnapBuildXidSetCatalogChanges()). In this step, the subxacts that were never processed before are added to the ReorderBuffer with the same first_lsn as the top-level xact. The check in AssertTXNLsnOrder() then fails if there is more than one such subxact.
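
To make the ordering problem concrete, here is a minimal standalone sketch in C. It is not PostgreSQL code: ModelBuffer, ModelTxn, model_add_toplevel() and model_lsn_order_ok() are hypothetical names, the list is a plain array, and it only tracks the entries added at commit time. It assumes the check requires first_lsn to be strictly increasing along toplevel_by_lsn, so two entries added with the same first_lsn break it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real types are XLogRecPtr and ReorderBufferTXN. */
typedef uint64_t ModelLsn;

typedef struct ModelTxn
{
    uint32_t xid;
    ModelLsn first_lsn;
} ModelTxn;

#define MODEL_MAX_TXNS 16

/* Simplified model of the toplevel_by_lsn list as a fixed-size array. */
typedef struct ModelBuffer
{
    ModelTxn toplevel_by_lsn[MODEL_MAX_TXNS];
    size_t   ntxns;
} ModelBuffer;

/* Append an entry to the list; returns false if the model array is full. */
static bool
model_add_toplevel(ModelBuffer *rb, uint32_t xid, ModelLsn first_lsn)
{
    assert(rb != NULL);
    if (rb->ntxns >= MODEL_MAX_TXNS)
        return false;
    rb->toplevel_by_lsn[rb->ntxns].xid = xid;
    rb->toplevel_by_lsn[rb->ntxns].first_lsn = first_lsn;
    rb->ntxns++;
    return true;
}

/*
 * Assumed shape of the ordering check: every entry must start at a strictly
 * higher LSN than the entry before it. Returns false where the real code
 * would trip an assertion.
 */
static bool
model_lsn_order_ok(const ModelBuffer *rb)
{
    assert(rb != NULL);
    for (size_t i = 1; i < rb->ntxns; i++)
    {
        if (rb->toplevel_by_lsn[i - 1].first_lsn >= rb->toplevel_by_lsn[i].first_lsn)
            return false;
    }
    return true;
}
```

In this model, adding one never-seen subxact at some LSN keeps model_lsn_order_ok() true, while adding a second one at the same LSN makes it return false. The xids and LSN values used to exercise it are arbitrary.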

3) A patch that reproduces the issue is provided in attachment 2. DML on a temporary table consumes an xid but does not log any WAL record, unless it is the first subxact of the top-level xact (in which case an ASSIGNMENT record is logged). So we use DML on a temporary table to generate two "never processed before" subxacts in one top-level xact.

4) Since the xact is already known to be a subxact before it is added to the ReorderBuffer, I think an appropriate fix is to extend the ReorderBufferXidSetCatalogChanges function with an is_top parameter that indicates whether the xact is a top-level xact.
A subxact would then not be added to the toplevel_by_lsn list and would not go through the AssertTXNLsnOrder check. Of course, this makes it necessary to check whether a node is actually in the list before attempting to remove it from toplevel_by_lsn.
The specific fix patch is provided in attachment 3.
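
The idea in 4) can be sketched as follows. This is only an illustration under assumptions, not the attached patch: SketchBuffer, SketchTxn, the in_toplevel_list flag and all sketch_* functions are hypothetical, and the real ReorderBuffer uses a dlist rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_TXNS 16

/* Hypothetical stand-in for ReorderBufferTXN. */
typedef struct SketchTxn
{
    uint32_t xid;
    uint64_t first_lsn;
    bool     has_catalog_changes;
    bool     in_toplevel_list;   /* assumed flag: is this node linked into the list? */
} SketchTxn;

/* Hypothetical stand-in for the toplevel_by_lsn list. */
typedef struct SketchBuffer
{
    SketchTxn *toplevel_by_lsn[SKETCH_MAX_TXNS];
    size_t     ntxns;
} SketchBuffer;

/* Mark txn as containing catalog changes; only a top-level xact joins the list. */
static void
sketch_xid_set_catalog_changes(SketchBuffer *rb, SketchTxn *txn,
                               uint64_t lsn, bool is_top)
{
    assert(rb != NULL && txn != NULL);
    if (txn->first_lsn == 0)
        txn->first_lsn = lsn;
    txn->has_catalog_changes = true;

    if (is_top && !txn->in_toplevel_list)
    {
        assert(rb->ntxns < SKETCH_MAX_TXNS);
        rb->toplevel_by_lsn[rb->ntxns++] = txn;
        txn->in_toplevel_list = true;
    }
}

/* Remove txn from the list only if it is linked; returns whether it was removed. */
static bool
sketch_remove_from_toplevel(SketchBuffer *rb, SketchTxn *txn)
{
    assert(rb != NULL && txn != NULL);
    if (!txn->in_toplevel_list)
        return false;            /* subxact was never added: nothing to unlink */

    for (size_t i = 0; i < rb->ntxns; i++)
    {
        if (rb->toplevel_by_lsn[i] == txn)
        {
            for (size_t j = i + 1; j < rb->ntxns; j++)
                rb->toplevel_by_lsn[j - 1] = rb->toplevel_by_lsn[j];
            rb->ntxns--;
            txn->in_toplevel_list = false;
            return true;
        }
    }
    return false;
}

/* Same strict-ordering assumption as the assertion: first_lsn must increase. */
static bool
sketch_lsn_order_ok(const SketchBuffer *rb)
{
    assert(rb != NULL);
    for (size_t i = 1; i < rb->ntxns; i++)
    {
        if (rb->toplevel_by_lsn[i - 1]->first_lsn >= rb->toplevel_by_lsn[i]->first_lsn)
            return false;
    }
    return true;
}
```

With is_top = false for the subxacts, any number of them can share a first_lsn without affecting the ordering check, and removing a subxact that was never linked becomes a no-op instead of corrupting the list.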

Thanks
Haiyang Li

Attachment                                            Content-Type              Size
xid_19933_wal_record.txt                              text/plain                9.1 KB
v1-0001-Testcase-Coredump-On-AssertTXNLsnOrder.patch  application/octet-stream  2.5 KB
v1-0002-Fix-Coredump-On-AssertTXNLsnOrder.patch       application/octet-stream  4.3 KB
