logical decoding build wrong snapshot with subtransactions

From: feichanghong <feichanghong(at)qq(dot)com>
To: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: logical decoding build wrong snapshot with subtransactions
Date: 2024-01-19 07:35:24
Message-ID: tencent_0D9510DFCB32603A653F7C4D68389A113E09@qq.com
Lists: pgsql-hackers

This issue was reported on the <pgsql-bugs> list at the link below, but
received almost no response:
https://www.postgresql.org/message-id/18280-4c8060178cb41750%40postgresql.org
Hoping for some feedback from kernel hackers, thanks!

Hi, hackers,
I've encountered a problem with logical decoding history snapshots. The
specific error message is: "ERROR: could not map filenode "base/5/16390" to
relation OID".

If a subtransaction that modified the catalog ends before the
restart_lsn of the logical replication slot, and the commit WAL record of
its top-level transaction is after the restart_lsn, the WAL records of the
subtransaction are not decoded during logical decoding. As a result, the
subtransaction is never marked as having modified the catalog, and it is
absent from the snapshot's committed list.

The issue seems to be caused by SnapBuildXidSetCatalogChanges
(introduced in 272248a), which skips marking the subtransactions when the
top-level transaction is already marked as containing catalog changes.
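
The skip described above can be illustrated with a toy model. This is only a sketch and not PostgreSQL code: the function names, the set-based bookkeeping, and the xids 100 and 101 are hypothetical stand-ins that mimic nothing more than the early return and the resulting committed list.

```python
# Toy model only: these names are hypothetical and are not PostgreSQL APIs.

def mark_at_commit(catalog_xids, xid, subxacts):
    """Mimic the check described above: skip if the top-level xid is marked."""
    if xid in catalog_xids:
        return  # early return: the subtransactions are never looked at
    catalog_xids.add(xid)
    catalog_xids.update(subxacts)


def committed_list(catalog_xids, xid, subxacts):
    """Only xids marked as catalog-changing enter the toy committed list."""
    return sorted(x for x in [xid, *subxacts] if x in catalog_xids)


TOP, SUB = 100, 101  # made-up xids for the top-level and sub transaction
catalog_xids = set()

# SUB's catalog change lies before restart_lsn, so it is never decoded and
# SUB is never marked. The top-level catalog change after restart_lsn is
# decoded and marks TOP.
catalog_xids.add(TOP)

# The commit record is decoded: the early return leaves SUB unmarked.
mark_at_commit(catalog_xids, TOP, [SUB])
committed = committed_list(catalog_xids, TOP, [SUB])
print(committed)  # prints [100]; SUB (101) is missing
```

In this model the subtransaction's xid never reaches the committed list, which corresponds to the catalog rows it created being invisible to the historic snapshot.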

The following steps can reproduce the problem (I increased the value of
LOG_SNAPSHOT_INTERVAL_MS to avoid the impact of bgwriter writing
XLOG_RUNNING_XACTS WAL records):
session 1:
```
CREATE TABLE tbl1 (val1 integer, val2 integer);
CREATE TABLE tbl1_part (val1 integer) PARTITION BY RANGE (val1);

SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');

BEGIN;
SAVEPOINT sp1;
CREATE TABLE tbl1_part_p1 PARTITION OF tbl1_part FOR VALUES FROM (0) TO (10);
RELEASE SAVEPOINT sp1;
```

session 2:
```
CHECKPOINT;
```

session 1:
```
CREATE TABLE tbl1_part_p2 PARTITION OF tbl1_part FOR VALUES FROM (10) TO (20);
COMMIT;
BEGIN;
TRUNCATE tbl1;
```

session 2:
```
CHECKPOINT;
SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
INSERT INTO tbl1_part VALUES (1);
SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
```

To fix this issue, it is sufficient to remove the condition check for
ReorderBufferXidHasCatalogChanges in SnapBuildXidSetCatalogChanges.

This fix may add subtransactions that didn't change the catalog to the
committed list, which seems like a false positive. However, this is
acceptable since we only use the snapshot built during decoding to read
system catalogs, as stated in 272248a's commit message.
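
The effect of dropping the check, including the false positive, can be sketched with a toy model. This is an illustration under stated assumptions and not the attached patch: the function name, the set of marked xids, and the xids 100 to 102 are hypothetical.

```python
# Toy model only: these names are hypothetical and are not PostgreSQL APIs.

def mark_at_commit_fixed(catalog_xids, xid, subxacts):
    """Toy version of the proposed behavior: no early return when the
    top-level xid is already marked, so every subtransaction is marked."""
    catalog_xids.add(xid)
    catalog_xids.update(subxacts)


# Made-up xids: SUB_DDL changed the catalog before restart_lsn,
# SUB_PLAIN did not change the catalog at all.
TOP, SUB_DDL, SUB_PLAIN = 100, 101, 102

# TOP was already marked by a catalog change decoded after restart_lsn.
catalog_xids = {TOP}

mark_at_commit_fixed(catalog_xids, TOP, [SUB_DDL, SUB_PLAIN])
committed = sorted(catalog_xids)
print(committed)  # prints [100, 101, 102]
```

Here 101 is now included as needed, while 102 is the false positive: it is listed even though it made no catalog change.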

I have verified that the attached patch resolves the issue described
above, and I have added some test cases.

I am eager to hear your suggestions on this!

Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.

Attachment: fix_wrong_snapshot_for_logical_decoding.patch (application/octet-stream, 5.6 KB)
