| From: | ocean_li_996 <ocean_li_996(at)163(dot)com> |
|---|---|
| To: | "Haiyang Li" <mohen(dot)lhy(at)alibaba-inc(dot)com> |
| Cc: | "Michael Paquier" <michael(at)paquier(dot)xyz>, "Xuneng Zhou" <xunengzhou(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: BUG #19109: catalog lookup with the wrong snapshot during logical decoding causes coredump |
| Date: | 2025-11-17 01:34:03 |
| Message-ID: | 2b9e5ac8.136f.19a8f7297ee.Coremail.ocean_li_996@163.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
Hi all,
After further analysis, I think that if the slot's confirm_lsn is less than the LSN where
the consistent snapshot is built, the result should only be potential data loss, and should
not cause unexpected core dumps or errors.
From my investigation, the root cause seems to be that transactions committed after we reach
the consistent snapshot are using a wrong catalog snapshot.
Although I reproduced the issue manually by deleting the .snap file, I believe the same problem
could occur in a failover slot scenario, where a hot standby might encounter a similar situation.
Modified reproduction steps:
```
-- Please extend the checkpoint and background writer intervals to avoid additional RUNNING_XACT WAL
-- being generated during testing. I have done this by modifying the LOG_SNAPSHOT_INTERVAL_MS
-- and CheckPointTimeout GUC settings.
-- s1(session 1)
CREATE TABLE t1 (
id integer NOT NULL,
unique1 integer,
hundred numeric,
tenthous numeric)
WITH (toast_tuple_target='128');
ALTER TABLE ONLY t1 ALTER COLUMN hundred SET STORAGE EXTENDED;
ALTER TABLE ONLY t1 REPLICA IDENTITY FULL;
CREATE TABLE t2 (
id integer NOT NULL,
unique1 integer,
hundred numeric,
tenthous numeric)
WITH (toast_tuple_target='128');
ALTER TABLE ONLY t2 ALTER COLUMN hundred SET STORAGE EXTENDED;
ALTER TABLE ONLY t2 REPLICA IDENTITY FULL;
select pg_create_logical_replication_slot('test', 'test_decoding');
begin;
insert into t1 values (1,1,1,1);
select pg_log_standby_snapshot();
commit;
begin;
insert into t2 values (1,1,1,1);
-- s2
ALTER TABLE IF EXISTS t1 ALTER COLUMN unique1 SET DATA TYPE NUMERIC;
-- s1
select pg_log_standby_snapshot();
commit;
select pg_logical_slot_get_changes('test', NULL, NULL);
begin;
insert into t2 values (1,1,1,1);
select pg_log_standby_snapshot();
commit;
--s2
insert into t1 values (1,1,1,1);
\q
-- remove saved snapshot
rm -rf {data_directory}/pg_logical/snapshots/*.snap
-- s1
-- error
select pg_logical_slot_peek_changes('test', NULL, NULL);
```
Analysis:
```
WAL logs obtained:
-- RUNNING_XACTS
rmgr: Standby len (rec/tot): 54/ 54, tx: 827, lsn: 0/40C9A418, prev 0/40C9A3C8, desc: RUNNING_XACTS nextXid 828 latestCompletedXid 826 oldestRunningXid 827; 1 xacts: 827
rmgr: Standby len (rec/tot): 54/ 54, tx: 828, lsn: 0/40CA0910, prev 0/40C9FEF0, desc: RUNNING_XACTS nextXid 830 latestCompletedXid 829 oldestRunningXid 828; 1 xacts: 828
rmgr: Standby len (rec/tot): 54/ 54, tx: 830, lsn: 0/40CA09C8, prev 0/40CA0978, desc: RUNNING_XACTS nextXid 831 latestCompletedXid 829 oldestRunningXid 830; 1 xacts: 830
-- error xid WAL
rmgr: Heap len (rec/tot): 74/ 74, tx: 831, lsn: 0/40CA0A30, prev 0/40CA0A00, desc: INSERT off: 2, flags: 0x08, blkref #0: rel 1663/5/16484 blk 0
rmgr: Transaction len (rec/tot): 46/ 46, tx: 831, lsn: 0/40CA0A80, prev 0/40CA0A30, desc: COMMIT 2025-11-16 17:15:31.778871 UTC
When parsing the xid 831 commit record, changes are replayed. At this point, the transaction's snapshot_now is:
$1 = {snapshot_type = SNAPSHOT_HISTORIC_MVCC, xmin = 828, xmax = 828, xip = 0x29631e8, xcnt = 0, subxip = 0x0, subxcnt = 0, suboverflowed = false,
takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0, vistest = 0x0, active_count = 2, regd_count = 0, ph_node = {first_child = 0x0,
next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0, snapXactCompletionCount = 0, polar_snapshot_csn = 0, polar_csn_xid_snapshot = false}
```
We can see that snapshot_now has xmin = 828 and contains no committed transactions,
so xid 829 is considered invisible, which triggers an error. However, after the snapshot builder
reaches the consistent state, builder->xmin = 830, and xid 829 is already known to be committed.
The root cause is that after the snapshot builder reaches the consistent state, it does not refresh
builder->snapshot, which may leads to subsequent transactions obtaining an incorrect catalog snapshot.
Possible fix suggestion:
When the snapshot builder reaches the consistent state, rebuild builder->snapshot and propagate
the updated snapshot to the transactions in the reorder buffer.
Looking forward to your review and feedback.
--
regards,
Haiyang Li
| From | Date | Subject | |
|---|---|---|---|
| Next Message | PG Bug reporting form | 2025-11-17 06:08:30 | BUG #19114: ORDER BY ASC is tampering result when calculating distance btw vectors |
| Previous Message | Dean Rasheed | 2025-11-16 22:41:52 | Re: BUG #19111: Using EXPLAIN ANALYZE with MERGE causes failed assert |