race condition in pg_class

From: Smolkin Grigory <smallkeen(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: race condition in pg_class
Date: 2023-10-25 10:39:41
Message-ID: CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, hackers!
We are running PG13.10 and recently we have encountered what appears to be
a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and
some other catalog-writer, possibly ANALYZE.
The problem is that after successfully creating index on relation (which
previosly didnt have any indexes), its pg_class.relhasindex remains set to
"false", which is illegal, I think.
Index was built using the following statement:
ALTER TABLE "example" ADD constraint "example_pkey" PRIMARY KEY (id);

PG_CLASS:
# select ctid,oid,xmin,xmax, * from pg_class where oid = 3566558198;
-[ RECORD 1
]-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ctid | (12,49)
oid | 3566558198
xmin | 1202298791
xmax | 0
relname | example
relnamespace | 16479
reltype | 3566558200
reloftype | 0
relowner | 16386
relam | 2
relfilenode | 3566558198
reltablespace | 0
relpages | 152251
reltuples | 1.1565544e+07
relallvisible | 127837
reltoastrelid | 3566558203
relhasindex | f
relisshared | f
relpersistence | p
relkind | r
relnatts | 12
relchecks | 0
relhasrules | f
relhastriggers | f
relhassubclass | f
relrowsecurity | f
relforcerowsecurity | f
relispopulated | t
relreplident | d
relispartition | f
relrewrite | 0
relfrozenxid | 1201647807
relminmxid | 1
relacl |
reloptions |
relpartbound |

PG_INDEX:
# select ctid,xmin,xmax,indexrelid::regclass,indrelid::regclass, * from
pg_index where indexrelid = 3569625749;
-[ RECORD 1 ]--+---------------------------------------------
ctid | (3,30)
xmin | 1202295045
xmax | 0
indexrelid | "example_pkey"
indrelid | "example"
indexrelid | 3569625749
indrelid | 3566558198
indnatts | 1
indnkeyatts | 1
indisunique | t
indisprimary | t
indisexclusion | f
indimmediate | t
indisclustered | f
indisvalid | t
indcheckxmin | f
indisready | t
indislive | t
indisreplident | f
indkey | 1
indcollation | 0
indclass | 3124
indoption | 0
indexprs |
indpred |

Looking into the WAL via waldump given us the following picture (full
waldump output is attached):

tx: 1202295045, lsn: AAB1/D38378D0, prev AAB1/D3837208, desc: FPI , blkref
#0: rel 1663/16387/3569625749 blk 0 FPW
tx: 1202298790, lsn: AAB1/D3912EC0, prev AAB1/D3912E80, desc: NEW_CID rel
1663/16387/1259; tid 6/24; cmin: 0, cmax: 4294967295, combo: 4294967295
tx: 1202298790, lsn: AAB1/D3927580, prev AAB1/D3926988, desc: COMMIT
2023-10-04 22:41:23.863979 UTC
tx: 1202298791, lsn: AAB1/D393C230, prev AAB1/D393C1F0, desc: HOT_UPDATE
off 24 xmax 1202298791 flags 0x20 ; new off 45 xmax 0, blkref #0: rel
1663/16387/1259 blk 6
tx: 1202298791, lsn: AAB1/D394ADA0, prev AAB1/D394AD60, desc: UPDATE off 45
xmax 1202298791 flags 0x00 ; new off 28 xmax 0, blkref #0: rel
1663/16387/1259 blk 5, blkref #1: rel 1663/16387/1259 blk 6
tx: 1202298791, lsn: AAB1/D3961088, prev AAB1/D3961048, desc: NEW_CID rel
1663/16387/1259; tid 12/49; cmin: 50, cmax: 4294967295, combo: 4294967295
tx: 1202295045, lsn: AAB1/D3962E78, prev AAB1/D3962E28, desc: INPLACE off
24, blkref #0: rel 1663/16387/1259 blk 6
tx: 1202295045, lsn: AAB1/D39632A0, prev AAB1/D3963250, desc: COMMIT
2023-10-04 22:41:23.878565 UTC
tx: 1202298791, lsn: AAB1/D3973420, prev AAB1/D39733D0, desc: COMMIT
2023-10-04 22:41:23.884951 UTC

1202295045 - create index statement
1202298790 and 1202298791 are some other concurrent operations,
unfortunately I wasnt able to determine what are they

So looks like 1202295045, updated tuple (6,24) in pg_class INPLACE, in
which at this point xmax was already set by 1202298791 and new tuple in
(12,49) was in created.
So after 1202298791 was commited, that inplace update was effectively lost.
If we do an inclusive PITR with (recovery_target_xid = 1202295045), we can
see the following picture (notice relhasindex and xmax):

# select ctid,oid, xmin,xmax,relhasindex,cmin,cmax from pg_class where oid
= 3566558198;
-[ RECORD 1 ]-----------
ctid | (6,24)
oid | 3566558198
xmin | 1202298790
xmax | 1202298791
relhasindex | t
cmin | 0
cmax | 0

I've tried to reproduce this scenario with CREATE INDEX and various
concurrent statements, but no luck.
Attached full waldump output for the relevant WAL segment.

Attachment Content-Type Size
waldump.txt text/plain 246.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2023-10-25 10:42:25 Re: CDC/ETL system on top of logical replication with pgoutput, custom client
Previous Message Jelte Fennema 2023-10-25 10:39:01 Re: Add new for_each macros for iterating over a List that do not require ListCell pointer