clog segment truncation

From: Smolkin Grigory <smallkeen(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: clog segment truncation
Date: 2025-11-10 12:29:32
Message-ID: CAMp+ueZkG+HTWd0tajG4nDU_Vk51+wkngtuq7XjOjEBEW7t8Gw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello, hackers!

I`m currently investigating the case of one of our PostgreSQL instance
having started issuing error messages about missing clog segment:
ERROR: could not access status of transaction 1550558894
DETAIL: Could not open file "pg_xact/05C6": No such file or directory.

After examining the WAL records just prior the first sighing of these
errors, it became apparent, that clog segment was truncated by autovacuum
rmgr: CLOG len (rec/tot): 38/ 38, tx: 0, lsn:
81A/87C899E0, prev 81A/87C895A0, desc: TRUNCATE page 48264; oldestXact
1581542039

After looking closely at the affected heap pages (there are scores of them)
with tuples, created by 1550558894 transaction, and comparing them with
pg_visibility state and pg_class.relfrozenxig, it became apparent, that
there is may be something awry with either setting all_frozen bit in _vm,
or oldestXact calculation by vacuum.

According to pg_visibility, all_frozen bit is set:
wms-svc-inspecting=# select * from pg_visibility('affected_relation',
5456224)
wms-svc-inspecting-# ;
all_visible | all_frozen | pd_all_visible
-------------+------------+----------------
t | t | f
(1 row)

So all tuples on this page should be marked with HEAP_XMIN_FROZEN, but that
is not the case

SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid,
t_infomask2, t_infomask, t_hoff, t_bits,
heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM
heap_page_items(get_raw_page('affected_relation', 5456224)) ;

-[ RECORD 1
]-------------+-----------------------------------------------------------------------------
lp | 1
lp_off | 7976
lp_flags | 1
lp_len | 216
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,1)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111011100000
heap_tuple_infomask_flags |
("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 2
]-------------+-----------------------------------------------------------------------------
lp | 2
lp_off | 7768
lp_flags | 1
lp_len | 208
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,2)
t_infomask2 | 12
t_infomask | 2307
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags |
("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID}",{})
-[ RECORD 3
]-------------+-----------------------------------------------------------------------------
lp | 3
lp_off | 7520
lp_flags | 1
lp_len | 248
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,3)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags |
("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 4
]-------------+-----------------------------------------------------------------------------
....
-[ RECORD 23
]------------+-----------------------------------------------------------------------------
lp | 23
lp_off | 272
lp_flags | 1
lp_len | 312
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,23)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags |
("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})

Also pg_class.relfrozenxig contains xid which exceed 1550558894:
# select relname, relfrozenxid from pg_class where oid =
'affected_relation'::regclass::oid;
-[ RECORD 1 ]+-----------
relname | affected_relation
relfrozenxid | 1609822074

So is it possible that either all_frozen bit was set incorrectly or wansn't
unset when tuples were created?
I will be happy to provide any additional information if required.
PostgreSQL version: 15.6, data_checksums: on

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Heikki Linnakangas 2025-11-10 12:38:51 Re: BUG #19108: Stack overflow duting query parse
Previous Message Heikki Linnakangas 2025-11-10 11:43:03 Re: ecpg Fetch issue