Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date: 2021-06-08 22:52:02
Message-ID: CAH2-WznvaKsC6-Z_jf3Y9CbNyk-rOY6Lfx+sJPmqebFg41nT2A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 8, 2021 at 4:03 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM heap_page_items(pg_read_binary_file('/tmp/dump_block.page'));
> lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
> ----+--------+----------+--------+-----------+-----------+----------+--------+-------------+------------+--------+----------------------------------+-------
> 1 | 1320 | 1 | 259 | 926025112 | 0 | 0 | (1,1) | 32799 | 10499 | 32 | 11111111111111111111111000100000 |

*** SNIP ***

> 6 | 7464 | 1 | 259 | 926014884 | 926025112 | 0 | (1,1) | 49183 | 9475 | 32 | 11111111111111111111111000100000 |

As I understand it from your remarks and the gdb output you posted
earlier [1], the tuple at offset number 6 is the one that triggers the
suspicious "goto restart" here. A regular UPDATE (not a HOT UPDATE)
produced a successor version on the same heap page, which is lp 1.
Here are the t_infomask details for both tuples:

lp 6: HEAP_HASNULL|HEAP_HASVARWIDTH|HEAP_XMIN_COMMITTED|HEAP_XMAX_COMMITTED|HEAP_UPDATED
<-- points to (1,1)
lp 1: HEAP_HASNULL|HEAP_HASVARWIDTH|HEAP_XMIN_COMMITTED|HEAP_XMAX_INVALID|HEAP_UPDATED
<-- This is (1,1)

So if lp 1's xmin and lp 6's xmax (the same XID, 926025112) committed,
why shouldn't HeapTupleSatisfiesVacuum() consider lp 6 DEAD (rather
than RECENTLY_DEAD)? You also say that vacuumlazy.c's OldestXmin is
926025113, one XID past lp 6's xmax, so it is hard to fault HTSV here.
The only way it could be wrong is if the hint bits were somehow set
spuriously, which seems unlikely to me.

[1] https://postgr.es/m/20210608113333.GC16435@telsasoft.com
--
Peter Geoghegan
