Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date: 2021-06-16 19:22:02
Message-ID: 20210616192202.6q63mu66h4uyn343@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-06-16 09:46:07 -0700, Peter Geoghegan wrote:
> On Wed, Jun 16, 2021 at 9:03 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > On Wed, Jun 16, 2021 at 3:59 AM Matthias van de Meent
> > > So the implicit assumption in heap_page_prune that
> > > HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with
> > > heap_prune_satisfies_vacuum(vacrel) has never been true. In that case,
> > > we'll need to redo the condition in heap_page_prune as well.
> >
> > I don't think that this shows that the assumption within
> > lazy_scan_prune() (the assumption that both "satisfies vacuum"
> > functions agree) is wrong, with the obvious exception of cases
> > involving the bug that Justin reported. GlobalVis*.maybe_needed is
> > supposed to be conservative.
>
> I suppose it's true that they can disagree because we call
> vacuum_set_xid_limits() to get an OldestXmin inside vacuumlazy.c
> before calling GlobalVisTestFor() inside vacuumlazy.c to get a
> vistest. But that only implies that a tuple that would have been
> considered RECENTLY_DEAD inside lazy_scan_prune() (it just missed
> being considered DEAD according to OldestXmin) is seen as an LP_DEAD
> stub line pointer. Which really means it's DEAD to lazy_scan_prune()
> anyway. These days the only way that lazy_scan_prune() can consider a
> tuple fully DEAD is if it's no longer a tuple -- it has to actually be
> an LP_DEAD stub line pointer.

I think it's more complicated than that - "before" isn't a guarantee when the
horizon can go backwards. Consider the case where a hot_standby_feedback=on
replica without a slot connects - that can result in the xid horizon going
backwards.

I think a good way to address this might be to have GlobalVisUpdateApply()
ensure that maybe_needed does not go backwards within one backend.

This is *nearly* already guaranteed within vacuum. The exception is a catalog
access between vacuum_set_xid_limits() and GlobalVisTestFor() that leads to an
attempt at pruning: that could, in theory, move maybe_needed backwards if a
replica that causes the horizon to go backwards connected in between those two
steps.
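
A rough sketch of how the two checks could then disagree, under heavy simplification: XIDs are plain integers, a deleted tuple is judged only by its xmax, and each check is reduced to a single comparison. All of the names below are hypothetical and do not correspond to real PostgreSQL functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a transaction ID, with no wraparound. */
typedef uint64_t ModelXid;

/*
 * Simplified view based on OldestXmin: a deleted tuple counts as DEAD
 * when its deleter (xmax) is older than OldestXmin.
 */
static bool
model_dead_per_oldest_xmin(ModelXid tuple_xmax, ModelXid oldest_xmin)
{
    return tuple_xmax < oldest_xmin;
}

/*
 * Simplified view based on the vistest: the tuple is removable by
 * pruning only when xmax is older than maybe_needed.
 */
static bool
model_removable_per_vistest(ModelXid tuple_xmax, ModelXid maybe_needed)
{
    return tuple_xmax < maybe_needed;
}

/*
 * True when the OldestXmin-based view says DEAD but pruning would leave
 * the tuple in place, i.e. the two views disagree.
 */
static bool
model_views_disagree(ModelXid tuple_xmax,
                     ModelXid oldest_xmin,
                     ModelXid maybe_needed)
{
    return model_dead_per_oldest_xmin(tuple_xmax, oldest_xmin) &&
           !model_removable_per_vistest(tuple_xmax, maybe_needed);
}
```

For instance, with OldestXmin computed as 100 and maybe_needed later moving back to 90, a tuple with xmax 95 is DEAD under the first check but not removable under the second; if maybe_needed stays at or above 100, the two agree.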

Greetings,

Andres Freund
