Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date: 2021-06-16 16:03:29
Message-ID: CAH2-Wzm3tnHJExZ+fPgVJX7v1Vo7jWDFtN040zeYsGyjfjP7qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 16, 2021 at 3:59 AM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> On Tue, 15 Jun 2021 at 03:22, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > @@ -4032,6 +4039,24 @@ GlobalVisTestShouldUpdate(GlobalVisState *state)
> > > static void
> > > GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
> > > {
> > > + /* assert non-decreasing nature of horizons */
> >
> > Thinking more about it, I don't think these are correct. See the
> > following comment in procarray.c:
> >
> > * Note: despite the above, it's possible for the calculated values to move
> > * backwards on repeated calls.
>
> So the implicit assumption in heap_page_prune that
> HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with
> heap_prune_satisfies_vacuum(vacrel) has never been true. In that case,
> we'll need to redo the condition in heap_page_prune as well.

I don't think that this shows that the assumption within
lazy_scan_prune() (the assumption that both "satisfies vacuum"
functions agree) is wrong, with the obvious exception of cases
involving the bug that Justin reported. GlobalVis*.maybe_needed is
supposed to be conservative.

> PFA my adapted patch that fixes this new-ish issue, and does not
> include the (incorrect) assertions in GlobalVisUpdateApply. I've
> tested this against the reproducing case, both with and without the
> fix in GetOldestNonRemovableTransactionId, and it fails fall into an
> infinite loop.
>
> I would appreciate it if someone could validate the new logic in the
> HEAPTUPLE_DEAD case. Although I believe it correctly handles the case
> where the vistest non-removable horizon moved backwards, a second pair
> of eyes would be appreciated.

If you look at the lazy_scan_prune() logic immediately prior to commit
8523492d4e3, you'll see that it used to have a HEAPTUPLE_DEAD case
that didn't involve a restart -- this was the "tupgone" mechanism.
Back then we actually had to remove any corresponding index tuples
from indexes when in this rare case. Plus there was a huge amount of
complicated mechanism to handle a very rare case, all of which was
removed by commit 8523492d4e3. Things like extra recovery conflict
code just for this rare case, or needing to acquire a super exclusive
lock on pages during VACUUM's second heap pass. This is all cruft that
I was happy to get rid of.

This is a good discussion of the tupgone stuff and the problems it
caused, which is good background information:

https://www.postgresql.org/message-id/20200724165514.dnu5hr4vvgkssf5p%40alap3.anarazel.de

Even if it was true that heap_prune_satisfies_vacuum() won't agree
with HeapTupleSatisfiesVacuum() after repeated retries within
lazy_scan_prune(), it would probably best if we then made code outside
vacuumlazy.c agree with the lazy_scan_prune() assumption, rather than
the other way around.

Have you actually been able to demonstrate a problem involving
lazy_scan_prune()'s goto, except the main
GetOldestNonRemovableTransactionId() bug reported by Justin?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-06-16 16:07:36 Re: Transactions involving multiple postgres foreign servers, take 2
Previous Message Robert Haas 2021-06-16 16:01:09 Re: a path towards replacing GEQO with something better