Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date: 2021-06-11 01:49:50
Message-ID: CAH2-Wzk2g-muJ8ndNvgf9B=GsnSONRuW-0KQ9+ge-x5-NNyBmw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 10, 2021 at 5:58 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> The problem with writing a test is likely to find a way to halfway
> reliably schedule a transaction abort after pruning, but before the
> tuple-removal loop? Does anybody see a trick to do so?

I asked Alexander about using his pending stop events infrastructure
patch to test this code, back when it used the tupgone approach rather
than the retry loop:

https://postgr.es/m/CAH2-Wz=Tb7bAgCFt0VFA0YJ5Vd1RxJqZRc

I can't see any better way.

ISTM that it would be much more useful to focus on adding an assertion
(or maybe even a "can't happen" error) that fails when the DEAD/goto
path is reached with a tuple whose xmin wasn't aborted. If that had been
in place, we would have caught the bug in
GetOldestNonRemovableTransactionId() far sooner. It might also
catch other bugs in the future.
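
To make the idea concrete, here is a rough standalone sketch of the kind
of check I mean. It is only a toy model, not the actual vacuumlazy.c
code: the enums and both function names are hypothetical stand-ins
invented for illustration, and a real patch would work on the tuple
header and the usual transaction-status lookups instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's real types. */
typedef enum { XMIN_IN_PROGRESS, XMIN_COMMITTED, XMIN_ABORTED } XminStatus;
typedef enum { TUPLE_LIVE, TUPLE_RECENTLY_DEAD, TUPLE_DEAD } TupleVisibility;

/*
 * After pruning, seeing a DEAD tuple is only explainable if the inserting
 * transaction aborted in the window between pruning and this check.
 */
static bool
dead_after_prune_is_expected(XminStatus xmin_status)
{
    return xmin_status == XMIN_ABORTED;
}

/*
 * Returns true when the caller should go back and prune the page again
 * (the DEAD/goto path).  The assertion is the "can't happen" check: it
 * fires if that path is reached for any reason other than an aborted xmin.
 */
static bool
should_retry_prune(TupleVisibility vis, XminStatus xmin_status)
{
    if (vis != TUPLE_DEAD)
        return false;

    assert(dead_after_prune_is_expected(xmin_status));
    return true;
}
```

In a production build the assert() could instead be a hard error, so
that the condition is reported rather than silently retried.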

--
Peter Geoghegan
