Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-02-20 03:01:28
Message-ID: 20220220030128.sgytb3wccteb3opj@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-02-19 18:16:54 -0800, Peter Geoghegan wrote:
> On Sat, Feb 19, 2022 at 5:54 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > How does that cause the endless loop?
>
> Attached is the page image itself, dumped via gdb (and gzip'd). This
> was on recent HEAD (commit 8f388f6f, actually), plus
> 0001-Add-adversarial-ConditionalLockBuff[...]. No other changes. No
> defragmenting in pg_surgery, nothing like that.

> > It doesn't do so on HEAD + 0001-Add-adversarial-ConditionalLockBuff[...] for
> > me. So something needs to have changed with your patch?
>
> It doesn't always happen -- only about half the time on my machine.
> Maybe it's timing sensitive?

Ah, I'd only run the tests three times or so, without it happening. Trying a
few more times repro'd it.

It's kind of surprising that it takes
0001-Add-adversarial-ConditionalLockBuff to break this. I suspect it's a
question of hint bits changing due to lazy_scan_noprune(), which then makes
HeapTupleHeaderIsHotUpdated() return a different value, preventing the
"If the tuple is DEAD and doesn't chain to anything else"
path from being taken.
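
To illustrate how hint bits alone can flip that check, here is a minimal
standalone sketch. It is a toy model, not the real macro: the ToyTupleHeader
type and the toy_* names are hypothetical, and the bit values are what I
recall from htup_details.h, so treat them as assumptions. It only shows that
the result depends on hint bits in t_infomask; it does not try to reproduce
the failure itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as recalled from htup_details.h; treat them as assumptions. */
#define TOY_HEAP_XMIN_COMMITTED 0x0100
#define TOY_HEAP_XMIN_INVALID   0x0200
#define TOY_HEAP_XMAX_INVALID   0x0800
#define TOY_HEAP_HOT_UPDATED    0x4000	/* stored in t_infomask2 */

/* Hypothetical stand-in for the two flag words of a heap tuple header. */
typedef struct
{
	uint16_t	t_infomask;
	uint16_t	t_infomask2;
} ToyTupleHeader;

/* xmin is known invalid only if INVALID is set and COMMITTED is not. */
static bool
toy_xmin_invalid(const ToyTupleHeader *tup)
{
	return (tup->t_infomask &
			(TOY_HEAP_XMIN_COMMITTED | TOY_HEAP_XMIN_INVALID)) ==
		TOY_HEAP_XMIN_INVALID;
}

/*
 * Models the shape of a HeapTupleHeaderIsHotUpdated()-style test: the
 * HOT-updated flag only counts while neither hint bit says the update
 * (xmax) or the tuple itself (xmin) is invalid.
 */
static bool
toy_is_hot_updated(const ToyTupleHeader *tup)
{
	assert(tup != NULL);
	return (tup->t_infomask2 & TOY_HEAP_HOT_UPDATED) != 0 &&
		(tup->t_infomask & TOY_HEAP_XMAX_INVALID) == 0 &&
		!toy_xmin_invalid(tup);
}
```

With only TOY_HEAP_HOT_UPDATED set the function returns true; once something
sets TOY_HEAP_XMAX_INVALID or TOY_HEAP_XMIN_INVALID on the same header it
returns false, even though nothing else about the tuple changed.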

> We hit the "goto retry" on offnum 2, which is the first tuple with
> storage (you can see "the ghost" of the tuple from the LP_DEAD item at
> offnum 1, since the page isn't defragmented in pg_surgery). I think
> that this happens because the heap-only tuple at offnum 2 is fully
> DEAD to lazy_scan_prune, but hasn't been recognized as such by
> heap_page_prune. There is no way that they'll ever "agree" on the
> tuple being DEAD right now, because pruning still doesn't assume that
> an orphaned heap-only tuple is fully DEAD.

> We can either do that, or we can throw an error concerning corruption
> when heap_page_prune notices orphaned tuples. Neither seems
> particularly appealing. But it definitely makes no sense to allow
> lazy_scan_prune to spin in a futile attempt to reach agreement with
> heap_page_prune about a DEAD tuple really being DEAD.

Yea, this sucks. I think we should go for the rewrite of the
heap_prune_chain() logic. The current approach is just never going to be
robust.
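
To make the disagreement concrete, here is a rough standalone model of the
retry loop. Everything in it (ToyItem, toy_prune, toy_scan, the
prune_orphans switch, the retry bound) is hypothetical and heavily
simplified; it is not the vacuumlazy.c or pruneheap.c code. It assumes that
pruning only removes DEAD heap-only tuples it can reach through a chain,
while the scan retries whenever any DEAD tuple with storage remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in PostgreSQL. */
typedef enum { ITEM_UNUSED, ITEM_DEAD_LP, ITEM_NORMAL } ToyItemState;

typedef struct
{
	ToyItemState state;
	bool		heap_only;		/* heap-only tuple */
	bool		reachable;		/* reachable from some root item's chain */
	bool		vacuum_dead;	/* visibility check says fully DEAD */
} ToyItem;

/*
 * Pruning pass. Removes DEAD tuples, but skips orphaned heap-only tuples
 * unless prune_orphans is set. Returns the number of items removed.
 */
static int
toy_prune(ToyItem *items, size_t n, bool prune_orphans)
{
	int			removed = 0;

	assert(items != NULL);
	for (size_t i = 0; i < n; i++)
	{
		ToyItem    *it = &items[i];

		if (it->state != ITEM_NORMAL || !it->vacuum_dead)
			continue;
		if (it->heap_only && !it->reachable && !prune_orphans)
			continue;			/* pruning never considers this tuple */
		it->state = it->heap_only ? ITEM_UNUSED : ITEM_DEAD_LP;
		removed++;
	}
	return removed;
}

/*
 * Scan pass: prune, then re-check every item; retry if a DEAD tuple with
 * storage is still there. Returns the number of retries needed, or -1 if
 * max_retries was exhausted (the real loop has no such bound).
 */
static int
toy_scan(ToyItem *items, size_t n, bool prune_orphans, int max_retries)
{
	for (int retries = 0; retries <= max_retries; retries++)
	{
		bool		saw_dead = false;

		toy_prune(items, n, prune_orphans);
		for (size_t i = 0; i < n; i++)
		{
			if (items[i].state == ITEM_NORMAL && items[i].vacuum_dead)
				saw_dead = true;
		}
		if (!saw_dead)
			return retries;
	}
	return -1;
}
```

With a two-item page shaped like the one in the report (an LP_DEAD item at
offnum 1, an orphaned DEAD heap-only tuple at offnum 2), toy_scan() never
converges while prune_orphans is false, and finishes on the first pass once
pruning is allowed to treat the orphan as DEAD.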

Greetings,

Andres Freund
