Re: Eliminate redundant tuple visibility check in vacuum

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Eliminate redundant tuple visibility check in vacuum
Date: 2023-09-01 08:47:10
Message-ID: ed6011e9-551f-6156-25c7-7e5ff068fcf4@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 9/1/23 03:25, Peter Geoghegan wrote:
> On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
>> Any inserting transaction which aborts after heap_page_prune()'s
>> visibility check will now be of no concern to lazy_scan_prune(). Since
>> we don't do the visibility check again, we won't find the tuple
>> HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
>> the array of tuples for vacuum to reap. This does mean that we won't
>> reap and remove tuples whose inserting transactions abort right after
>> heap_page_prune()'s visibility check. But, this doesn't seem like an
>> issue.
> It's definitely not an issue.
>
> The loop is essentially a hacky way of getting a consistent picture of
> which tuples should be treated as HEAPTUPLE_DEAD, and which tuples
> need to be left behind (consistent at the level of each page and each
> HOT chain, at least). Making that explicit seems strictly better.
>
>> They may not be removed until the next vacuum, but ISTM it is
>> actually worse to pay the cost of re-pruning the whole page just to
>> clean up that one tuple. Maybe if that renders the page all visible
>> and we can mark it as such in the visibility map -- but that seems
>> like a relatively narrow use case.
> The chances of actually hitting the retry are microscopic anyway. It
> has nothing to do with making sure that dead tuples from aborted
> tuples get removed for its own sake, or anything. Rather, the retry is
> all about making sure that all TIDs that get removed from indexes can
> only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD
> tuples with storage would very occasionally be left behind, which made
> life difficult in a bunch of other places -- for no good reason.
>
That makes sense and seems like a much better compromise. Thanks for the
explanations. Please update the comment to document the corner case and
how we handle it.

--
David Geier
(ServiceNow)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-09-01 09:15:48 Re: persist logical slots to disk during shutdown checkpoint
Previous Message Peter Smith 2023-09-01 08:29:06 Re: Synchronizing slots from primary to standby