Re: Combine Prune and Freeze records emitted by vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Combine Prune and Freeze records emitted by vacuum
Date: 2024-03-30 16:10:12
Message-ID: CAAKRu_abm2tHhrc0QSQa==sHe=VA1=oz1dJMQYUOKuHmu+9Xrg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 30, 2024 at 8:00 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sat, Mar 30, 2024 at 1:57 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > I think that we are actually successfully removing more RECENTLY_DEAD
> > HOT tuples than in master with heap_page_prune()'s new approach, and I
> > think it is correct; but let me know if I am missing something.
>
> /me blinks.
>
> Isn't zero the only correct number of RECENTLY_DEAD tuples to remove?

At the top of the comment for heap_prune_chain() in master, it says

* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
* chain. We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
* This is OK because a RECENTLY_DEAD tuple preceding a DEAD tuple is really
* DEAD, our visibility test is just too coarse to detect it.

Heikki had added a comment in one of his patches to the fast path for
HOT tuples at the top of heap_prune_chain():

* Note that we might first arrive at a dead heap-only tuple
* either while following a chain or here (in the fast path).
* Whichever path gets there first will mark the tuple unused.
*
* Whether we arrive at the dead HOT tuple first here or while
* following a chain above affects whether preceding RECENTLY_DEAD
* tuples in the chain can be removed or not. Imagine that you
* have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
* reach the RECENTLY_DEAD tuple first, the chain-following logic
* will find the DEAD tuple and conclude that both tuples are in
* fact dead and can be removed. But if we reach the DEAD tuple
* at the end of the chain first, when we reach the RECENTLY_DEAD
* tuple later, we will not follow the chain because the DEAD
* TUPLE is already 'marked', and will not remove the
* RECENTLY_DEAD tuple. This is not a correctness issue, and the
* RECENTLY_DEAD tuple will be removed by a later VACUUM.
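
The order dependence that comment describes can be reproduced with a toy model. This is only a sketch and not PostgreSQL code: ToyItem, toy_prune_item() and toy_prune_in_order() are names made up for illustration, and "marked" stands in for whatever the real code does to a line pointer (redirect, dead, or unused).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ITEMS 8
#define NO_NEXT   (-1)

typedef enum { VIS_LIVE, VIS_RECENTLY_DEAD, VIS_DEAD } ToyVisibility;

/* Hypothetical stand-in for one item on a heap page. */
typedef struct
{
    ToyVisibility vis;
    bool heap_only;   /* true for a HOT (heap-only) tuple */
    int  next;        /* index of the next chain member, or NO_NEXT */
    bool marked;      /* already chosen for removal */
} ToyItem;

/* Process one item the way a single in-order loop would. Returns the
 * number of items newly marked for removal. */
static int
toy_prune_item(ToyItem *items, int i)
{
    int chain[MAX_ITEMS];
    int nchain = 0;
    int ndead = 0;    /* length of the chain prefix ending at a DEAD tuple */

    if (items[i].marked)
        return 0;

    if (items[i].heap_only)
    {
        /* Fast path: a DEAD heap-only tuple is marked right away. */
        if (items[i].vis == VIS_DEAD)
        {
            items[i].marked = true;
            return 1;
        }
        return 0;
    }

    /* Root tuple: follow the chain until a LIVE or already-marked member. */
    for (int cur = i; cur != NO_NEXT && nchain < MAX_ITEMS; cur = items[cur].next)
    {
        if (items[cur].marked || items[cur].vis == VIS_LIVE)
            break;
        chain[nchain++] = cur;
        if (items[cur].vis == VIS_DEAD)
            ndead = nchain;   /* everything up to here is really dead */
    }

    for (int k = 0; k < ndead; k++)
        items[chain[k]].marked = true;
    return ndead;
}

/* Visit the items in the given order; returns how many were marked. */
int
toy_prune_in_order(ToyItem *items, const int *order, int n)
{
    int removed = 0;

    for (int k = 0; k < n; k++)
    {
        assert(order[k] >= 0 && order[k] < MAX_ITEMS);
        removed += toy_prune_item(items, order[k]);
    }
    return removed;
}
```

For the two-tuple chain RECENTLY_DEAD -> DEAD, visiting the root first marks both tuples, while visiting the DEAD heap-only tuple first marks only that one and leaves the RECENTLY_DEAD root for a later pass.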

My patch splits the tuples into HOT and non-HOT while gathering their
visibility information. It first calls heap_prune_chain() on the
non-HOT tuples and then processes the as-yet-unmarked HOT tuples in a
separate loop afterward. This follows every chain and processes it
completely, and it also processes any HOT tuples that are not
reachable from a valid chain. The fast path contains a special
check to ensure that line pointers for DEAD HOT tuples that were not
themselves HOT-updated (dead orphaned tuples from aborted HOT updates)
are still marked LP_UNUSED even though they are not reachable from a
valid HOT chain. Because that check now runs after chain following, we
don't preclude ourselves from following all chains.
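
As a rough illustration of that ordering (a toy model only, not the patch itself; ToyItem, toy_follow_chain() and toy_prune_page() are hypothetical names, and "marked" abstracts away the distinction between redirect, dead, and unused line pointers):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ITEMS 8
#define NO_NEXT   (-1)

typedef enum { VIS_LIVE, VIS_RECENTLY_DEAD, VIS_DEAD } ToyVisibility;

/* Hypothetical stand-in for one item on a heap page. */
typedef struct
{
    ToyVisibility vis;
    bool heap_only;   /* true for a HOT (heap-only) tuple */
    int  next;        /* index of the next chain member, or NO_NEXT */
    bool marked;      /* already chosen for removal */
} ToyItem;

/* Follow the chain starting at a root item and mark every member up to
 * and including the last DEAD one. Returns the number marked. */
static int
toy_follow_chain(ToyItem *items, int root)
{
    int chain[MAX_ITEMS];
    int nchain = 0;
    int ndead = 0;

    for (int cur = root; cur != NO_NEXT && nchain < MAX_ITEMS; cur = items[cur].next)
    {
        if (items[cur].marked || items[cur].vis == VIS_LIVE)
            break;
        chain[nchain++] = cur;
        if (items[cur].vis == VIS_DEAD)
            ndead = nchain;
    }

    for (int k = 0; k < ndead; k++)
        items[chain[k]].marked = true;
    return ndead;
}

/* Two passes: all roots first, then the leftover heap-only tuples. */
int
toy_prune_page(ToyItem *items, int nitems)
{
    int removed = 0;

    assert(nitems >= 0 && nitems <= MAX_ITEMS);

    /* Pass 1: follow every chain from its non-HOT root. */
    for (int i = 0; i < nitems; i++)
        if (!items[i].heap_only && !items[i].marked)
            removed += toy_follow_chain(items, i);

    /* Pass 2: heap-only tuples no chain reached. A DEAD one that was
     * not itself HOT-updated (no successor) is an orphan to remove. */
    for (int i = 0; i < nitems; i++)
        if (items[i].heap_only && !items[i].marked &&
            items[i].vis == VIS_DEAD && items[i].next == NO_NEXT)
        {
            items[i].marked = true;
            removed++;
        }

    return removed;
}
```

Because the roots are handled before any heap-only tuple, the position of a DEAD heap-only tuple on the page no longer decides whether a preceding RECENTLY_DEAD tuple in its chain gets removed in this model, and an orphaned DEAD heap-only tuple is still picked up by the second pass.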

- Melanie
