Re: Combine Prune and Freeze records emitted by vacuum

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Combine Prune and Freeze records emitted by vacuum
Date: 2024-03-28 08:49:41
Message-ID: b088c6f4-df97-48ee-aa88-f9a5f9dc6d29@iki.fi
Lists: pgsql-hackers

On 28/03/2024 03:53, Melanie Plageman wrote:
> On Thu, Mar 28, 2024 at 01:04:04AM +0200, Heikki Linnakangas wrote:
>> One change with this is that live_tuples and many of the other fields are
>> now again updated, even if the caller doesn't need them. It was hard to skip
>> them in a way that would save any cycles, with the other refactorings.
>
> I am worried we are writing checks that are going to have to come out of
> SELECT queries' bank accounts, but I'll do some benchmarking when we're
> all done with major refactoring.

Sounds good, thanks.

>> + *
>> + * Whether we arrive at the dead HOT tuple first here or while
>> + * following a chain below affects whether preceding RECENTLY_DEAD
>> + * tuples in the chain can be removed or not. Imagine that you
>> + * have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
>> + * reach the RECENTLY_DEAD tuple first, the chain-following logic
>> + * will find the DEAD tuple and conclude that both tuples are in
>> + * fact dead and can be removed. But if we reach the DEAD tuple
>> + * at the end of the chain first, when we reach the RECENTLY_DEAD
>> + * tuple later, we will not follow the chain because the DEAD
>> + * tuple is already 'marked', and will not remove the
>> + * RECENTLY_DEAD tuple. This is not a correctness issue, and the
>> + * RECENTLY_DEAD tuple will be removed by a later VACUUM.
>> */
>> if (!HeapTupleHeaderIsHotUpdated(htup))
>
> Is this intentional? Like would it be correct to remove the
> RECENTLY_DEAD tuple during the current vacuum?

Yes, it would be correct. And if we happen to visit the items in a
different order, reaching the RECENTLY_DEAD tuple first, it will get
removed.

(This is just based on me reading the code; I'm not sure if I've missed
something. It would be nice to construct a test case for that and step
through it with a debugger to really see what happens. But this is not a
new issue, so it doesn't need to be part of this patch.)
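
In the meantime, something like this toy model illustrates the
order-dependence. To be clear, this is not PostgreSQL code, just a
made-up sketch of the marking logic described in the comment: visiting
the root first lets the chain-following logic remove both tuples, while
visiting the DEAD tuple first leaves the RECENTLY_DEAD one behind:

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model, not the real data structures: a two-member HOT chain where
 * tuple 0 is RECENTLY_DEAD and tuple 1 is DEAD. next[i] is the next
 * chain member, -1 meaning end of chain.
 */
enum status { RECENTLY_DEAD, DEAD };

static void
prune(const enum status *htsv, const int *next,
      const int *visit_order, int n)
{
    bool    marked[2] = {false, false};
    bool    removed[2] = {false, false};

    for (int i = 0; i < n; i++)
    {
        int     off = visit_order[i];

        if (marked[off])
            continue;           /* already handled via a chain */

        if (htsv[off] == DEAD && next[off] == -1)
        {
            /* arrived at the dead HOT tuple directly: mark it, no chain */
            marked[off] = removed[off] = true;
            continue;
        }

        /*
         * Follow the chain, collecting RECENTLY_DEAD members. If we reach
         * an unmarked DEAD tuple, the whole prefix is removable; if we
         * hit an already-marked tuple, we stop and keep the prefix.
         */
        int     chain[2];
        int     len = 0;
        int     cur = off;

        while (cur != -1 && !marked[cur])
        {
            chain[len++] = cur;
            marked[cur] = true;
            if (htsv[cur] == DEAD)
            {
                for (int j = 0; j < len; j++)
                    removed[chain[j]] = true;
                break;
            }
            cur = next[cur];
        }
    }

    printf("tuple 0 removed: %d, tuple 1 removed: %d\n",
           removed[0], removed[1]);
}

int
main(void)
{
    enum status htsv[2] = {RECENTLY_DEAD, DEAD};
    int         next[2] = {1, -1};
    int         root_first[2] = {0, 1};
    int         dead_first[2] = {1, 0};

    prune(htsv, next, root_first, 2);   /* removes both */
    prune(htsv, next, dead_first, 2);   /* removes only the DEAD tuple */
    return 0;
}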

>> From c2ee7508456d0e76de985f9997a6840450e342a8 Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
>> Date: Thu, 28 Mar 2024 00:45:26 +0200
>> Subject: [PATCH v8 22/22] WIP
>>
>> - Got rid of all_visible_except_removable again. We're back to
>>   roughly the same mechanism as on 'master', where all_visible
>> doesn't include LP_DEAD items, but at the end of
>> heap_page_prune_and_freeze() when we return to the caller, we clear
>> it if there were any LP_DEAD items. I considered calling the
>> variable 'all_visible_except_lp_dead', which would be more accurate,
>> but it's also very long.
>
> not longer than all_visible_except_removable. I would be happy to keep
> it more exact, but I'm also okay with just all_visible.

Ok, let's make it 'all_visible_except_lp_dead' for clarity.

> What's this "cutoffs TODO"?
>
>> + * cutoffs TODO
>> + *
>> * presult contains output parameters needed by callers such as the number of
>> * tuples removed and the number of line pointers newly marked LP_DEAD.
>> * heap_page_prune_and_freeze() is responsible for initializing it.

All the other arguments are documented in the comment, except 'cutoffs'.
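
Something along these lines should do it, assuming 'cutoffs' just refers
to the struct VacuumCutoffs that VACUUM establishes up front with
vacuum_get_cutoffs():

 * cutoffs contains the freeze cutoffs established by VACUUM at the start
 * of the relation: OldestXmin, OldestMxact, FreezeLimit and
 * MultiXactCutoff (see struct VacuumCutoffs in vacuum.h).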

>> @@ -728,10 +832,12 @@ htsv_get_valid_status(int status)
>> * DEAD, our visibility test is just too coarse to detect it.
>> *
>> * In general, pruning must never leave behind a DEAD tuple that still has
>> - * tuple storage. VACUUM isn't prepared to deal with that case. That's why
>> + * tuple storage. VACUUM isn't prepared to deal with that case (FIXME: it no longer cares, right?).
>> + * That's why
>> * VACUUM prunes the same heap page a second time (without dropping its lock
>> * in the interim) when it sees a newly DEAD tuple that we initially saw as
>> - * in-progress. Retrying pruning like this can only happen when an inserting
>> + * in-progress (FIXME: Really? Does it still do that?).
>
> Yea, I think all that is no longer true. I missed this comment back
> then.

Committed a patch to remove it.

>> @@ -981,7 +1069,18 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
>> * RECENTLY_DEAD tuples just in case there's a DEAD one after them;
>> * but we can't advance past anything else. We have to make sure that
>> * we don't miss any DEAD tuples, since DEAD tuples that still have
>> - * tuple storage after pruning will confuse VACUUM.
>> + * tuple storage after pruning will confuse VACUUM (FIXME: not anymore
>> + * I think?).
>
> Meaning, it won't confuse vacuum anymore or there won't be DEAD tuples
> with storage after pruning anymore?

I meant that it won't confuse VACUUM anymore. lazy_scan_prune() doesn't
loop through the items on the page checking their visibility anymore.

Hmm, one confusion remains though: in the second phase of vacuum, we
reap all the dead line pointers whose references we have just removed
from the indexes. When we do that, we assume they are all stub line
pointers without storage, not normal tuples that happen to be
HEAPTUPLE_DEAD. So it's important that if pruning would leave behind
HEAPTUPLE_DEAD tuples, they are not included in 'deadoffsets'.

In any case, let's just make sure that pruning doesn't leave
HEAPTUPLE_DEAD tuples. There's no reason it should.
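
Paraphrasing (not the exact source), the second pass in
lazy_vacuum_heap_page() effectively does the following for each offset it
reaps, so a HEAPTUPLE_DEAD tuple that kept its storage would trip the
assertion:

    ItemId      itemid = PageGetItemId(page, deadoffsets[i]);

    /* must be a stub line pointer with no storage left behind */
    Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));
    ItemIdSetUnused(itemid);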

>> @@ -1224,10 +1327,21 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
>> * ensure the math works out. The assumption that the transaction will
>> * commit and update the counters after we report is a bit shaky; but it
>> * is what acquire_sample_rows() does, so we do the same to be consistent.
>> + *
>> + * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
>> + * subroutines. They don't count dead items like acquire_sample_rows()
>> + * does, because we assume that all dead items will become LP_UNUSED
>> + * before VACUUM finishes. This difference is only superficial. VACUUM
>> + * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
>> + * won't remember LP_DEAD items, but only because they're not supposed to
>> + * be left behind when it is done. (Cases where we bypass index vacuuming
>> + * will violate this optimistic assumption, but the overall impact of that
>> + * should be negligible.) FIXME: I don't understand that last sentence in
>> + * parens. It was copied from elsewhere.
>
> If we bypass index vacuuming, there will be LP_DEAD items left behind
> when we are done because we didn't do index vacuuming and then reaping
> of those dead items. All of these comments are kind of a copypasta,
> though.

Ah, gotcha, that makes sense now. I didn't remember that we sometimes
bypass index vacuuming if there are very few dead items. I thought this
talked about the case where there are no indexes, but that case is OK.
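
For reference, the bypass check in lazy_vacuum() is roughly this
(paraphrased, with the hypothetical 'dead_items_are_few' standing in for
the real check that the collected TIDs fit in a small amount of memory;
BYPASS_THRESHOLD_PAGES is 0.02):

    double  threshold = BYPASS_THRESHOLD_PAGES * vacrel->rel_pages;
    bool    bypass = (vacrel->lpdead_item_pages < threshold &&
                      dead_items_are_few);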

--
Heikki Linnakangas
Neon (https://neon.tech)
