Re: Combine Prune and Freeze records emitted by vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Combine Prune and Freeze records emitted by vacuum
Date: 2024-03-27 15:18:46
Message-ID: CAAKRu_asepeJ0X9Wpt52kfwxLfGSTc5Pr1HJE_zVkvqXPV0chw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 26, 2024 at 5:46 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Mon, Mar 25, 2024 at 09:33:38PM +0200, Heikki Linnakangas wrote:
> > On 24/03/2024 18:32, Melanie Plageman wrote:
> > > On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > > >
> > > > In heap_page_prune_and_freeze(), we now do some extra work on each live
> > > > tuple, to set the all_visible_except_removable correctly. And also to
> > > > update live_tuples, recently_dead_tuples and hastup. When we're not
> > > > freezing, that's a waste of cycles, the caller doesn't care. I hope it's
> > > > enough that it doesn't matter, but is it?
> > >
> > > Last year on an early version of the patch set I did some pgbench
> > > tpcb-like benchmarks -- since there is a lot of on-access pruning in
> > > that workload -- and I don't remember it being a showstopper. The code
> > > has changed a fair bit since then. However, I think it might be safer
> > > to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
> > > the rest of the loop after heap_prune_satisfies_vacuum() when
> > > on-access pruning invokes it. I had avoided that because it felt ugly
> > > and error-prone, however it addresses a few other of your points as
> > > well.
> >
> > Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the
> > argument described what it does, rather than who it's for. For example,
> > 'need_all_visible'. If set to true, the function determines 'all_visible',
> > otherwise it does not.
>
> A very rough v7 is attached. The whole thing is rebased over master and
> then 0016 contains an attempt at the refactor we discussed in this
> email.
>
> Instead of just using the PruneReason to avoid doing the extra steps
> when on-access pruning calls heap_page_prune_and_freeze(), I've made an
> "actions" variable and defined different flags for it. One of them is
> a replacement for the existing mark_unused_now flag. I defined another
> one, PRUNE_DO_TRY_FREEZE, which could be used in place of checking if
> pagefrz is NULL.
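
To sketch what I mean (PRUNE_DO_TRY_FREEZE is the name I used; the
other flag name here is just illustrative):

    /* Bits for the "actions" bitmask passed to heap_page_prune_and_freeze() */
    #define PRUNE_DO_MARK_UNUSED_NOW  (1 << 0)  /* replaces the mark_unused_now bool */
    #define PRUNE_DO_TRY_FREEZE       (1 << 1)  /* replaces checking pagefrz != NULL */

    /* ... and inside heap_page_prune_and_freeze(): */
    if (prstate.actions & PRUNE_DO_TRY_FREEZE)
    {
        /* consider the tuple for freezing */
    }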
>
> There is a whole group of activities that only the vacuum caller does
> outside of freezing -- setting hastup, counting live and recently dead
> tuples, and determining whole-page visibility and a snapshot conflict
> horizon for updating the VM. But I didn't want to introduce a separate
> flag for each of them, because then I would have to check each flag
> before taking the corresponding action. That would add a lot of extra
> branching, and the split is all-or-nothing anyway: on-access pruning
> does none of those actions while vacuum does all of them.
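
So, at the call sites, the intent is roughly this (hand-waving the
exact spelling; relation_has_no_indexes is an illustrative condition):

    /* lazy_scan_prune() (VACUUM): wants freezing plus all the extras */
    actions = PRUNE_DO_TRY_FREEZE;
    if (relation_has_no_indexes)
        actions |= PRUNE_DO_MARK_UNUSED_NOW;

    /* heap_page_prune_opt() (on-access): "main" pruning only */
    actions = 0;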
>
> > I started to look closer at the loops in heap_prune_chain() and how they
> > update all the various flags and counters. There's a lot going on there. We
> > have:
> >
> > - live_tuples counter
> > - recently_dead_tuples counter
> > - all_visible[_except_removable]
> > - all_frozen
> > - visibility_cutoff_xid
> > - hastup
> > - prstate.frozen array
> > - nnewlpdead
> > - deadoffsets array
> >
> > And that doesn't even include all the local variables and the final
> > dead/redirected arrays.
> >
> > Some of those are set in the first loop that initializes 'htsv' for each
> > tuple on the page. Others are updated in heap_prune_chain(). Some are
> > updated in both. It's hard to follow which are set where.
> >
> > I think recently_dead_tuples is updated incorrectly, for tuples that are
> > part of a completely dead HOT chain. For example, imagine a hot chain with
> > two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow the
> > chain, see the DEAD tuple at the end of the chain, and mark both tuples for
> > pruning. However, we already updated 'recently_dead_tuples' in the first
> > loop, which is wrong if we remove the tuple.
>
> Ah, yes, you are so right about this bug.
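
To spell the double-count out schematically (not the literal code):

    /*
     * HOT chain: lp1 -> A (RECENTLY_DEAD) -> B (DEAD)
     *
     * First loop over the page's offsets:
     */
    switch (htsv[offnum])
    {
        case HEAPTUPLE_RECENTLY_DEAD:
            recently_dead_tuples++;     /* counts A here... */
            break;
    }

    /*
     * ...but heap_prune_chain() later follows the chain from lp1, sees
     * that it ends in the DEAD tuple B, and marks both A and B for
     * removal. Nothing ever takes A back out of recently_dead_tuples.
     */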
>
> > Maybe that's the only bug like this, but I'm a little scared. Is there
> > something we could do to make this simpler? Maybe move all the new work that
> > we added to the first loop, into heap_prune_chain() ? Maybe introduce a few
> > more helper heap_prune_record_*() functions, to update the flags and
> > counters also for live and insert/delete-in-progress tuples and for dead
> > line pointers? Something like heap_prune_record_live() and
> > heap_prune_record_lp_dead().
>
> I like the idea of a heap_prune_record_live_or_recently_dead() function.
> That's what I've attempted to implement in the attached 0016. I haven't
> updated and cleaned up everything (especially comments) in the refactor,
> but there are two major issues:
>
> 1) In heap_prune_chain(), a heap-only tuple which is not HOT updated may
> end up being a live tuple that is not part of any chain, or it may end
> up as the redirect target in a HOT chain. At the top of
> heap_prune_chain(), we return if (HeapTupleHeaderIsHeapOnly(htup)). We
> may come back to this tuple later if it is part of a chain; if we
> don't, we need to have called heap_prune_record_live_or_recently_dead()
> for it. However, there are other tuples that become redirect targets
> which do not meet this criterion, so we must also call
> heap_prune_record_live_or_recently_dead() when marking an item as the
> target of a redirect. If we call it in both places, we will
> double-count. To fix this, I introduced an array, "counted". But that
> takes up extra space in the PruneState and extra cycles to memset it.
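
For reference, the shape of the guard in 0016 is roughly this
(abridged; the helper's real parameter list is longer):

    typedef struct PruneState
    {
        /* ... existing fields ... */
        bool        marked[MaxHeapTuplesPerPage + 1];
        bool        counted[MaxHeapTuplesPerPage + 1];  /* new */
    } PruneState;

    static void
    heap_prune_record_live_or_recently_dead(PruneState *prstate,
                                            OffsetNumber offnum)
    {
        if (prstate->counted[offnum])
            return;             /* already counted via the other path */
        prstate->counted[offnum] = true;
        /* update live_tuples / recently_dead_tuples / all-visible info */
    }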
>
> I can't think of a way to make sure we count the right tuples without
> another array. The tuples we need to count are those not marked in
> prstate->marked, plus those tuples whose line pointers will be
> redirected to (which are marked).
>
> 2) A large number of the members of PruneFreezeResult are only
> initialized for the vacuum caller now. Even with a comment, this is a
> bit confusing. And, it seems like there should be some symmetry between
> the actions the caller tells heap_page_prune_and_freeze() to take and
> the result parameters that are filled in.
>
> I am concerned about adding flags for all of the actions (setting
> hastup, determining whole-page visibility, etc., as mentioned above)
> because then I also have to check all of those flags, which will add
> extra branching. And of the two callers of heap_page_prune_and_freeze(),
> one will do all of the actions and the other will do none of them,
> just "main" pruning.
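
To illustrate the asymmetry, PruneFreezeResult currently looks roughly
like this (abridged, and the exact membership may change):

    typedef struct PruneFreezeResult
    {
        int         ndeleted;           /* set for every caller */
        int         nnewlpdead;         /* set for every caller */

        /* Everything below is only initialized for the VACUUM caller */
        int         live_tuples;
        int         recently_dead_tuples;
        bool        hastup;
        bool        all_visible;
        TransactionId vm_conflict_horizon;
        OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
    } PruneFreezeResult;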

This morning I worked on a version of this patchset which moved the
counting of live and recently dead tuples and the calculation of the
VM conflict horizon back to lazy_scan_prune(), but kept the freezing
and dead-offset collection in heap_prune_chain(). I encountered the
same problem with ensuring that each tuple is considered for freezing
exactly once. It also made me realize that my patch set (v7) still has
the same bug: when we encounter HEAPTUPLE_RECENTLY_DEAD tuples whose
line pointers get set LP_DEAD during pruning,
all_visible_except_removable is incorrectly set to false and
recently_dead_tuples is incorrectly incremented. And I think I am
incorrectly calling heap_prepare_freeze_tuple() on them too.
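
Schematically, the problem sequence in the first loop is:

    switch (htsv[offnum])
    {
        case HEAPTUPLE_RECENTLY_DEAD:
            /*
             * We don't yet know whether heap_prune_chain() will set this
             * tuple's line pointer LP_DEAD (fully DEAD HOT chain), so all
             * of the following can be wrong:
             */
            all_visible_except_removable = false;
            recently_dead_tuples++;
            /* ...and heap_prepare_freeze_tuple() is also called on it */
            break;
    }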

I need some way to modify the control flow or accounting such that I
know which HEAPTUPLE_RECENTLY_DEAD tuples will not be marked LP_DEAD.
And a way to consider freezing and do live tuple accounting for these
and HEAPTUPLE_LIVE tuples exactly once.

- Melanie
