Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-05 01:27:21
Message-ID: CAD21AoD-anVc6aozi+pi2Zn0CJA3bPOS3i7oABpZjALPbmwtbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 4, 2024 at 8:48 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Mon, Mar 4, 2024 at 1:05 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Mar 3, 2024 at 2:43 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> > > Right, I should have said "reset". Resetting a context will delete
> > > it's children as well, and seems like it should work to reset the tree
> > > context, and we don't have to know whether that context actually
> > > contains leaves at all. That should allow copying "tree context" to
> > > "leaf context" in the case where we have no special context for
> > > leaves.
> >
> > Resetting the tree->context seems to work. But I think we should note
> > for callers that the dsa_area passed to RT_CREATE should be created in
> > a different context than the context passed to RT_CREATE because
> > otherwise RT_FREE() will also free the dsa_area. For example, the
> > following code in test_radixtree.c will no longer work:
> >
> > dsa = dsa_create(tranche_id);
> > radixtree = rt_create(CurrentMemoryContext, dsa, tranche_id);
> > :
> > rt_free(radixtree);
> > dsa_detach(dsa); // dsa is already freed.
> >
> > So I think that a practical usage of the radix tree will be that the
> > caller creates a memory context for a radix tree and passes it to
> > RT_CREATE().
>
> That sounds workable to me.
>
> > I've attached an update patch set:
> >
> > - 0008 updates RT_FREE_RECURSE().
>
> Thanks!
>
> > - 0009 patch is an updated version of cleanup radix tree memory handling.
>
> Looks pretty good, as does the rest. I'm going through again,
> squashing and making tiny adjustments to the template. The only thing
> not done is changing the test with many values to resemble the perf
> test more.
>
> I wrote:
> > > Secondly, I thought about my recent work to skip checking if we first
> > > need to create a root node, and that has a harmless (for vacuum at
> > > least) but slightly untidy behavior: When RT_SET is first called, and
> > > the key is bigger than 255, new nodes will go on top of the root node.
> > > These have chunk '0'. If all subsequent keys are big enough, the
> > > orginal root node will stay empty. If all keys are deleted, there will
> > > be a chain of empty nodes remaining. Again, I believe this is
> > > harmless, but to make tidy, it should easy to teach RT_EXTEND_UP to
> > > call out to RT_EXTEND_DOWN if it finds the tree is empty. I can work
> > > on this, but likely not today.
> >
> > This turns out to be a lot trickier than it looked, so it seems best
> > to allow a trivial amount of waste, as long as it's documented
> > somewhere. It also wouldn't be terrible to re-add those branches,
> > since they're highly predictable.
>
> I put a little more work into this, and got it working, just needs a
> small amount of finicky coding. I'll share tomorrow.
>
> I have a question about RT_FREE_RECURSE:
>
> + check_stack_depth();
> + CHECK_FOR_INTERRUPTS();
>
> I'm not sure why these are here: The first seems overly paranoid,
> although harmless, but the second is probably a bad idea. Why should
> the user be able to to interrupt the freeing of memory?

Good catch. We should not check the interruption there.

> Also, I'm not quite happy that RT_ITER has a copy of a pointer to the
> tree, leading to coding like "iter->tree->ctl->root". I *think* it
> would be easier to read if the tree was a parameter to these iteration
> functions. That would require an API change, so the tests/tidstore
> would have some churn. I can do that, but before trying I wanted to
> see what you think -- is there some reason to keep the current way?

I considered both usages, there are two reasons for the current style.
I'm concerned that if we pass both the tree and RT_ITER to iteration
functions, the caller could mistakenly pass a different tree than the
one that was specified to create the RT_ITER. And the second reason is
just to make it consistent with other data structures such as
dynahash.c and dshash.c, but I now realized that in simplehash.h we
pass both the hash table and the iterator.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2024-03-05 01:42:46 pgsql: Fix search_path to a safe value during maintenance operations.
Previous Message Amit Langote 2024-03-05 01:21:54 Re: remaining sql/json patches