Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-04 11:48:21
Message-ID: CANWCAZa9Gj82=6y=4RY-7aK76EaX86zPEScyT+A4Pq=Cv42iFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 4, 2024 at 1:05 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Mar 3, 2024 at 2:43 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:

> > Right, I should have said "reset". Resetting a context will delete
> > it's children as well, and seems like it should work to reset the tree
> > context, and we don't have to know whether that context actually
> > contains leaves at all. That should allow copying "tree context" to
> > "leaf context" in the case where we have no special context for
> > leaves.
>
> Resetting the tree->context seems to work. But I think we should note
> for callers that the dsa_area passed to RT_CREATE should be created in
> a different context than the context passed to RT_CREATE because
> otherwise RT_FREE() will also free the dsa_area. For example, the
> following code in test_radixtree.c will no longer work:
>
> dsa = dsa_create(tranche_id);
> radixtree = rt_create(CurrentMemoryContext, dsa, tranche_id);
> :
> rt_free(radixtree);
> dsa_detach(dsa); // dsa is already freed.
>
> So I think that a practical usage of the radix tree will be that the
> caller creates a memory context for a radix tree and passes it to
> RT_CREATE().

That sounds workable to me.

> I've attached an update patch set:
>
> - 0008 updates RT_FREE_RECURSE().

Thanks!

> - 0009 patch is an updated version of cleanup radix tree memory handling.

Looks pretty good, as does the rest. I'm going through again,
squashing and making tiny adjustments to the template. The only thing
not done is changing the test with many values to resemble the perf
test more.

I wrote:
> > Secondly, I thought about my recent work to skip checking if we first
> > need to create a root node, and that has a harmless (for vacuum at
> > least) but slightly untidy behavior: When RT_SET is first called, and
> > the key is bigger than 255, new nodes will go on top of the root node.
> > These have chunk '0'. If all subsequent keys are big enough, the
> > orginal root node will stay empty. If all keys are deleted, there will
> > be a chain of empty nodes remaining. Again, I believe this is
> > harmless, but to make tidy, it should easy to teach RT_EXTEND_UP to
> > call out to RT_EXTEND_DOWN if it finds the tree is empty. I can work
> > on this, but likely not today.
>
> This turns out to be a lot trickier than it looked, so it seems best
> to allow a trivial amount of waste, as long as it's documented
> somewhere. It also wouldn't be terrible to re-add those branches,
> since they're highly predictable.

I put a little more work into this, and got it working, just needs a
small amount of finicky coding. I'll share tomorrow.

I have a question about RT_FREE_RECURSE:

+ check_stack_depth();
+ CHECK_FOR_INTERRUPTS();

I'm not sure why these are here: The first seems overly paranoid,
although harmless, but the second is probably a bad idea. Why should
the user be able to to interrupt the freeing of memory?

Also, I'm not quite happy that RT_ITER has a copy of a pointer to the
tree, leading to coding like "iter->tree->ctl->root". I *think* it
would be easier to read if the tree was a parameter to these iteration
functions. That would require an API change, so the tests/tidstore
would have some churn. I can do that, but before trying I wanted to
see what you think -- is there some reason to keep the current way?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-03-04 11:51:46 Re: CF entries for 17 to be reviewed
Previous Message Ranier Vilela 2024-03-04 11:47:11 Re: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)