Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-02-29 11:43:39
Message-ID: CANWCAZYTOWDome6seUwN7G-vwFhGs_E21f=Mrjmd6b30c7b0_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 20, 2024 at 1:59 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> - v63-0008 patch fixes a bug in tidstore.

- page->nwords = wordnum + 1;
- Assert(page->nwords = WORDS_PER_PAGE(offsets[num_offsets - 1]));
+ page->nwords = wordnum;
+ Assert(page->nwords == WORDS_PER_PAGE(offsets[num_offsets - 1]));

Yikes, I'm guessing this failed in a non-assert builds? I wonder why
my compiler didn't yell at me... Have you tried a tidstore-debug build
without asserts?

> - v63-0009 patch is a draft idea of cleanup memory context handling.

Thanks, looks pretty good!

+ ts->rt_context = AllocSetContextCreate(CurrentMemoryContext,
+ "tidstore storage",

"tidstore storage" sounds a bit strange -- maybe look at some other
context names for ideas.

- leaf.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->leaf_ctx, allocsize);
+ leaf.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->leaf_ctx != NULL
+ ? tree->leaf_ctx
+ : tree->context, allocsize);

Instead of branching here, can we copy "context" to "leaf_ctx" when
necessary (those names should look more like eachother, btw)? I think
that means anything not covered by this case:

+#ifndef RT_VARLEN_VALUE_SIZE
+ if (sizeof(RT_VALUE_TYPE) > sizeof(RT_PTR_ALLOC))
+ tree->leaf_ctx = SlabContextCreate(ctx,
+ RT_STR(RT_PREFIX) "radix_tree leaf contex",
+ RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
+ sizeof(RT_VALUE_TYPE));
+#endif /* !RT_VARLEN_VALUE_SIZE */

...also, we should document why we're using slab here. On that, I
don't recall why we are? We've never had a fixed-length type test case
on 64-bit, so it wasn't because it won through benchmarking. It seems
a hold-over from the days of "multi-value leaves". Is it to avoid the
possibility of space wastage with non-power-of-two size types?

For this stanza that remains unchanged:

for (int i = 0; i < RT_SIZE_CLASS_COUNT; i++)
{
MemoryContextDelete(tree->node_slabs[i]);
}

if (tree->leaf_ctx)
{
MemoryContextDelete(tree->leaf_ctx);
}

...is there a reason we can't just delete tree->ctx, and let that
recursively delete child contexts?

Secondly, I thought about my recent work to skip checking if we first
need to create a root node, and that has a harmless (for vacuum at
least) but slightly untidy behavior: When RT_SET is first called, and
the key is bigger than 255, new nodes will go on top of the root node.
These have chunk '0'. If all subsequent keys are big enough, the
orginal root node will stay empty. If all keys are deleted, there will
be a chain of empty nodes remaining. Again, I believe this is
harmless, but to make tidy, it should easy to teach RT_EXTEND_UP to
call out to RT_EXTEND_DOWN if it finds the tree is empty. I can work
on this, but likely not today.

Thirdly, cosmetic: With the introduction of single-value leaves, it
seems we should do s/RT_NODE_PTR/RT_CHILD_PTR/ -- what do you think?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-02-29 12:20:39 Re: brininsert optimization opportunity
Previous Message Peter Eisentraut 2024-02-29 11:41:38 Re: Improve readability by using designated initializers when possible