Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-12-14 00:22:11
Message-ID: CAD21AoDjTbp2SHn4hRzAxWNeYArn4Yd4UdH9XRoNzdrYWNgExw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 12, 2023 at 11:53 AM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Mon, Dec 11, 2023 at 1:18 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > I've attached the updated patch set. From the previous patch set, I've
> > merged patches 0007 to 0010. The other changes such as adding RT_GET()
> > still are unmerged for now, for discussion. Probably we can make them
> > as follow-up patches as we discussed. 0011 to 0015 patches are new
> > changes for v44 patch set, which removes RT_SEARCH() and RT_SET() and
> > support variable-length values.
>
> This looks like the right direction, and I'm pleased it's not much
> additional code on top of my last patch.
>
> v44-0014:
>
> +#ifdef RT_VARLEN_VALUE
> + /* XXX: need to choose block sizes? */
> + tree->leaf_ctx = AllocSetContextCreate(ctx,
> + "radix tree leaves",
> + ALLOCSET_DEFAULT_SIZES);
> +#else
> + tree->leaf_ctx = SlabContextCreate(ctx,
> + "radix tree leaves",
> + RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
> + sizeof(RT_VALUE_TYPE));
> +#endif /* RT_VARLEN_VALUE */
>
> Choosing block size: Similar to what we've discussed previously around
> DSA segments, we might model this on CreateWorkExprContext() in
> src/backend/executor/execUtils.c. Maybe tid store can pass maint_w_m /
> autovac_w_m (later work_mem for bitmap scan). RT_CREATE could set the
> max block size to 1/16 of that, or less.
>
> Also, it occurred to me that compile-time embeddable values don't need
> a leaf context. I'm not sure how many places assume that there is
> always a leaf context. If not many, it may be worth not creating one
> here, just to be tidy.
>
> + size_t copysize;
>
> - memcpy(leaf.local, value_p, sizeof(RT_VALUE_TYPE));
> + copysize = sizeof(RT_VALUE_TYPE);
> +#endif
> +
> + memcpy(leaf.local, value_p, copysize);
>
> I'm not sure this indirection adds clarity. I guess the intent was to
> keep from saying "memcpy" twice, but now the code has to say "copysize
> = foo" twice.
>
> For varlen case, we need to watch out for slowness because of memcpy.
> Let's put that off for later testing, though. We may someday want to
> avoid a memcpy call for the varlen case, so let's keep it flexible
> here.
>
> v44-0015:
>
> +#define SizeOfBlocktableEntry (offsetof(
>
> Unused.
>
> + char buf[MaxBlocktableEntrySize] = {0};
>
> Zeroing this buffer is probably going to be expensive. Also see this
> pre-existing comment:
> /* WIP: slow, since it writes to memory for every bit */
> page->words[wordnum] |= ((bitmapword) 1 << bitnum);
>
> For this function (which will be vacuum-only, so we can assume
> ordering), in the loop we can:
> * declare the local bitmapword variable to be zero
> * set the bits on it
> * write it out to the right location when done.
>
> Let's fix both of these at once.
>
> + if (TidStoreIsShared(ts))
> + shared_rt_set(ts->tree.shared, blkno, (void *) page, page_len);
> + else
> + local_rt_set(ts->tree.local, blkno, (void *) page, page_len);
>
> Is there a reason for "void *"? The declared parameter is
> "RT_VALUE_TYPE *value_p" in 0014.
> Also, since this function is for vacuum (and other uses will need a
> new function), let's assert the returned bool is false.
>
> Does iteration still work? If so, it's not too early to re-wire this
> up with vacuum and see how it behaves.
>
> Lastly, my compiler has a warning that CI doesn't have:
>
> In file included from ../src/test/modules/test_radixtree/test_radixtree.c:121:
> ../src/include/lib/radixtree.h: In function ‘rt_find.isra’:
> ../src/include/lib/radixtree.h:2142:24: warning: ‘slot’ may be used
> uninitialized [-Wmaybe-uninitialized]
> 2142 | return (RT_VALUE_TYPE*) slot;
> | ^~~~~~~~~~~~~~~~~~~~~
> ../src/include/lib/radixtree.h:2112:23: note: ‘slot’ was declared here
> 2112 | RT_PTR_ALLOC *slot;
> | ^~~~

Thank you for the comments! I agreed with all of them and incorporated
them into the attached latest patch set, v45.

In v45, 0001 - 0006 are from earlier versions but I've merged previous
updates. So the radix tree now has RT_SET() and RT_FIND() but not
RT_GET() and RT_SEARCH(). 0007 and 0008 are the updates from previous
versions that incorporated the above comments. 0009 patch integrates
tidstore with lazy vacuum. Note that DSA segment problem is not
resolved yet in this patch. 0010 and 0011 makes DSA initial/max
segment size configurable and make parallel vacuum specify both in
proportion to maintenance_work_mem. 0012 is a development-purpose
patch to make it easy to investigate bugs in tidstore. I'd like to
keep it in the patch set at least during the development.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v45-ART.tar.gz application/x-gzip 67.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2023-12-14 00:35:01 Re: Teach predtest about IS [NOT] <boolean> proofs
Previous Message Imseih (AWS), Sami 2023-12-13 22:53:47 Re: [BUG] autovacuum may skip tables when session_authorization/role is set on database