Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-01-24 06:42:43
Message-ID: CANWCAZYYzoKp_4+1m5mn-TRD62BTwom8iLXLOWMsHkkwFi=rzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 23, 2024 at 10:58 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> The new patches probably need to be polished but the VacDeadItemInfo
> idea looks good to me.

That idea looks good to me, too. Since you already likely know what
you'd like to polish, I don't have much to say except for a few
questions below. I also did a quick sweep through every patch, so some
of these comments are unrelated to recent changes:

v55-0003:

+size_t
+dsa_get_total_size(dsa_area *area)
+{
+ size_t size;
+
+ LWLockAcquire(DSA_AREA_LOCK(area), LW_SHARED);
+ size = area->control->total_segment_size;
+ LWLockRelease(DSA_AREA_LOCK(area));

I looked and found dsa.c doesn't already use shared locks in HEAD,
even dsa_dump. Not sure why that is...

+/*
+ * Calculate the slab blocksize so that we can allocate at least 32 chunks
+ * from the block.
+ */
+#define RT_SLAB_BLOCK_SIZE(size) \
+ Max((SLAB_DEFAULT_BLOCK_SIZE / (size)) * (size), (size) * 32)

The first parameter seems to be trying to make the block size exact,
but that's not right, because of the chunk header, and maybe
alignment. If the default block size is big enough to waste only a
tiny amount of space, let's just use that as-is. Also, I think all
block sizes in the code base have been a power of two, but I'm not
sure how much that matters.

+#ifdef RT_SHMEM
+ fprintf(stderr, " [%d] chunk %x slot " DSA_POINTER_FORMAT "\n",
+ i, n4->chunks[i], n4->children[i]);
+#else
+ fprintf(stderr, " [%d] chunk %x slot %p\n",
+ i, n4->chunks[i], n4->children[i]);
+#endif

Maybe we could invent a child pointer format, so we only #ifdef in one place.

--- /dev/null
+++ b/src/test/modules/test_radixtree/meson.build
@@ -0,0 +1,35 @@
+# FIXME: prevent install during main install, but not during test :/

Can you look into this?

test_radixtree.c:

+/*
+ * XXX: should we expose and use RT_SIZE_CLASS and RT_SIZE_CLASS_INFO?
+ */
+static int rt_node_class_fanouts[] = {
+ 4, /* RT_CLASS_3 */
+ 15, /* RT_CLASS_32_MIN */
+ 32, /* RT_CLASS_32_MAX */
+ 125, /* RT_CLASS_125 */
+ 256 /* RT_CLASS_256 */
+};

These numbers have been wrong a long time, too, but only matters for
figuring out where it went wrong when something is broken. And for the
XXX, instead of trying to use the largest number that should fit (it's
obviously not testing that the expected node can actually hold that
number anyway), it seems we can just use a "big enough" number to
cause growing into the desired size class.

As far as cleaning up the tests, I always wondered why these didn't
use EXPECT_TRUE, EXPECT_FALSE, etc. as in Andres's prototype where
where convenient, and leave comments above the tests. That seemed like
a good idea to me -- was there a reason to have hand-written branches
and elog messages everywhere?

--- a/src/tools/pginclude/cpluspluscheck
+++ b/src/tools/pginclude/cpluspluscheck
@@ -101,6 +101,12 @@ do
test "$f" = src/include/nodes/nodetags.h && continue
test "$f" = src/backend/nodes/nodetags.h && continue

+ # radixtree_*_impl.h cannot be included standalone: they are just
code fragments.
+ test "$f" = src/include/lib/radixtree_delete_impl.h && continue
+ test "$f" = src/include/lib/radixtree_insert_impl.h && continue
+ test "$f" = src/include/lib/radixtree_iter_impl.h && continue
+ test "$f" = src/include/lib/radixtree_search_impl.h && continue

Ha! I'd forgotten about these -- they're long outdated.

v55-0005:

- * The radix tree is locked in shared mode during the iteration, so
- * RT_END_ITERATE needs to be called when finished to release the lock.
+ * The caller needs to acquire a lock in shared mode during the iteration
+ * if necessary.

"need if necessary" is maybe better phrased as "is the caller's responsibility"

+ /*
+ * We can rely on DSA_AREA_LOCK to get the total amount of DSA memory.
+ */
total = dsa_get_total_size(tree->dsa);

Maybe better to have a header comment for RT_MEMORY_USAGE that the
caller doesn't need to take a lock.

v55-0006:

"WIP: Not built, since some benchmarks have broken" -- I'll work on
this when I re-run some benchmarks.

v55-0007:

+ * Internally, a tid is encoded as a pair of 64-bit key and 64-bit value,
+ * and stored in the radix tree.

This hasn't been true for a few months now, and I thought we fixed
this in some earlier version?

+ * TODO: The caller must be certain that no other backend will attempt to
+ * access the TidStore before calling this function. Other backend must
+ * explicitly call TidStoreDetach() to free up backend-local memory associated
+ * with the TidStore. The backend that calls TidStoreDestroy() must not call
+ * TidStoreDetach().

Do we need to do anything now?

v55-0008:

-TidStoreAttach(dsa_area *area, TidStoreHandle handle)
+TidStoreAttach(dsa_area *area, dsa_pointer rt_dp)

"handle" seemed like a fine name. Is that not the case anymore? The
new one is kind of cryptic. The commit message just says "remove
control object" -- does that imply that we need to think of this
parameter differently, or is it unrelated? (Same with
dead_items_handle in 0011)

v55-0011:

+ /*
+ * Recreate the tidstore with the same max_bytes limitation. We cannot
+ * use neither maintenance_work_mem nor autovacuum_work_mem as they could
+ * already be changed.
+ */

I don't understand this part.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2024-01-24 07:02:30 Re: partitioning and identity column
Previous Message Ashutosh Bapat 2024-01-24 06:34:43 Re: partitioning and identity column