Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-21 10:48:35
Message-ID: CANWCAZZTE-14ofsucofTuhFsfuDGBNf=NZb22TMYT8bxA41oQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 21, 2024 at 4:03 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've looked into this idea further. Overall, it looks clean and I
> don't see any problem so far in terms of integration with lazy vacuum.
> I've attached three patches for discussion and tests.

Seems okay in the big picture, it's the details we need to be careful of.

v77-0001

- dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
+ vacrel->dead_items = TidStoreCreate(vac_work_mem, NULL, 0);
+
+ dead_items_info = (VacDeadItemsInfo *) palloc(sizeof(VacDeadItemsInfo));
+ dead_items_info->max_bytes = vac_work_mem * 1024L;

This is confusing enough that it looks like a bug:

[inside TidStoreCreate()]
/* choose the maxBlockSize to be no larger than 1/16 of max_bytes */
while (16 * maxBlockSize > max_bytes * 1024L)
maxBlockSize >>= 1;

This was copied from CreateWorkExprContext, which operates directly on
work_mem -- if the parameter is actually bytes, we can't "* 1024"
here. If we're passing something measured in kilobytes, the parameter
is badly named. Let's use convert once and use bytes everywhere.

Note: This was not another pass over the whole vacuum patch, just
looking an the issue at hand.
Also for later: Dilip Kumar reviewed an earlier version.

v77-0002:

+#define dsa_create(tranch_id) \
+ dsa_create_ext(tranch_id, DSA_INITIAL_SEGMENT_SIZE, DSA_MAX_SEGMENT_SIZE)

Since these macros are now referring to defaults, maybe their name
should reflect that. Something like DSA_DEFAULT_INIT_SEGMENT_SIZE
(*_MAX_*)

+/* The minimum size of a DSM segment. */
+#define DSA_MIN_SEGMENT_SIZE ((size_t) 1024)

That's a *lot* smaller than it is now. Maybe 256kB? We just want 1MB
m_w_m to work correctly.

v77-0003:

+/* Public APIs to create local or shared TidStore */
+
+TidStore *
+TidStoreCreateLocal(size_t max_bytes)
+{
+ return tidstore_create_internal(max_bytes, false, 0);
+}
+
+TidStore *
+TidStoreCreateShared(size_t max_bytes, int tranche_id)
+{
+ return tidstore_create_internal(max_bytes, true, tranche_id);
+}

I don't think these operations have enough in common to justify
sharing even an internal implementation. Choosing aset block size is
done for both memory types, but it's pointless to do it for shared
memory, because the local context is then only used for small
metadata.

+ /*
+ * Choose the DSA initial and max segment sizes to be no longer than
+ * 1/16 and 1/8 of max_bytes, respectively.
+ */

I'm guessing the 1/8 here because the number of segments is limited? I
know these numbers are somewhat arbitrary, but readers will wonder why
one has 1/8 and the other has 1/16.

+ if (dsa_init_size < DSA_MIN_SEGMENT_SIZE)
+ dsa_init_size = DSA_MIN_SEGMENT_SIZE;
+ if (dsa_max_size < DSA_MAX_SEGMENT_SIZE)
+ dsa_max_size = DSA_MAX_SEGMENT_SIZE;

The second clamp seems against the whole point of this patch -- it
seems they should all be clamped bigger than the DSA_MIN_SEGMENT_SIZE?
Did you try it with 1MB m_w_m?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2024-03-21 10:51:55 Re: Eager aggregation, take 3
Previous Message Bharath Rupireddy 2024-03-21 10:43:31 Re: Introduce XID age and inactive timeout based replication slot invalidation