Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-22 05:19:44
Message-ID: CAD21AoCaDT+-ZaVjbtvumms0tyyHPNLELK2UX-MLG9XCgioaNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 21, 2024 at 7:48 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Thu, Mar 21, 2024 at 4:03 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've looked into this idea further. Overall, it looks clean and I
> > don't see any problem so far in terms of integration with lazy vacuum.
> > I've attached three patches for discussion and tests.
>
> Seems okay in the big picture, it's the details we need to be careful of.
>
> v77-0001
>
> - dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
> - dead_items->max_items = max_items;
> - dead_items->num_items = 0;
> + vacrel->dead_items = TidStoreCreate(vac_work_mem, NULL, 0);
> +
> + dead_items_info = (VacDeadItemsInfo *) palloc(sizeof(VacDeadItemsInfo));
> + dead_items_info->max_bytes = vac_work_mem * 1024L;
>
> This is confusing enough that it looks like a bug:
>
> [inside TidStoreCreate()]
> /* choose the maxBlockSize to be no larger than 1/16 of max_bytes */
> while (16 * maxBlockSize > max_bytes * 1024L)
> maxBlockSize >>= 1;
>
> This was copied from CreateWorkExprContext, which operates directly on
> work_mem -- if the parameter is actually bytes, we can't "* 1024"
> here. If we're passing something measured in kilobytes, the parameter
> is badly named. Let's use convert once and use bytes everywhere.

True. The attached 0001 patch fixes it.

>
> v77-0002:
>
> +#define dsa_create(tranch_id) \
> + dsa_create_ext(tranch_id, DSA_INITIAL_SEGMENT_SIZE, DSA_MAX_SEGMENT_SIZE)
>
> Since these macros are now referring to defaults, maybe their name
> should reflect that. Something like DSA_DEFAULT_INIT_SEGMENT_SIZE
> (*_MAX_*)

It makes sense to rename DSA_INITIAL_SEGMENT_SIZE , but I think that
the DSA_MAX_SEGMENT_SIZE is the theoretical maximum size, the current
name also makes sense to me.

>
> +/* The minimum size of a DSM segment. */
> +#define DSA_MIN_SEGMENT_SIZE ((size_t) 1024)
>
> That's a *lot* smaller than it is now. Maybe 256kB? We just want 1MB
> m_w_m to work correctly.

Fixed.

>
> v77-0003:
>
> +/* Public APIs to create local or shared TidStore */
> +
> +TidStore *
> +TidStoreCreateLocal(size_t max_bytes)
> +{
> + return tidstore_create_internal(max_bytes, false, 0);
> +}
> +
> +TidStore *
> +TidStoreCreateShared(size_t max_bytes, int tranche_id)
> +{
> + return tidstore_create_internal(max_bytes, true, tranche_id);
> +}
>
> I don't think these operations have enough in common to justify
> sharing even an internal implementation. Choosing aset block size is
> done for both memory types, but it's pointless to do it for shared
> memory, because the local context is then only used for small
> metadata.
>
> + /*
> + * Choose the DSA initial and max segment sizes to be no longer than
> + * 1/16 and 1/8 of max_bytes, respectively.
> + */
>
> I'm guessing the 1/8 here because the number of segments is limited? I
> know these numbers are somewhat arbitrary, but readers will wonder why
> one has 1/8 and the other has 1/16.
>
> + if (dsa_init_size < DSA_MIN_SEGMENT_SIZE)
> + dsa_init_size = DSA_MIN_SEGMENT_SIZE;
> + if (dsa_max_size < DSA_MAX_SEGMENT_SIZE)
> + dsa_max_size = DSA_MAX_SEGMENT_SIZE;
>
> The second clamp seems against the whole point of this patch -- it
> seems they should all be clamped bigger than the DSA_MIN_SEGMENT_SIZE?
> Did you try it with 1MB m_w_m?

I've incorporated the above comments and test results look good to me.

I've attached the several patches:

- 0002 is a minor fix for tidstore I found.
- 0005 changes the create APIs of tidstore.
- 0006 update the vacuum improvement patch to use the new
TidStoreCreateLocal/Shared() APIs.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v78-0005-Rethink-create-and-attach-APIs-of-shared-TidStor.patch application/octet-stream 10.1 KB
v78-0004-Allow-specifying-initial-and-maximum-segment-siz.patch application/octet-stream 9.7 KB
v78-0003-Use-TidStore-for-dead-tuple-TIDs-storage-during-.patch application/octet-stream 43.9 KB
v78-0006-Adjust-the-vacuum-improvement-patch-to-new-TidSt.patch application/octet-stream 6.3 KB
v78-0002-Fix-an-inconsistent-function-prototype-with-the-.patch application/octet-stream 1023 bytes
v78-0001-Fix-a-calculation-in-TidStoreCreate.patch application/octet-stream 1.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-03-22 05:33:37 Re: SQL:2011 application time
Previous Message Amit Kapila 2024-03-22 05:19:17 Re: Introduce XID age and inactive timeout based replication slot invalidation