Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-21 07:35:08
Message-ID: CANWCAZbYw0d=6dO7WsVhMWoWUN+qyomJFmxBD23Ye2ZxLbhfeA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 21, 2024 at 1:11 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> Or we can have a new function for dsa.c to set the initial and max
> segment size (or either one) to the existing DSA area so that
> TidStoreCreate() can specify them at creation.

I didn't like this very much, because it splits one operation
across an API boundary. The caller already has all the information it
needs when it creates the DSA. Straw man proposal: the caller could do
the same for local memory, so the two cases would be more similar. But
if we made local contexts the responsibility of the caller, that would
cause duplication between creating and resetting.

> In shared TidStore
> cases, since all memory required by shared radix tree is allocated in
> the passed-in DSA area and the memory usage is the total segment size
> allocated in the DSA area

...plus apparently some overhead (I just found that out today), but
that's beside the point.

On Thu, Mar 21, 2024 at 2:02 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Yet another idea is that TidStore creates its own DSA area in
> TidStoreCreate(). That is, in TidStoreCreate() we create a DSA area
> (using dsa_create()) and pass it to RT_CREATE(). Also, we need a new
> API to get the DSA area. The caller (e.g. parallel vacuum) gets the
> dsa_handle of the DSA and stores it in the shared memory (e.g. in
> PVShared). TidStoreAttach() will take two arguments: dsa_handle for
> the DSA area and dsa_pointer for the shared radix tree. This idea
> still requires controlling min/max segment sizes since dsa_create()
> uses 1MB as the initial segment size. But TidStoreCreate()
> would be more user-friendly.

This seems like an overall simplification, aside from future size
configuration, so +1 to continue looking into this. If we go this
route, I'd like to avoid a boolean parameter and cleanly separate
TidStoreCreateLocal() and TidStoreCreateShared(). Every operation
after creation can introspect whether the store is local or shared,
but it's a bit awkward to force both cases into the same creation
function. It always was a little awkward, but this change makes it
more so.
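
To make the shape of that split concrete, here is a rough standalone
sketch. It is not PostgreSQL code: MockArea, MockTidStore and all the
mock_* functions are hypothetical stand-ins, and plain calloc() takes
the place of both memory contexts and dsa_create(). The only point is
that two creation functions replace a boolean flag, and later
operations introspect the store to find out which kind it is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DSA area; not the real dsa_area. */
typedef struct MockArea
{
    size_t      init_segment_size;
} MockArea;

/* Hypothetical stand-in for TidStore. */
typedef struct MockTidStore
{
    size_t      max_bytes;
    MockArea   *area;           /* NULL for a local store */
} MockTidStore;

/* Local creation: no shared area is involved at all. */
MockTidStore *
mock_tidstore_create_local(size_t max_bytes)
{
    MockTidStore *ts = calloc(1, sizeof(MockTidStore));

    if (ts == NULL)
        return NULL;
    ts->max_bytes = max_bytes;
    ts->area = NULL;
    return ts;
}

/* Shared creation: the store creates and owns its own area. */
MockTidStore *
mock_tidstore_create_shared(size_t max_bytes)
{
    MockTidStore *ts = mock_tidstore_create_local(max_bytes);

    if (ts == NULL)
        return NULL;
    ts->area = calloc(1, sizeof(MockArea));
    if (ts->area == NULL)
    {
        free(ts);
        return NULL;
    }
    /* 1MB mirrors the initial segment size mentioned upthread. */
    ts->area->init_segment_size = 1024 * 1024;
    return ts;
}

/* Later operations introspect instead of taking a flag. */
bool
mock_tidstore_is_shared(const MockTidStore *ts)
{
    assert(ts != NULL);
    return ts->area != NULL;
}

/* One destroy function serves both kinds. */
void
mock_tidstore_destroy(MockTidStore *ts)
{
    if (ts == NULL)
        return;
    free(ts->area);             /* free(NULL) is a no-op for local stores */
    free(ts);
}
```

A caller picks the creation function once, and everything after that
(here only mock_tidstore_destroy()) works on either kind without
being told which one it got.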
