Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-01-18 01:30:54
Message-ID: CAD21AoDV4XibhBCwebbt8FevRf9Lhqt6UqgVtYJDxoJwjv50JA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 17, 2024 at 11:37 AM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Wed, Jan 17, 2024 at 8:39 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 17, 2024 at 9:20 AM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 1:18 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > Just changing "items" to be the local tidstore struct could make the
> > > > code tricky a bit, since max_bytes and num_items are on the shared
> > > > memory while "items" is a local pointer to the shared tidstore.
> > >
> > > Thanks for trying it this way! I like the overall simplification but
> > > this aspect is not great.
> > > Hmm, I wonder if that's a side-effect of the "create" functions doing
> > > their own allocations and returning a pointer. Would it be less tricky
> > > if the structs were declared where we need them and passed to "init"
> > > functions?
> >
> > Seems worth trying. The current RT_CREATE() API is also convenient as
> > other data structure such as simplehash.h and dshash.c supports a
> > similar
>
> I don't happen to know if these paths had to solve similar trickiness
> with some values being local, and some shared.
>
> > > That may be a good idea for other reasons. It's awkward that the
> > > create function is declared like this:
> > >
> > > #ifdef RT_SHMEM
> > > RT_SCOPE RT_RADIX_TREE *RT_CREATE(MemoryContext ctx, Size max_bytes,
> > > dsa_area *dsa,
> > > int tranche_id);
> > > #else
> > > RT_SCOPE RT_RADIX_TREE *RT_CREATE(MemoryContext ctx, Size max_bytes);
> > > #endif
> > >
> > > An init function wouldn't need these parameters: it could look at the
> > > passed struct to know what to do.
> >
> > But the init function would initialize leaf_ctx etc,no? Initializing
> > leaf_ctx needs max_bytes that is not stored in RT_RADIX_TREE.
>
> I was more referring to the parameters that were different above
> depending on shared memory. My first thought was that the tricky part
> is because of the allocation in local memory, but it's certainly
> possible I've misunderstood the problem.
>
> > The same
> > is true for dsa. I imagined that an init function would allocate a DSA
> > memory for the control object.
>
> Yes:
>
> ...
> // embedded in VacDeadItems
> TidStore items;
> };
>
> // NULL DSA in local case, etc
> dead_items->items.area = dead_items_dsa;
> dead_items->items.tranche_id = FOO_ID;
>
> TidStoreInit(&dead_items->items, vac_work_mem);
>
> That's how I imagined it would work (leaving out some details). I
> haven't tried it, so not sure how much it helps. Maybe it has other
> problems, but I'm hoping it's just a matter of programming.

It seems we cannot make this work nicely. IIUC VacDeadItems is
allocated in DSM and TidStore is embedded there. However,
dead_items->items.area is a local pointer to dsa_area. So we cannot
include dsa_area in neither TidStore nor RT_RADIX_TREE. Instead we
would need to pass dsa_area to each interface by callers.

>
> If we can't make this work nicely, I'd be okay with keeping the tid
> store control object. My biggest concern is unnecessary
> double-locking.

If we don't do any locking stuff in radix tree APIs and it's the
user's responsibility at all, probably we don't need a lock for
tidstore? That is, we expose lock functions as you mentioned and the
user (like tidstore) acquires/releases the lock before/after accessing
the radix tree and num_items. Currently (as of v52 patch) RT_FIND is
doing so, but we would need to change RT_SET() and iteration functions
as well.

During trying this idea, I realized that there is a visibility problem
in the radix tree template especially if we want to embed the radix
tree in a struct. Considering a use case where we want to use a radix
tree in an exposed struct, we would declare only interfaces in a .h
file and define actual implementation in a .c file (FYI
TupleHashTableData does a similar thing with simplehash.h). The .c
file and .h file would be like:

in .h file:
#define RT_PREFIX local_rt
#define RT_SCOPE extern
#define RT_DECLARE
#define RT_VALUE_TYPE BlocktableEntry
#define RT_VARLEN_VALUE
#include "lib/radixtree.h"

typedef struct TidStore
{
:
local_rt_radix_tree tree; /* embedded */
:
} TidStore;

in .c file:

#define RT_PREFIX local_rt
#define RT_SCOPE extern
#define RT_DEFINE
#define RT_VALUE_TYPE BlocktableEntry
#define RT_VARLEN_VALUE
#include "lib/radixtree.h"

But it doesn't work as the compiler doesn't know the actual definition
of local_rt_radix_tree. If the 'tree' is *local_rt_radix_tree, it
works. The reason is that with RT_DECLARE but without RT_DEFINE, the
radix tree template generates only forward declarations:

#ifdef RT_DECLARE

typedef struct RT_RADIX_TREE RT_RADIX_TREE;
typedef struct RT_ITER RT_ITER;

In order to make it work, we need to move the definitions required to
expose RT_RADIX_TREE struct to RT_DECLARE part, which actually
requires to move RT_NODE, RT_HANDLE, RT_NODE_PTR, RT_SIZE_CLASS_COUNT,
and RT_RADIX_TREE_CONTROL etc. However RT_SIZE_CLASS_COUNT, used in
RT_RADIX_TREE, could be bothersome. Since it refers to
RT_SIZE_CLASS_INFO that further refers to many #defines and structs,
we might end up moving many structs such as RT_NODE_4 etc to
RT_DECLARE part as well. Or we can use a fixed number is stead of
"lengthof(RT_SIZE_CLASS_INFO)". Apart from that, macros requried by
both RT_DECLARE and RT_DEFINE such as RT_PAN and RT_MAX_LEVEL also
needs to be moved to a common place where they are defined in both
cases.

Given these facts, I think that the current abstraction works nicely
and it would make sense not to support embedding the radix tree.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2024-01-18 01:41:41 Re: Strange Bitmapset manipulation in DiscreteKnapsack()
Previous Message Michael Paquier 2024-01-18 01:29:11 Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bindx