Re: Better shared data structure management and resizable shared data structures

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, chaturvedipalak1911(at)gmail(dot)com
Subject: Re: Better shared data structure management and resizable shared data structures
Date: 2026-04-02 06:58:12
Message-ID: CAExHW5sYgt=XekOoAE-Tu_Dv8obWZCjCqAPT9vtN2D4m=M=drQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 1, 2026 at 11:47 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> Yet another version attached (also available at:
> https://github.com/hlinnaka/postgres/tree/shmem-init-refactor-9) The
> main change is the shape of the ShmemRequest*() calls:
>
> On 27/03/2026 02:51, Heikki Linnakangas wrote:
> > Another idea is to use a macro to hide that from pgindent, which would
> > make the calls little less verbose anyway:
> >
> > #define ShmemRequestStruct(desc, ...) ShmemRequestStructWithOpts(desc,
> > &(ShmemRequestStructOpts) { __VA_ARGS__ })
> >
> > Then the call would be simply:
> >
> > ShmemRequestStruct(&pgssSharedStateDesc,
> > .name = "pg_stat_statements",
> > .size = sizeof(pgssSharedState),
> > .ptr = (void **) &pgss,
> > );
>
> I went with that approach. We're already doing something similar with
> XL_ROUTINE in xlogreader.h:
>
> #define XL_ROUTINE(...) &(XLogReaderRoutine){__VA_ARGS__}
>
> The calls look like this:
>
> xlogreader =
> XLogReaderAllocate(wal_segment_size, NULL,
> XL_ROUTINE(.page_read = &XLogPageRead,
> .segment_open = NULL,
> .segment_close = wal_segment_close),
> private);
>
> If we followed that example, ShmemRequestStruct() calls would look like
> this:
>
> ShmemRequestStruct(&pgssSharedStateDesc,
> SHMEM_STRUCT_OPTS(.name = "pg_stat_statements",
> .size = sizeof(pgssSharedState),
> .ptr = (void **) &pgss,
> );
>
> However, I don't like the deep indentation, it feels like the important
> stuff is buried to the right. And pgindent insists on that. So I went
> with the proposal I quoted above, turning ShmemRequestStruct(...) itself
> into a macro. If you need more complex options setup, you can set up the
> struct without the macro and call ShmemRequestStructWithOpts() directly,
> but so far all of the callers can use the macro.
>

I like this. I have tried it only for the resizable_shmem structure
which is not complex.

>
> Ashutosh, I think I've addressed most of your comments so far. I'm
> replying to just a few of them here that might need more discussion:
>

Thanks.

> >
> > +} shmem_startup_state;
> >
> > This isn't just startup state since the backend can toggle between
> > DONE and LATE_ATTACH_OR_INIT states after the startup. Probably
> > "shmem_state" would be a better name.
>
> Renamed to "shmem_request_state". And renamed "LATE_ATTACH_OR_INIT" to
> "AFTER_STARTUP_ATTACH_OR_INIT" to match the terminology I used elsewhere.
>
> I'm still not entirely happy with this state machine. It seems useful to
> have it for sanity checking, but it still feels a little unclear what
> state you're in at different points in the code, and as an aesthetic
> thing, the whole enum feels too prominent given that it's just for
> sanity checks.

I am ok even if it is used just for sanity checks - but with the
shared structure requests coming at any time during the life of a
server, it would be easy to get lost without those sanity checks. I
also see it being used in RegisterShmemCallbacks(), so it's not for
just sanity checks, right?

>
> > + ShmemStructDesc *desc = area->desc;
> > +
> > + AttachOrInit(desc, false, true);
> > + }
> > + list_free(requested_shmem_areas);
> > + requested_shmem_areas = NIL;
> >
> > If we pop all the nodes from the list, then the list should be NIL
> > right? Why do we need to free it?
> >
> > + else if (!init_allowed)
> > + {
> >
> > For the sake of documentation and sanity, I would add
> > Assert(!index_entry) here, possibly with a comment. Otherwise it feels
> > like we might be leaving a half-initialized entry in the hash table.
> >
> > What if attach_allowed is false and the entry is not found? Should we
> > throw an error in that case too? It would be foolish to call
> > AttachOrInit with both init_allowed and attach_allowed set to false,
> > but the API allows it and we should check for that.
> >
> > It feels like we should do something about the arguments. The function
> > is hard to read. init_allowed is actually the action the caller wants
> > to take if the entry is not found, and attach_allowed is the action
> > the caller wants to take if the entry is found.
> >
> > Also explain in the comment what does attach mean here especially in
> > case of fixed sized structures.
>
> I renamed it to AttachOrInitShmemIndexEntry, and the args to 'may_init'
> and 'may_attach'. But more importantly I added comments to explain the
> different usages. Hope that helps..

The explanation in the prologue looks good. But the function is still
confusing. Instead of if ... else fi ... chain, I feel organizing this
as below would make it more readable. (this was part of one of my
earlier edit patches).
if (found)
...
else
{
if (!may_init)
error
if (!index_entry)
error

... rest of the code to initialize and attach
}

But other than that I don't have any other brilliant ideas.

>
> On 01/04/2026 14:59, Ashutosh Bapat wrote:
> > 0008
> > ------
> > - LWLockRelease(AddinShmemInitLock);
> > + /* The hash table must be initialized already */
> > + Assert(pgss_hash != NULL);
> >
> > Does it make sense to also Assert(pgss)? A broader question is do we
> > want to make it a pattern that every user of ShmemRequest*() also
> > Assert()s that the pointer is non-NULL in the init callback? It is a
> > test that the ShmemRequest*(), which is far from, init_fn is working
> > correctly.
>
> The function does a lot of accesses of 'pgss' so if that's NULL you'll
> get a crash pretty quickly. I'm not sure if the Assert(pgss_hash !=
> NULL) is really needed either, but I'm inclined to keep it, as pgss_hash
> might not otherwise be accessed in the function, and there are runtime
> checks for it in the other functions, so if it's not initialized for
> some reason, things might still appear to work to some extent. I don't
> think I want to have that as a broader pattern though.

In Assert build, an Assert() at least appears in the server log file,
that gives a good direction to start investigation. Without Assert, it
gives segmentation faults without any idea where it came from. That's
a mild benefit of assert.

>
> > + /*
> > + * Extra space to reserve in the shared memory segment, but it's not part
> > + * of the struct itself. This is used for shared memory hash tables that
> > + * can grow beyond the initial size when more buckets are allocated.
> > + */
> > + size_t extra_size;
> >
> > When we introduce resizable structures (where even the hash table
> > directly itself could be resizable), we will introduce a new field
> > max_size which is easy to get confused with extra_size. Maybe we can
> > rename extra_size to something like "auxilliary_size" to mean size of
> > the auxiliary parts of the structure which are not part of the main
> > struct itself.
> >
> > + /*
> > + * max_size is the estimated maximum number of hashtable entries. This is
> > + * not a hard limit, but the access efficiency will degrade if it is
> > + * exceeded substantially (since it's used to compute directory size and
> > + * the hash table buckets will get overfull).
> > + */
> > + size_t max_size;
> > +
> > + /*
> > + * init_size is the number of hashtable entries to preallocate. For a
> > + * table whose maximum size is certain, this should be equal to max_size;
> > + * that ensures that no run-time out-of-shared-memory failures can occur.
> > + */
> > + size_t init_size;
> >
> > Everytime I look at these two fields, I question whether those are the
> > number of entries (i.e. size of the hash table) or number of bytes
> > (size of the memory). I know it's the former, but it indicates that
> > something needs to be changed here, like changing the names to have
> > _entries instead of _size, or changing the type to int64 or some such.
> > Renaming to _entries would conflict with dynahash APIs since they use
> > _size, so maybe the latter?
>
> I hear you, but I didn't change these yet. If we go with the patches
> from the "Shared hash table allocations" thread, max_size and init_size
> will be merged into one. I'll try to settle that thread before making
> changes here.

Will review those patches next.

>
> > -void
> > -InitProcGlobal(void)
> > +static void
> > +ProcGlobalShmemInit(void *arg)
> > {
>
> I'm not sure what you meant to say here, but I did notice that there
> were a bunch of references to InitProcGlobal() left over in comments.
> Fixed those.

Oh, I just wanted to say that the new version reads much better than
the old version, which had ShmemStructInit() sprinkled at seemingly
random places. I missed writing that. Nothing serious there.

I also rebased my resizable shmem patch on v9. Attached here. I have
addressed the following open items from the list at [1]
1. The test is stable now. I found a way to make (roughly) sure that
we are not allocating more than required memory for a resizable
structure.
2. Disable the feature on platforms that do not have
MADV_POPULATE_WRITE and MADV_REMOVE. The feature is also disabled for
EXEC_BACKEND case. I have tested the EXEC_BACKEND case, but I have not
tested platforms which do not have those constants defined or on
Windows.

The first two items from [1] need some discussion still.

[1] https://www.postgresql.org/message-id/CAExHW5so6VSxBC-1V=35229Z1+dw5vhw8HxHg9ry7UzceKcXzA@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachment Content-Type Size
v9-resizable_shmem_struct.patch.nocibot application/octet-stream 45.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-04-02 07:28:48 Re: pg_test_timing: fix unit typo and widen diff type
Previous Message Hayato Kuroda (Fujitsu) 2026-04-02 06:57:10 RE: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE