Quick Links

Re: Better shared data structure management and resizable shared data structures

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, chaturvedipalak1911(at)gmail(dot)com
Subject:	Re: Better shared data structure management and resizable shared data structures
Date:	2026-03-19 10:31:10
Message-ID:	8a6799be-bd42-49fb-8914-856c97bb1977@iki.fi
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 18/03/2026 21:30, Robert Haas wrote:
> On Fri, Mar 13, 2026 at 7:41 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Yeah. IMHO the existing shmem_request/startup_hook mechanism is pretty
>> awkward too, and in most cases, the new mechanism is more convenient. It
>> might be slightly less convenient for some things, but mostly it's
>> better. Would you agree with that, or do you actually like the old hooks
>> and ShmemInitStruct() better?
>
> I'd say it's not massively different one way or the other. Looking at
> the pg_stat_statements changes in particular, I feel like in v2, it
> was actually worse, because you didn't get manage to get rid of
> pgss_shmem_request(), but it was no longer the thing requesting shmem.
> v3 is better in that regard: you get rid of some complexity in
> exchange for what you add. It's not amazing, though:
> pgss_shmem_startup() now has nothing to do with the name of the hook,
> which is not this patch's fault but also isn't solved by it. I wonder
> why in the world somebody decided to jam all of this non-shmem-related
> logic into this function, and why they didn't add a comment explaining
> why it was here and not someplace else. One of the worst things about
> this area is that I often end up having to trace through a bunch of
> postmaster startup logic to figure out whether any given bit of code
> is actually in the right place, and that makes me feel like the hooks
> are badly designed. pgss_shmem_startup() is a good example of that,
> and the fact that it needs an IsUnderPostmaster check is another.

Hmm, I assumed it was important that the pg_stat_statements file is
loaded later in the startup sequence, and that's why it was in
pgss_shmem_startup(). But now that I look at it, I don't think there was
any grand plan, shmem_startup_hook was just the only easy way to get
control during postmaster startup, after shmem initialization.

So I think we can move that code to the init_fn callback with the new
API, and that gets rid of shmem_startup_hook in pg_stat_statements. See
attached (v6-0005-move-pgss-shmem_startup-hook-code-into-the-new-in.patch).

> We should somehow try to make it clear what happens with this new
> mechanism if a module is loaded via shared_preload_libraries vs. if
> it's loaded via LOAD or session_preload_libraries or whatever. Writing
> modules that don't require shared_preload_libraries is a boon to
> users, because they can be added to a production system without a
> server restart. I wonder whether this new facility handles that case
> better, worse, or the same as existing facilities.

Pretty much the same I think. Point taken that it could be documented
better.

The old documentation for ShmemInitStruct() assumed that it would be
used from shared_preload_libraries, it didn't. With the new API, I tried
to document how it behaves when used outside shared_preload_libraries,
but still focused on how using it from shared_preload_libraries. I'm not
sure it helped. Perhaps the after-startup behavior should be put in a
separate section.

>> One such wrinkle with ShmemRegisterStruct() in the patch now is that
>> it's harder to do initialization that touches multiple structs or hash
>> tables. Currently the callbacks are called in the same order that the
>> structs are registered, so you can do all the initialization in the last
>> struct's callback. The single pair of shmem_request/startup_hooks per
>> module was more clear in that aspect. Fortunately, that kind of
>> cross-struct dependencies are pretty rare. So I think it's fine. (The
>> order that the callbacks are called needs be documented explicitly though).
>>
>> If we want to improve on that, one idea would be to introduce a
>> ShmemRegisterCallbacks() function to register callbacks that are not
>> tied to any particular struct and are called after all the per-struct
>> callbacks.
>
> I think the whole idea of ShmemInitHash() and ShmemInitStruct() is
> relatively poorly designed in this regard. Ideally, I want to
> initialize all the shared memory that belongs to me, and that might
> include arbitrary data structures, but the current design kind of
> thinks that I want one struct or one hash table and nothing else. If
> we're redesigning this mechanism, it would sure be nice to improve on
> that.
>
> Looking just at the pg_stat_statements changes, I think my overall
> view on this right now is that it's not terrible, but I'm also not
> that happy about introducing yet another way to do it for this amount
> of gain. To me, it doesn't yet rise to the level of a clear win.

So here's another idea (not yet implemented in the attached patch):
instead of thinking in terms of individual shmem structs and hashes,
let's introduce a concept of a "subsystem" that ties them together:

static pgss_subsystem_desc = {
.name = "pg_stat_statements",
.shmem_structs = {
{
.ptr = (void **) &pgss,
.size = sizeof(pgssSharedState),
},
},
.shmem_hashes = {
{
.ptr = &pgss_hash,

.init_size = 0, /* set from 'pgss_max' */
.max_size = 0, /* set from 'pgss_max' */
.hash_info.keysize = sizeof(pgssHashKey),
.hash_info.entrysize = sizeof(pgssEntry),
.hash_flags = HASH_ELEM | HASH_BLOBS,
},
},

/* called after the shmem structs and hashes have been allocated */
.init_fn = pgss_shmem_init,
}

void
_PG_init(void)
{
...
RegisterSubsystem(&pgss_subsystem_desc);
}

We could add more callbacks that get called at different times. For
example the callback that would get called before shared memory is
allocated, which could adjust the size according to MaxBackends. That
would fully replace shmem_request_hook. Or a callback that would get
called later in the startup sequence, if we wanted to e.g. load the
pg_stat_statements file later in startup.

This would be a natural place for other resources in future too. We
could support declaring "named lwlock tranches" here to replace
RequestNamedLWLockTranche() for example, although I think it's still
better to encourage embedding the LWLock in the struct instead.

_PG_init in pg_stat_statements still does a lot more than register that
struct. It declares the GUCs and installs other hooks for example. We
could perhaps move those to the subsystem descriptor too, although I'm
not sure if that's worth the code churn.

- Heikki

Attachment	Content-Type	Size
v6-0001-Test-pg_stat_statements-across-crash-restart.patch	text/x-patch	1.4 KB
v6-0002-Refactor-ShmemIndex-initialization.patch	text/x-patch	7.0 KB
v6-0003-Introduce-a-new-mechanism-for-registering-shared-.patch	text/x-patch	52.1 KB
v6-0004-Convert-pg_stat_statements-to-use-the-new-interfa.patch	text/x-patch	10.4 KB
v6-0005-move-pgss-shmem_startup-hook-code-into-the-new-in.patch	text/x-patch	4.0 KB
v6-0006-Use-the-new-mechanism-in-a-few-core-subsystems.patch	text/x-patch	40.8 KB

In response to

Re: Better shared data structure management and resizable shared data structures at 2026-03-18 19:30:10 from Robert Haas

Responses

Re: Better shared data structure management and resizable shared data structures at 2026-03-19 13:36:36 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Daniel Gustafsson	2026-03-19 10:38:22	Re: Serverside SNI support in libpq
Previous Message	Amul Sul	2026-03-19 10:20:59	Re: pg_waldump: support decoding of WAL inside tarfile