Quick Links

Re: Better shared data structure management and resizable shared data structures

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, chaturvedipalak1911(at)gmail(dot)com
Subject:	Re: Better shared data structure management and resizable shared data structures
Date:	2026-04-01 18:17:12
Message-ID:	9d919bd9-94dd-4bda-8ccf-ebced4178c53@iki.fi
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Yet another version attached (also available at:
https://github.com/hlinnaka/postgres/tree/shmem-init-refactor-9) The
main change is the shape of the ShmemRequest*() calls:

On 27/03/2026 02:51, Heikki Linnakangas wrote:
> Another idea is to use a macro to hide that from pgindent, which would
> make the calls little less verbose anyway:
>
> #define ShmemRequestStruct(desc, ...) ShmemRequestStructWithOpts(desc,
> &(ShmemRequestStructOpts) { __VA_ARGS__ })
>
> Then the call would be simply:
>
> ShmemRequestStruct(&pgssSharedStateDesc,
> .name = "pg_stat_statements",
> .size = sizeof(pgssSharedState),
> .ptr = (void **) &pgss,
> );

I went with that approach. We're already doing something similar with
XL_ROUTINE in xlogreader.h:

#define XL_ROUTINE(...) &(XLogReaderRoutine){__VA_ARGS__}

The calls look like this:

xlogreader =
XLogReaderAllocate(wal_segment_size, NULL,
XL_ROUTINE(.page_read = &XLogPageRead,
.segment_open = NULL,
.segment_close = wal_segment_close),
private);

If we followed that example, ShmemRequestStruct() calls would look like
this:

ShmemRequestStruct(&pgssSharedStateDesc,
SHMEM_STRUCT_OPTS(.name = "pg_stat_statements",
.size = sizeof(pgssSharedState),
.ptr = (void **) &pgss,
);

However, I don't like the deep indentation, it feels like the important
stuff is buried to the right. And pgindent insists on that. So I went
with the proposal I quoted above, turning ShmemRequestStruct(...) itself
into a macro. If you need more complex options setup, you can set up the
struct without the macro and call ShmemRequestStructWithOpts() directly,
but so far all of the callers can use the macro.

Ashutosh, I think I've addressed most of your comments so far. I'm
replying to just a few of them here that might need more discussion:

>
> +} shmem_startup_state;
>
> This isn't just startup state since the backend can toggle between
> DONE and LATE_ATTACH_OR_INIT states after the startup. Probably
> "shmem_state" would be a better name.

Renamed to "shmem_request_state". And renamed "LATE_ATTACH_OR_INIT" to
"AFTER_STARTUP_ATTACH_OR_INIT" to match the terminology I used elsewhere.

I'm still not entirely happy with this state machine. It seems useful to
have it for sanity checking, but it still feels a little unclear what
state you're in at different points in the code, and as an aesthetic
thing, the whole enum feels too prominent given that it's just for
sanity checks.

> + ShmemStructDesc *desc = area->desc;
> +
> + AttachOrInit(desc, false, true);
> + }
> + list_free(requested_shmem_areas);
> + requested_shmem_areas = NIL;
>
> If we pop all the nodes from the list, then the list should be NIL
> right? Why do we need to free it?
>
> + else if (!init_allowed)
> + {
>
> For the sake of documentation and sanity, I would add
> Assert(!index_entry) here, possibly with a comment. Otherwise it feels
> like we might be leaving a half-initialized entry in the hash table.
>
> What if attach_allowed is false and the entry is not found? Should we
> throw an error in that case too? It would be foolish to call
> AttachOrInit with both init_allowed and attach_allowed set to false,
> but the API allows it and we should check for that.
>
> It feels like we should do something about the arguments. The function
> is hard to read. init_allowed is actually the action the caller wants
> to take if the entry is not found, and attach_allowed is the action
> the caller wants to take if the entry is found.
>
> Also explain in the comment what does attach mean here especially in
> case of fixed sized structures.

I renamed it to AttachOrInitShmemIndexEntry, and the args to 'may_init'
and 'may_attach'. But more importantly I added comments to explain the
different usages. Hope that helps..

On 01/04/2026 14:59, Ashutosh Bapat wrote:
> 0008
> ------
> - LWLockRelease(AddinShmemInitLock);
> + /* The hash table must be initialized already */
> + Assert(pgss_hash != NULL);
>
> Does it make sense to also Assert(pgss)? A broader question is do we
> want to make it a pattern that every user of ShmemRequest*() also
> Assert()s that the pointer is non-NULL in the init callback? It is a
> test that the ShmemRequest*(), which is far from, init_fn is working
> correctly.

The function does a lot of accesses of 'pgss' so if that's NULL you'll
get a crash pretty quickly. I'm not sure if the Assert(pgss_hash !=
NULL) is really needed either, but I'm inclined to keep it, as pgss_hash
might not otherwise be accessed in the function, and there are runtime
checks for it in the other functions, so if it's not initialized for
some reason, things might still appear to work to some extent. I don't
think I want to have that as a broader pattern though.

> + /*
> + * Extra space to reserve in the shared memory segment, but it's not part
> + * of the struct itself. This is used for shared memory hash tables that
> + * can grow beyond the initial size when more buckets are allocated.
> + */
> + size_t extra_size;
>
> When we introduce resizable structures (where even the hash table
> directly itself could be resizable), we will introduce a new field
> max_size which is easy to get confused with extra_size. Maybe we can
> rename extra_size to something like "auxilliary_size" to mean size of
> the auxiliary parts of the structure which are not part of the main
> struct itself.
>
> + /*
> + * max_size is the estimated maximum number of hashtable entries. This is
> + * not a hard limit, but the access efficiency will degrade if it is
> + * exceeded substantially (since it's used to compute directory size and
> + * the hash table buckets will get overfull).
> + */
> + size_t max_size;
> +
> + /*
> + * init_size is the number of hashtable entries to preallocate. For a
> + * table whose maximum size is certain, this should be equal to max_size;
> + * that ensures that no run-time out-of-shared-memory failures can occur.
> + */
> + size_t init_size;
>
> Everytime I look at these two fields, I question whether those are the
> number of entries (i.e. size of the hash table) or number of bytes
> (size of the memory). I know it's the former, but it indicates that
> something needs to be changed here, like changing the names to have
> _entries instead of _size, or changing the type to int64 or some such.
> Renaming to _entries would conflict with dynahash APIs since they use
> _size, so maybe the latter?

I hear you, but I didn't change these yet. If we go with the patches
from the "Shared hash table allocations" thread, max_size and init_size
will be merged into one. I'll try to settle that thread before making
changes here.

> /*
> - * If we're in the postmaster (or a standalone backend...), set up a shmem
> - * exit hook to dump the statistics to disk.
> + * Set up a shmem exit hook to dump the statistics to disk on postmaster
> + * (or standalone backend) exit.
> */
> - if (!IsUnderPostmaster)
> - on_shmem_exit(pgss_shmem_shutdown, (Datum) 0);
> -
> - /*
> - * Done if some other process already completed our initialization.
> - */
> - if (found)
> - return;
> + on_shmem_exit(pgss_shmem_shutdown, (Datum) 0);
> Given that the structures are registered only at the startup, this
> function will be called only from Postmaster, but given that the
> structures can be registered and initialized after startup in any
> backend, it's better to at least Assert(!IsUnderPostmaster) at the
> beginning of this function. The code below is not expected to be
> called in any backend too. So Assert(IsUnderPostmaster) at the
> beginning of the function would be good safety catch too.

Ok, added an assertion.

> /*
> + * Load any pre-existing statistics from file.
> + *
> * Note: we don't bother with locks here, because there should be no other
> * processes running when this code is reached.
> */
>
> I was a bit worried that the code next to read stat files is being
> crammed in init_fn, but given that the contents of the files are used
> to initialize the shared hash table, I think this is fine.

Yeah, I went through that train of thought too. Loading the file into
the hash table is a kind of initialization.

> 0009
> -------
> +void
> +RegisterBuiltinShmemCallbacks(void)
> +{
> + const ShmemCallbacks *builtin_subsystems[] = {
> +#define PG_SHMEM_SUBSYSTEM(subsystem_callbacks) &subsystem_callbacks,
> +#include "storage/subsystemlist.h"
> +#undef PG_SHMEM_SUBSYSTEM
> + };
> +
> + for (int i = 0; i < lengthof(builtin_subsystems); i++)
> + RegisterShmemCallbacks(builtin_subsystems[i]);
> +}
> +
>
> I don't think we need to use a separate array here, we can just call
> RegisterShmemCallbacks() directly in the macro as attached.

Ah, clever.

> 0011
> ------
> + InjectionPointAttach("aio-process-completion-before-shared",
> + "test_aio",
> + "inj_io_short_read",
> + NULL,
> + 0);
> + InjectionPointLoad("aio-process-completion-before-shared");
> +
> + InjectionPointAttach("aio-worker-after-reopen",
> + "test_aio",
> + "inj_io_reopen",
> + NULL,
> + 0);
> + InjectionPointLoad("aio-worker-after-reopen");
>
> Attaching and loading an injection point shouldn't be part of the
> shared memory initialization. It doens't feel like it should be part
> of shmem_startup_hook as well. So not a fault of this patch. I am
> wondering why can't it be done in the tests themselves?

I think it's the same reason that's explained in the comment in
test_aio_shmem_attach():

> /*
> * Pre-load the injection points now, so we can call them in a critical
> * section.
> */
> #ifdef USE_INJECTION_POINTS
> InjectionPointLoad("aio-process-completion-before-shared");
> InjectionPointLoad("aio-worker-after-reopen");
> elog(LOG, "injection point loaded");
> #endif

> -void
> -InitProcGlobal(void)
> +static void
> +ProcGlobalShmemInit(void *arg)
> {

I'm not sure what you meant to say here, but I did notice that there
were a bunch of references to InitProcGlobal() left over in comments.
Fixed those.

- Heikki

Attachment	Content-Type	Size
v9-0001-Test-pg_stat_statements-across-crash-restart.patch	text/x-patch	2.6 KB
v9-0002-refactor-Move-ShmemInitHash-to-separate-file.patch	text/x-patch	9.1 KB
v9-0003-refactor-predicate.c-inline-SerialInit-to-the-cal.patch	text/x-patch	3.6 KB
v9-0004-refactor-predicate.c-Move-all-the-initialization-.patch	text/x-patch	8.3 KB
v9-0005-Introduce-a-new-mechanism-for-registering-shared-.patch	text/x-patch	60.1 KB
v9-0006-Add-test-module-to-test-after-startup-shmem-alloc.patch	text/x-patch	10.2 KB
v9-0007-Convert-pg_stat_statements-to-use-the-new-interfa.patch	text/x-patch	11.4 KB
v9-0008-Introduce-registry-of-built-in-subsystems.patch	text/x-patch	6.6 KB
v9-0009-Convert-injection-points-to-use-the-new-interface.patch	text/x-patch	4.6 KB
v9-0010-Convert-wait_event.c-to-new-interface.patch	text/x-patch	6.3 KB
v9-0011-Convert-test_aio-to-use-the-new-mechanism.patch	text/x-patch	4.9 KB
v9-0012-Use-the-new-mechanism-in-a-few-core-subsystems.patch	text/x-patch	39.9 KB
v9-0013-Convert-SLRUs-to-use-the-new-interface.patch	text/x-patch	85.0 KB
v9-0014-Convert-AIO-to-the-new-interface.patch	text/x-patch	32.1 KB
v9-0015-Add-option-for-aligning-shmem-allocations.patch	text/x-patch	3.9 KB
v9-0016-Convert-all-remaining-subsystems-to-use-the-new-A.patch	text/x-patch	115.3 KB

In response to

Re: Better shared data structure management and resizable shared data structures at 2026-04-01 11:59:13 from Ashutosh Bapat

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2026-04-01 18:25:40	Re: pg_waldump: support decoding of WAL inside tarfile
Previous Message	Tom Lane	2026-04-01 18:02:58	Re: scale parallel_tuple_cost by tuple width