Re: Better shared data structure management and resizable shared data structures

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, chaturvedipalak1911(at)gmail(dot)com
Subject: Re: Better shared data structure management and resizable shared data structures
Date: 2026-03-16 21:56:23
Message-ID: 41137f2a-0cb0-4cbe-9afc-2dc690f60a87@iki.fi
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16/03/2026 12:28, Ashutosh Bapat wrote:
> 0001
>
> @@ -3236,6 +3239,8 @@ PostmasterStateMachine(void)
> LocalProcessControlFile(true);
> /* re-create shared memory and semaphores */
> + ResetShmemAllocator();
>
> This name is misleading. The function does not touch ShmemAllocator at
> all. Instead it resets the ShmemStruct registry. I suggest
> ResetShememRegistry() instead.

Hmm. Come to think of it, this isn't quite right for
shared_preload_libraries. On crash restart, we don't call _PG_init() for
shared_preload_libraries. So if ShmemRegisterStruct() is called from
_PG_init(), it won't get called again after restart, and hence we should
retain the registration. This is different from ShmemInitStruct(), which
does get called again after crash restart.

I fixed that by retaining the structs registered with
ShmemRegisterStruct(), and only clearing out entries made with
ShmemInitStruct(). I think that's the right thing to do, but I feel a
little uneasy about keeping stuff across restarts. This isn't a
completely new issue; even without this patch I feel uneasy about
keeping the shared memory segment and not calling _PG_init() for
shared_preload_libraries again. I wish crash restart could reset things
even more thoroughly.

If we introduced a shmem_register_hook like you suggested, we could call
that again after restart. But it feels silly; _PG_init() would register
a shmem_register_hook, which would register the shared memory desc. Why
not just register the shared memory desc directly? I don't think we
really gain anything by the extra step. (It might be useful to allow the
size to be calculated later, taking MaxBackends into account. What I
mean here is that it would be a silly dance just to handle crash restarts)

> + *
> + * There are two kinds of shared memory data structures: fixed-size structures
> + * and hash tables.
>
> In future we will have resizable "fixed" structures and we may also
> have resizable hash tables i.e. hash tables whose directory would be
> resizable. The later would be help support resizable shared buffers
> lookup table. It will be good to write the above sentence so that we
> can just add more types of data structures without needing to rewrite
> everything. If we could find a good term for "fixed-size structures"
> which are really "structures that require contiguous memory", we will
> be able to write the above sentence as "There are two kinds of shared
> memory structures: contiguous structures and hash tables.". When we
> add resizable structures, we can just add a sentence "A contiguous
> structure may be fixed size or resizable". When we add resizable hash
> tables, we can just replace that with "Both of these kinds can be
> fixed-size or resizable". I am not sure whether "contiguous
> structures" is a good term though (for one, word contiguous can be
> confused with continuous). Whatever term we use should be something
> that we can carry further in the remaining paragraphs.
> + Fixed-size structures contain things like global
> + * variables for a module and should never be allocated after the shared
> + * memory initialization phase.
>
> I think the existing comment is not accurate. The term "global
> variables" in the sentence can be confused with process global
> variables. We should be using the term "shared variables" or better
> "shared state". If we adopt "contiguous structures" as the term for
> the first kind of data structure, we can write "Contiguous structures
> contain shared state, maintained in a contiguous chunk of memory, for
> a module. It should never be allocated after the shared memory
> initialization phase.".

I went with "variables shared between all backend processes", and did
another pass of other copy-editing too. No doubt this needs to be
updated again when the resizing patch lands.

> - infoP->dsize = infoP->max_dsize = hash_select_dirsize(max_size);
> - infoP->alloc = ShmemAllocNoError;
> - hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE;
> + desc->hash_info.dsize = desc->hash_info.max_dsize =
> hash_select_dirsize(desc->max_size);
> + desc->hash_info.alloc = ShmemAllocNoError;
> + desc->hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE;
> /* look it up in the shmem index */
>
> The next several lines of code look up shmem index. Should we remove
> this comments or modify it to say "Register and initialize the hash
> table".

Replaced it with "/* Set up the base struct descriptor */"

> +HTAB *
> +ShmemInitHash(const char *name, /* table string name for shmem index */
> + int64 init_size, /* initial table size */
> + int64 max_size, /* max size of the table */
> + HASHCTL *infoP, /* info about key and bucket size */
> + int hash_flags) /* info about infoP */
> +{
> + ShmemHashDesc *desc;
> +
>
> ... snip ...
>
> +
> + ShmemRegisterHash(desc);
> + return *desc->ptr;
> +}
>
> I like the way these functions are written using the new API. I think
> we should keep these legacy interface at the end of section of shmem
> APIs, rather than keeping those at the end of the file where we have
> monitoring and arithmetic functions. If you want to get rid of the
> legacy APIs in this release itself, I think it's ok to keep them at
> the end of the file.

I don't plan to get rid of the legacy API any time soon, I expect
existing extensions to continue using it for years to come. So I moved
them per your suggestion.

> ShmemInitStruct() now calls ShmemRegisterStruct(). Earlier it could be
> called from any backend, in any state to fetch a pointer to a shared
> memory structure. It didn't add a new structure. Now it can add a new
> structure. I am wondering whether that can cause registry in different
> backends to get out of sync. Should we limit the window when it can be
> called just like how shmem_request_hook call is limited. In that sense
> ShmemRegisterStruct() looks something to be called from a
> shmem_register_hook which is also called from EXEC_BACKEND. Sorry to
> expand it here rather in my previous reply. In case we replace all the
> current calls to ShmemInitStruct() with ShmemRegisterStruct(), we may
> be able to get rid of the Shmem Index altogether; after all it's used
> only for fetching the pointers to the shared memory areas in
> EXEC_BACKEND mode.

I think it's still useful to allow ShmemRegisterStruct() after
postmaster startup, so that you can use it with extensions that are not
listed in shared_preload_libraries, but need a little bit of shared
memory. Hmm, perhaps we should add an explicit flag for that case,
though. So that by default ShmemRegisterStruct() fails if you call it
after postmaster startup, but you could allow it by setting a flag in
the descriptor.

> I thought that we could save the registry in the
> shared memory. In EXEC_BACKEND mode, we go over the registry calling
> attach_fn for each entry. But since the binary is overwritten in
> EXEC_BACKEND case, attach/init fns are not guaranteed to have the same
> address in all the backends. Maybe we have to resort to
> launch_backend() to transfer the registry to the backend through the
> file (to keep it in sycn in all the backends): a solution you may not
> like.
Yep, can't have function pointers in shared memory unfortunately.

> + void **ptr;
> +} ShmemStructDesc;
>
>
> I think the comments for each member should highlight which of these
> fields are required (non-zero) and which can be optional (zero'ed
> out).

Added some comments on that.

> + */
> + ShmemStructDesc base_desc;
>
> Once we have calculated the base_desc in ShmemRegisterHash() and
> called ShmemRegisterStruct(), we don't need base_desc anymore. Even
> the pointer to the allocated hash table memory is available through
> *ptr. Probably we could just remove this member from here.
> ShmemRegisterHash() can declare a variable of type ShmemStructDesc,
> populate it based on the members in this structure and pass it to
> ShmemRegisterStruct(). I am not comfortable with specification
> structure being modified by the registration function.

Hmm, the ShmemStructDesc is accessed in ShmemInitRegistered() and in the
shmem_hash_init() callback, so 'base_desc' cannot be a local variable in
ShmemRegisterHash(). It could be allocated in TopMemoryContext though.
That would hide it from ShmemHashDesc.

I'm a little ambivalent about that. It would be nice to hide it if it's
not intended to be accessed outside shmem.c. Then again, it might be
useful to peek into it in the future, depending on what fields we add.
and it might be handy to look at in a debugger.

> @@ -102,15 +102,14 @@ CalculateShmemSize(void)
> size = add_size(size, ShmemRegisteredSize());
> size = add_size(size, dsm_estimate_size());
>
> We have defined dsm_main_space_shmem_desc, but we still use
> dsm_estimate_size() here and initialize the memory in
> dsm_shmem_init(), which is explicitily called from
> CreateOrAttachShmemStructs(). Why is that? Shouldn't we be registering
> the structure in RegisterShmemStructs(), and let ShmemInitRegistered()
> initialize it? Am I missing something here?

Yes you're right. Fixed that and all the the bugs that Zsolt pointed
out. Thanks!

- Heikki

Attachment Content-Type Size
v5-0001-Test-pg_stat_statements-across-crash-restart.patch text/x-patch 1.4 KB
v5-0002-Refactor-ShmemIndex-initialization.patch text/x-patch 7.0 KB
v5-0003-Introduce-a-new-mechanism-for-registering-shared-.patch text/x-patch 52.0 KB
v5-0004-Convert-pg_stat_statements-to-use-the-new-interfa.patch text/x-patch 10.4 KB
v5-0005-Use-the-new-mechanism-in-a-few-core-subsystems.patch text/x-patch 40.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2026-03-16 22:03:04 Re: Adding REPACK [concurrently]
Previous Message Matthias van de Meent 2026-03-16 21:50:04 Re: Adding REPACK [concurrently]