
Better shared data structure management and resizable shared data structures

From:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Cc:	chaturvedipalak1911(at)gmail(dot)com
Subject:	Better shared data structure management and resizable shared data structures
Date:	2026-02-13 11:47:11
Message-ID:	CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
Lists:	pgsql-hackers

Hi Heikki,
As discussed in [1], starting a new thread to discuss $Subject.

0001 in the attached patchset is the same as the patch shared in [1]. For
completeness, I am copying Heikki's description of what the patch does
from his email:

** quote **
Attached is a proof-of-concept of what I have in mind. Don't look too
closely at how it's implemented; it's very hacky, and EXEC_BACKEND mode
is slightly broken, for example. The point is to demonstrate what the
callers would look like. I converted only a few subsystems to use the
new API; the rest still use ShmemInitStruct() and ShmemInitHash().

With this, initialization of a subsystem that defines a shared memory
area looks like this:

--------------

/* This struct lives in shared memory */
typedef struct
{
	int			field;
} FoobarSharedCtlData;

static void FoobarShmemInit(void *arg);

/* Descriptor for the shared memory area */
ShmemStructDesc FoobarShmemDesc = {
	.name = "Foobar subsystem",
	.size = sizeof(FoobarSharedCtlData),
	.init_fn = FoobarShmemInit,
};

/* Pointer to the shared memory struct */
#define FoobarCtl ((FoobarSharedCtlData *) FoobarShmemDesc.ptr)

/*
 * Register the shared memory struct. This is called once at
 * postmaster startup, before the shared memory segment is allocated,
 * and in EXEC_BACKEND mode also early at backend startup.
 *
 * For core subsystems, there's a list of all these functions in core
 * in ipci.c, similar to all the *ShmemSize() and *ShmemInit() functions
 * today. In an extension, this would be done in _PG_init() or in
 * the shmem_request_hook, replacing the RequestAddinShmemSpace calls
 * we have today.
 */
void
FoobarShmemRegister(void)
{
	ShmemRegisterStruct(&FoobarShmemDesc);
}

/*
 * This callback is called once at postmaster startup, to initialize
 * the shared memory struct. FoobarShmemDesc.ptr has already been
 * set when this is called.
 */
static void
FoobarShmemInit(void *arg)
{
	memset(FoobarCtl, 0, sizeof(FoobarSharedCtlData));
	FoobarCtl->field = 123;
}

--------------

The ShmemStructDesc provides room for extending the facility in the
future. For example, you could specify alignment there, or an additional
"attach" callback for when you need to do more per-backend initialization
in EXEC_BACKEND mode. And, with resizable shared memory, a maximum size.

** unquote **

0002 allows the addresses of the global variables that point to the
shared memory structures to be specified in ShmemStructDesc, for easier
use; it gets rid of the global shared memory pointer macros. This should
be merged into 0001.

0003 allows resizable shared memory structures to be specified via
ShmemRegisterStruct(), and then implements allocating shared memory
segments for them and allocating the structures themselves. It also
implements the ShmemResizeRegistered() API to resize registered
resizable structures. The resizable shared memory structures are
placed in their own shared memory segments, which are implemented using
the same method as the 0002 patch in [2]. It is also a PoC, so please
don't look too closely. The pieces dealing with huge pages need some
rework, and portability is another issue. The most important open
question is which method should be used to implement resizable shared
memory itself. More on that later.

0003 adds APIs to register, allocate and resize shared memory
structures in shmem.c, extending the infrastructure added by 0001. The
patch also has a test which demonstrates how to use those APIs. If we
think those APIs look good, we can work on finishing 0001, and then I
can work on completing 0003.

Thoughts?

I am copying the discussion about supporting resizable shared memory
from the shared buffers resizing thread here, since it applies to 0003.
Andres is suggesting an alternate approach [3] to support resizable
shared memory. I am continuing that conversation here.

> I think the multiple memory mappings approach is just too restrictive. If we
> e.g. eventually want to make some of the other major allocations that depend
> on NBuffers react to resizing shared buffers, it's very easy to do if all it
> requires is calling
> madvise(TYPEALIGN(start, page_size), MADV_REMOVE, TYPEALIGN_DOWN(end, page_size));

You mean madvise(TYPEALIGN(start, page_size), TYPEALIGN_DOWN(end,
page_size) - TYPEALIGN(start, page_size), MADV_REMOVE), right?

`man madvise` says:

    MADV_REMOVE (since Linux 2.6.16)
        Free up a given range of pages and its associated
        backing store. This is equivalent to punching a
        hole in the corresponding byte range of the backing
        store (see fallocate(2)). Subsequent accesses
        in the specified address range will see bytes containing zero.

        The specified address range must be mapped shared
        and writable. This flag cannot be applied to
        locked pages, Huge TLB pages, or VM_PFNMAP pages.

        In the initial implementation, only tmpfs(5)
        supported MADV_REMOVE; but since Linux 3.5, any
        filesystem which supports the fallocate(2)
        FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE.
        Hugetlbfs fails with the error EINVAL and other
        filesystems fail with the error EOPNOTSUPP.

It says the flag cannot be applied to Huge TLB pages, so we won't be
able to use it for resizable shared memory structures allocated with
huge pages. That seems like a serious restriction.

I may be misunderstanding something, but this seems useful for freeing
memory that has already been allocated, not for allocating more memory.
I don't understand how a user would start with a larger reserved
address space of which only small portions are backed by memory.

>
> There are several cases that are pretty easy to handle that way:
> - Buffer Blocks
> - Buffer Descriptors
> - Sync request queue (part of the "Checkpointer Data" allocation)
> - Checkpoint BufferIds (for sorting the to-be-checkpointed data)
> - Buffer IO Condition Variables
>
> But if you want to support making these resizable with the separate mappings
> approach, it gets considerably more complicated and the number of mappings
> increases more substantially.
>
> We also need a lot less infrastructure in shmem.c that way. We could
> e.g. make ShmemInitStruct() reserve the entire requested size (to avoid OOM
> killer issues) and have a ShmemInitStructExt() that allows the caller to choose
> whether to reserve. No different segment IDs etc are needed.

I agree that if we can devise a mechanism to allocate a single mapping
with holes placed around resizable structures, we could use it for
shared memory structures other than the buffer pool as well. However, as
far as I understand, we will still need the concept of segments inside
shmem.c (not necessarily in pg_shmem.h) to track the allocations of the
individual structures, or maybe we could use the resizable shmem
structure itself to track them.

[1] https://www.postgresql.org/message-id/91265854-b3ba-45c6-aa44-7e8dcdd51470%40iki.fi
[2] https://www.postgresql.org/message-id/CAExHW5tSw8r06RLAArvf923cO4NGetitPhQ7AO0o7hsKx8jsNw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/aY4v1oSmokXNpQMX%40alap3.anarazel.de

--
Best Wishes,
Ashutosh Bapat

Attachment	Content-Type	Size
0002-Get-rid-of-global-shared-memory-pointer-mac-20260213.patch	text/x-patch	15.0 KB
0001-wip-Introduce-a-new-way-of-registering-shar-20260213.patch	text/x-patch	53.8 KB
0003-WIP-Resizable-shared-memory-structures-20260213.patch	text/x-patch	106.7 KB

Responses

Re: Better shared data structure management and resizable shared data structures at 2026-02-13 12:03:12 from Heikki Linnakangas
