Quick Links

Re: Better shared data structure management and resizable shared data structures

From:	Haoyu Huang <haoyu(dot)huang(dot)68(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, chaturvedipalak1911(at)gmail(dot)com
Subject:	Re: Better shared data structure management and resizable shared data structures
Date:	2026-06-12 15:51:16
Message-ID:	CAM1e6U5XDwKYZo6Jj3yD3xpCB4qkhRSQn8upauHt=WhEbK9VZA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi all,

Wanted to introduce myself on this thread and share some related work —
with no intention of forking or redirecting what Ashutosh is driving here.

It was great catching up with Ashutosh, David Wein, and Heikki at PGConf
Vancouver. We had a working session on resizable shared buffers. It was
productive and a lot of fun. The outcome of the session is to surface our
work at Databricks on the same topic here.

At Databricks, we have a patch merged in our internal Postgres that enables
resizing shared_buffers without restart. It was inspired by Ashutosh's
earlier patch on this topic. Heikki and David reviewed it on our side. I'd
like to contribute the ideas (and, where useful, the code) back upstream.
I want to be explicit that the current series is the path forward. I'd much
rather plug into that than propose a competing patchset. Happy to help with
review, testing, or specific pieces wherever it's most useful.

Please read the patch as input to the existing effort, not a
counter-proposal. Thanks for all the work on this so far.

Here are the major changes we made on top of Ashutosh's earlier patch

1. Keep one mmap anonymous segment + madvise(MADV_POPULATE_WRITE /
MADV_REMOVE) to allocate/free physical pages.
2. SB variable names changed to use lowNBuffers, highNBuffers, and
maxNBuffers. See the README for more details. We think that this simplifies
the code significantly.
3. Only the shrink needs to use proc signal barrier to coordinate with
all other backends. The other cases are covered by a new
AccessNBuffersLock. The coordinator acquires exclusive lock on
AccessNBuffersLock when it publishes new buffers. Other backends acquire
the shared lock on AccessNBuffersLock when they loop through the buffer
array based on the NBuffers value.
4. The API to resize the shared buffer is `SELECT
pg_resize_shared_buffers('new_size')`.

Thanks,
Haoyu

On Fri, Jun 12, 2026 at 8:37 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 21/04/2026 19:05, Ashutosh Bapat wrote:
> > On Tue, Apr 21, 2026 at 1:10 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> wrote:
> >>
> >> On 07/04/2026 17:19, Ashutosh Bapat wrote:
> >>> Hi Heikki,
> >>> CallShmemCallbacksAfterStartup() holds ShmemIndexLock while invoking
> >>> init_fn/attach_fn callbacks. That looks wrong. Before this commit,
> >>> init or attach code was not run with the lock held. Any reason the
> >>> lock is held while calling init and attach callbacks. Since these
> >>> function can come from extensions, we don't have control on what goes
> >>> in those functions, and thus looks problematic. Further, it will
> >>> serialize all the attach_fn executions across backends, since each
> >>> will be run under the lock.
> >>
> >> This was intentional, I added a note in the docs about it:
> >>
> >> When <function>RegisterShmemCallbacks()</function> is called
> after
> >> startup, it will immediately call the appropriate callbacks,
> >> depending
> >> on whether the requested memory areas were already initialized
> by
> >> another backend. The callbacks will be called while holding an
> >> internal
> >> lock, which prevents concurrent two backends from initializing
> the
> >> memory area concurrently.
> >>
> >> That "internal lock" is ShmemIndexLock. I piggybacked on that since the
> >> code needs to acquire it anyway for the hash table lookups.
> >
> > I had read this part, but didn't realize it's ShmemIndexLock. The
> > document and the code are placed far apart and the comments in the
> > code do not help connecting these two. The comment before
> > LWLockAcquire() call doesn't say anything about init functions.
> > /* Hold ShmemIndexLock while we allocate all the shmem entries */
> >
> >> With the old ShmemInitStruct() interface, extensions needed to do the
> >> locking themselves, usually by holding AddinShmemInitLock.
> >>
> >> (Now that I read that again, the grammar on the last sentence sounds
> >> awkward...)
> >
> > Given that the init_fn is called in only one backend which requests
> > the structures first, do we need a lock?
>
> If two backends request the same structure concurrently, which one is
> "first"? That's what the lock determines.
>
> It's not safe to release the lock before the init callback has finished.
> Otherwise, another backend might attach to the struct before it's fully
> initialized and read uninitialized values.
>
> >>> In my case, the init_fn was performing ShmemIndex lookup which
> >>> deadlocked. It's questionable whether init function should lookup
> >>> ShmemIndex but, it's not something that needs to be prohibited
> >>> either.
> >> Yeah I'm curious what the use case is. We could easily introduce another
> >> lock or reuse AddinShmemInitLock for this.
> >
> > In case of resizable shared memory structures, I was adding mprotect
> > to make sure that the part of the shared address space which is
> > reserved but not used is protected from inadvertent access. The
> > mprotect is wrapped in a shmem API which fetches the ShmemIndex entry
> > of the shared structure, figures out the part of the address space to
> > protect using maximum_size and current size and calls mprotect
> > appropriately. To fetch the ShmemIndex entry it acquires a ShmemIndex
> > lock. The shmem API was supposed to be called from init_fn() and
> > attach_fn() to protect the address spaces as soon as the structure is
> > attached to. See patches attached to [1] for code.
> >
> > [1]
> https://www.postgresql.org/message-id/CAExHW5v5muT_SKV2NCxxVmvC=_38Rw0aiv-wU4CGzHaBCRYzqA@mail.gmail.com
>
> Ok. So if I understand correctly, holding ShmemIndexLock is not a actual
> problem per se, you just didn't expect it. Right?
>
> I propose the attached to improve the wording a little on the docs,
> comments, and error message.
>
> - Heikki
>

Attachment	Content-Type	Size
0001-Online-resize-of-the-shared-buffer-pool.patch	application/octet-stream	101.7 KB
README-dsb.md	text/markdown	7.9 KB

In response to

Re: Better shared data structure management and resizable shared data structures at 2026-06-12 15:37:21 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Previous Message	Heikki Linnakangas	2026-06-12 15:37:21	Re: Better shared data structure management and resizable shared data structures