| From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, chaturvedipalak1911(at)gmail(dot)com |
| Subject: | Re: Better shared data structure management and resizable shared data structures |
| Date: | 2026-04-08 05:20:53 |
| Message-ID: | CAExHW5udO-ppz4Z6hrLPO0+ovL8byPTH3zXAPjaOy4BH3RTqyQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Apr 8, 2026 at 1:39 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2026-04-07 22:48:17 +0300, Heikki Linnakangas wrote:
> > > +/*
> > > + * ShmemResizeStruct() --- resize a resizable shared memory structure.
> > > + *
> > > + * The new size must be within [minimum_size, maximum_size]. If the structure
> > > + * is being shrunk, the memory pages that are no longer needed are freed. If
> > > + * the structure is being expanded, the memory pages that are needed for the
> > > + * new size are allocated. See EstimateAllocatedSize() for explanation of which
> > > + * pages are allocated for a resizable structure.
> > > + */
> > > +void
> > > +ShmemResizeStruct(const char *name, Size new_size)
> >
> > This interface only allows shrinking and growing the allocated region at the
> > end, but the underlying mechanism is madvise(MADV_REMOVE) and
> > madvise(MADV_WRITE_POPULATE), which supports also "punching holes", i.e.
> > freeing memory in the middle of a region. Do we gain anything by restricting
> > ourselves to changing the size at the end? It seems to me that it could be
> > handy to punch holes for some use cases.
>
> Agreed. The hard part may be the "communication" with the user about how
> granular the punches can be. Because that will depend on things like
> huge_pages, huge_page_size and may depend on what alignment you happened to
> get.
>
We can extend it that way if there is a valid use case. For now I kept
it simple for two reasons:
1. Buffer manager structures shrink and expand only at the end right
now (a longer note on the buffer lookup table follows below). This
effort started with buffer resizing, and I didn't want to expand the
scope beyond what's needed.
2. Not all the approaches we tried for implementing resizable shared
memory can free memory in the middle; usually they can only shrink or
expand at the end. If we offer the ability to free memory in the
middle based on the facilities of one platform, we will face big
hurdles when supporting other platforms. I think it's better to avoid
it when it's not needed.
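
To make the "shrink or expand only at the end" model concrete, here is a minimal sketch, not the patch's implementation. It assumes Linux, a shared anonymous mapping, and regular (non-huge) pages. DemoRegion, demo_region_create() and demo_region_resize() are hypothetical names used only for illustration. Note that the Linux headers spell the populate flag MADV_POPULATE_WRITE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for a resizable shared structure; not PostgreSQL code. */
typedef struct DemoRegion
{
	char	   *base;			/* start of the reserved address range */
	size_t		max_size;		/* reserved size, a multiple of the page size */
	size_t		cur_size;		/* usable size, a multiple of the page size */
} DemoRegion;

/* Reserve max_size bytes of shared anonymous memory (pages are allocated lazily). */
static int
demo_region_create(DemoRegion *r, size_t max_size, size_t initial_size)
{
	void	   *p = mmap(NULL, max_size, PROT_READ | PROT_WRITE,
						 MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	r->base = p;
	r->max_size = max_size;
	r->cur_size = initial_size;
	return 0;
}

/* Resize at the end only; new_size must be page-aligned and <= max_size. */
static int
demo_region_resize(DemoRegion *r, size_t new_size)
{
	size_t		page = (size_t) sysconf(_SC_PAGESIZE);

	assert(new_size % page == 0 && new_size <= r->max_size);

	if (new_size < r->cur_size)
	{
		/* Shrink: hand the tail pages back to the kernel. */
		if (madvise(r->base + new_size, r->cur_size - new_size,
					MADV_REMOVE) != 0)
			return -1;
	}
#ifdef MADV_POPULATE_WRITE
	else if (new_size > r->cur_size)
	{
		/*
		 * Expand: allocate the new tail pages up front. Kernels older than
		 * 5.14 report EINVAL; in that case fall back to lazy allocation.
		 */
		if (madvise(r->base + r->cur_size, new_size - r->cur_size,
					MADV_POPULATE_WRITE) != 0 && errno != EINVAL)
			return -1;
	}
#endif
	r->cur_size = new_size;
	return 0;
}
```

The address range stays mapped after a shrink, so only the backing pages go away; pages freed this way read back as zeros once the region is expanded again.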

The buffer lookup table is fixed in size. It may benefit from punching
holes in the middle if we can somehow gather whole pages' worth of
free entries together somewhere in the middle. First, such compaction
is not easy to perform. But even if we implement compaction, we can
collect those entries at the end instead of in the middle, so the
current API will still be useful.

Is there any other use case you are envisioning? I also think that it
would be better to introduce new functions,
ShmemFreeStructPart()/ShmemAllocStructPart(), for punching holes
rather than overloading the current ShmemResizeStruct().
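
On the granularity question, a part-freeing function could only release whole pages that lie entirely inside the requested range. A rough sketch of that arithmetic follows; demo_punchable_range() is a hypothetical helper, and it assumes offsets are measured from a page-aligned base and that the page size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Compute the largest page-aligned sub-range of [offset, offset + len) and
 * return it in *start and *end (end is exclusive). Returns false when the
 * range does not contain even one complete page, i.e. nothing can be freed.
 */
static bool
demo_punchable_range(size_t offset, size_t len, size_t page_size,
					 size_t *start, size_t *end)
{
	size_t		first;
	size_t		last;

	/* page_size must be a nonzero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	first = (offset + page_size - 1) & ~(page_size - 1);	/* round up */
	last = (offset + len) & ~(page_size - 1);	/* round down */

	if (first >= last)
		return false;

	*start = first;
	*end = last;
	return true;
}
```

With 4 kB pages a request to free bytes 100 through 10099 can release only the single page at [4096, 8192), and with 2 MB huge pages the same request releases nothing, which is the kind of platform-dependent outcome the caller would need to be told about.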
>
> > What's the portability story? I understand that this is Linux-only at the
> > moment, but what platforms can we support in the future, and what's the
> > effort? I think BSD's have similar capabilities with plain mmap() and
> > MADV_FREE if I read the man pages right.
>
> At least linux' MADV_FREE is only for private mappings. It's not clear in at
> least freebsd's man page, but the described use case makes me suspect it may
> be similar there.
>
Looks so. FreeBSD also has fallocate with PUNCH_HOLES. We could use
that, but it will need an fd created using memfd_create(). I haven't
checked whether that works.
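
For reference, the fd-based variant looks roughly like this on Linux, where memfd_create() and fallocate() with FALLOC_FL_PUNCH_HOLE are known to exist; whether the same calls behave this way on FreeBSD is exactly the unchecked part. demo_memfd_map() and demo_memfd_punch() are hypothetical names, and offsets and lengths are assumed to be page-aligned.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an anonymous memory file of the given size and map it shared. */
static char *
demo_memfd_map(size_t size, int *fd_out)
{
	int			fd = memfd_create("demo_shmem", 0);
	void	   *p;

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t) size) != 0)
	{
		close(fd);
		return NULL;
	}
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
	{
		close(fd);
		return NULL;
	}
	*fd_out = fd;
	return p;
}

/* Free the pages backing [offset, offset + len) without changing the file size. */
static int
demo_memfd_punch(int fd, size_t offset, size_t len)
{
	assert(len > 0);
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
					 (off_t) offset, (off_t) len);
}
```

After a successful punch, every process that maps the same fd sees zeros in the punched range, and the pages are allocated again on the next write.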
>
> > What about macOS and Windows? This doesn't necessarily need to be fully
> > portable, if some OS's don't have the capabilities we need, but would be
> > nice to know what's possible.
>
> Looks like windows has OfferVirtualMemory
> https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-offervirtualmemory
> but it's not clear to me if it actually does what we need when multiple
> processes are attached.
>
Those APIs look similar to madvise() with
MADV_REMOVE/MADV_POPULATE_WRITE, but with a more specific and cleaner
interface. At least worth a try.
> I suspect it's going to be a lot easier once we're threaded... The reason I
> am ok with doing resizing this way before threading is because it's
> architecturally pretty similar to what you'd want to do once threaded, so it's
> not a huge dead end. But I'm doubtful we'll find facilities that allow this
> across processes in all operating systems...
check
--
Best Wishes,
Ashutosh Bapat