| From: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> | 
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> | 
| Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Subject: | Re: Changing shared_buffers without restart | 
| Date: | 2025-04-21 09:29:59 | 
| Message-ID: | kwvbsat7reeopjhwopfypwdhsfrcev7nmmltec5ec6zzkjol5o@itpqaokfw7kb | 
| Lists: | pgsql-hackers | 
> On Fri, Apr 18, 2025 at 09:17:21PM GMT, Thomas Munro wrote:
> I was imagining that you might map some maximum possible size at the
> beginning to reserve the address space permanently, and then adjust
> the virtual memory object's size with ftruncate as required to provide
> backing.  Doesn't that achieve the goal with fewer steps, using only
> portable* POSIX stuff, and keeping all pointers stable?
Ah, I see what you folks mean. So in the latest patch there is a single large
shared memory area reserved with PROT_NONE + MAP_NORESERVE. This area is
logically divided between shmem segments, and each segment is mmap'd out of it
and could be resized within these logical boundaries. Now the suggestion is to
have one reserved area for each segment, and instead of really mmap'ing
something out of it, manage memory via ftruncate.
Yeah, that would work and would let us avoid MAP_FIXED and mremap, which are
questionable from a portability point of view. This leaves memfd_create, and I'm
still not completely clear on its portability -- it seems to be specific to
Linux, but other systems provide compatible implementations as well.
Let me experiment with this idea a bit, I would like to make sure there are no
other limitations we might face.
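For concreteness, the reserve-once-and-ftruncate scheme discussed above could be sketched roughly as follows. This is only an illustration under assumptions, not code from the patch: the ResizableSegment struct and the segment_create/segment_resize names are hypothetical, it uses Linux's memfd_create with regular pages, and sizes are assumed to be multiples of the page size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for memfd_create() on Linux */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one resizable segment (not from the patch). */
typedef struct ResizableSegment
{
    int     fd;         /* memory object that provides the backing */
    char   *base;       /* start of the mapping; fixed for its lifetime */
    size_t  max_size;   /* address space reserved up front */
    size_t  cur_size;   /* current size of the memory object */
} ResizableSegment;

/* Map max_size bytes once; only the first init_size bytes are backed. */
static int
segment_create(ResizableSegment *seg, size_t max_size, size_t init_size)
{
    void   *addr;

    assert(seg != NULL);
    if (init_size > max_size)
        return -1;

    seg->fd = memfd_create("resizable_segment", MFD_CLOEXEC);
    if (seg->fd < 0)
        return -1;

    if (ftruncate(seg->fd, (off_t) init_size) != 0)
    {
        close(seg->fd);
        return -1;
    }

    /* The mapping may be larger than the object; touching pages past
     * the object's current size would raise SIGBUS. */
    addr = mmap(NULL, max_size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_NORESERVE, seg->fd, 0);
    if (addr == MAP_FAILED)
    {
        close(seg->fd);
        return -1;
    }

    seg->base = addr;
    seg->max_size = max_size;
    seg->cur_size = init_size;
    return 0;
}

/* Grow or shrink the backing; the mapping and all pointers stay put. */
static int
segment_resize(ResizableSegment *seg, size_t new_size)
{
    assert(seg != NULL);
    if (new_size > seg->max_size)
        return -1;
    if (ftruncate(seg->fd, (off_t) new_size) != 0)
        return -1;
    seg->cur_size = new_size;
    return 0;
}
```

A caller would invoke segment_create once at startup with the maximum possible size, and segment_resize on every change; neither MAP_FIXED nor mremap is involved, and seg->base never changes.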
> I understand that pointer stability may not be required
Just to clarify, the current patch maintains this property (stable pointers),
which I also see as mandatory for any possible implementation.
> *You might also want to use fallocate after ftruncate on Linux to
> avoid SIGBUS on allocation failure on first touch page fault, which
> raises portability questions since it's unspecified whether you can do
> that with shm fds and fails on some systems, but let's call that an
> independent topic as it's not affected by this choice.
I'm afraid it would be strictly necessary to do fallocate, otherwise we're
back where we were before reservation accounting for huge pages in Linux (lots
of people were facing unexpected SIGBUS when dealing with cgroups).
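A minimal sketch of the ftruncate-then-fallocate step, assuming Linux (fallocate and memfd_create are both Linux-specific) and a memory object with regular pages. The create_memory_object and grow_and_allocate names are hypothetical and only serve the illustration; the point is that an allocation failure shows up as an error return at resize time instead of a SIGBUS on first touch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for fallocate() and memfd_create() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create the anonymous memory object to be resized (Linux-specific). */
static int
create_memory_object(const char *name)
{
    return memfd_create(name, MFD_CLOEXEC);
}

/*
 * Grow the object to new_size and allocate the new range immediately.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
grow_and_allocate(int fd, off_t new_size)
{
    struct stat st;
    int         save_errno;
    int         rc;

    assert(fd >= 0);

    if (fstat(fd, &st) != 0)
        return -1;
    if (new_size <= st.st_size)
    {
        errno = EINVAL;         /* this sketch only handles growth */
        return -1;
    }

    if (ftruncate(fd, new_size) != 0)
        return -1;

    /* Allocate backing for the added range now, retrying on signals. */
    while (fallocate(fd, 0, st.st_size, new_size - st.st_size) != 0)
    {
        if (errno == EINTR)
            continue;

        /* Best-effort rollback to the old size, keeping the errno. */
        save_errno = errno;
        rc = ftruncate(fd, st.st_size);
        (void) rc;
        errno = save_errno;
        return -1;
    }
    return 0;
}
```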
> TIL that mmap(size, fd) will actually extend a hugetlb memfd as a side
> effect on Linux, as if you had called ftruncate on it (fully allocated
> huge pages I expected up to the object's size, just not magical size
> changes beyond that when I merely asked to map it).  That doesn't
> happen for regular page size, or for any page size on my local OS's
> shm objects and doesn't seem to fit mmap's job description given an
> fd*, but maybe I'm just confused.  Anyway, a  workaround seems to be
> to start out with PROT_NONE and MAP_NORESERVE, then mprotect(PROT_READ
> | PROT_WRITE) new regions after extending with ftruncate(), at least
> in simple tests...
Right, it's similar to the currently implemented space reservation, which also
goes with PROT_NONE and MAP_NORESERVE. I assume it boils down to how
memory reservation accounting works in Linux.
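The sequence described in the quoted workaround (reserve with PROT_NONE and MAP_NORESERVE, extend with ftruncate, then mprotect the new region) might look roughly like this. It is a sketch under assumptions: reserve_mapping and extend_mapping are hypothetical names, regular pages are used for simplicity, and both sizes passed to extend_mapping are assumed to be multiples of the page size, since mprotect needs a page-aligned start address.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for memfd_create() in callers */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reserve max_size bytes of address space over fd. Nothing is accessible
 * yet, so mapping alone does not ask for any pages to be touched.
 * Returns NULL on failure.
 */
static void *
reserve_mapping(int fd, size_t max_size)
{
    void   *p = mmap(NULL, max_size, PROT_NONE,
                     MAP_SHARED | MAP_NORESERVE, fd, 0);

    return (p == MAP_FAILED) ? NULL : p;
}

/*
 * Extend the object from old_size to new_size, then make only the newly
 * backed range readable and writable. Returns 0 on success, -1 on error.
 */
static int
extend_mapping(int fd, char *base, size_t old_size, size_t new_size)
{
    assert(base != NULL);
    if (new_size < old_size)
        return -1;              /* shrinking is out of scope here */

    if (ftruncate(fd, (off_t) new_size) != 0)
        return -1;

    return mprotect(base + old_size, new_size - old_size,
                    PROT_READ | PROT_WRITE);
}
```

Addresses handed out from the already-accessible part of the mapping stay valid across extend_mapping calls, because the mapping itself is never moved or recreated.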