| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | chaturvedipalak1911(at)gmail(dot)com |
| Subject: | Re: Better shared data structure management and resizable shared data structures |
| Date: | 2026-02-13 12:03:12 |
| Message-ID: | 5a37c2e3-619d-4816-84d7-0b27e3e6797f@iki.fi |
| Lists: | pgsql-hackers |
On 13/02/2026 13:47, Ashutosh Bapat wrote:
> `man madvise` has this
> MADV_REMOVE (since Linux 2.6.16)
> Free up a given range of pages and its associated
> backing store. This is equivalent to punching a
> hole in the corresponding byte range of the backing
> store (see fallocate(2)). Subsequent accesses
> in the specified address range will see bytes containing zero.
>
> The specified address range must be mapped shared
> and writable. This flag cannot be applied to
> locked pages, Huge TLB pages, or VM_PFNMAP pages.
>
> In the initial implementation, only tmpfs(5)
> supported MADV_REMOVE; but since Linux 3.5, any
> filesystem which supports the fallocate(2)
> FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE.
> Hugetlbfs fails with the error EINVAL and other
> filesystems fail with the error EOPNOTSUPP.
>
> It says the flag can not be applied to Huge TLB pages. We won't be
> able to make resizable shared memory structures allocated with huge
> pages. That seems like a serious restriction.
Per https://man7.org/linux/man-pages/man2/madvise.2.html:
MADV_REMOVE (since Linux 2.6.16)
...
Support for the Huge TLB filesystem was added in Linux
v4.3.
> I may be misunderstanding something, but it seems like this is useful
> to free already allocated memory, not necessarily allocate more
> memory. I don't understand how a user would start with a larger
> reserved address space with only small portions of that space being
> backed by memory.
Hmm, I guess you'll need to use MAP_NORESERVE in the first mmap() call
to reserve address space for the maximum size, and then
madvise(MADV_POPULATE_WRITE) using the initial size. Later,
madvise(MADV_REMOVE) to shrink, and madvise(MADV_POPULATE_WRITE) to grow
again.
- Heikki