Re: munmap() failure due to sloppy handling of hugepage size

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Subject: Re: munmap() failure due to sloppy handling of hugepage size
Date: 2016-10-12 22:14:20
Message-ID: CAHyXU0wB7oT58jSzYniy7df7bwQauyt=TVNKvsHvu1eRPSaMDQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 12, 2016 at 5:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> Tom Lane wrote:
>>> According to
>>> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
>>> looking into /proc/meminfo is the longer-standing API and thus is
>>> likely to work on more kernel versions. Also, if you look into
>>> /sys then you are going to see multiple possible values and it's
>>> not clear how to choose the right one.
>
>> I'm not sure that this is the best rationale. In my system there are
>> 2MB and 1GB huge page sizes; in systems with lots of memory (let's say 8
>> GB of shared memory is requested) it seems a clear winner to allocate 8
>> 1GB hugepages than 4096 2MB hugepages because the page table is so much
>> smaller. The /proc interface only shows the 2MB page size, so if we go
>> that route we'd not be getting the full benefit of the feature.
>
> And you'll tell mmap() which one to do how exactly? I haven't found
> anything explaining how applications get to choose which page size applies
> to their request. The kernel document says that /proc/meminfo reflects
> the "default" size, and I'd assume that that's what we'll get from mmap.

Hm. For recent Linux, the mmap(2) man page says:

MAP_HUGE_2MB, MAP_HUGE_1GB (since Linux 3.8)
Used in conjunction with MAP_HUGETLB to select alternative
hugetlb page sizes (respectively, 2 MB and 1 GB) on systems
that support multiple hugetlb page sizes.

More generally, the desired huge page size can be configured
by encoding the base-2 logarithm of the desired page size in
the six bits at the offset MAP_HUGE_SHIFT. (A value of zero
in this bit field provides the default huge page size; the
default huge page size can be discovered via the Hugepagesize
field exposed by /proc/meminfo.) Thus, the above two
constants are defined as:

#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)

The range of huge page sizes that are supported by the system
can be discovered by listing the subdirectories in
/sys/kernel/mm/hugepages.

via: http://man7.org/linux/man-pages/man2/mmap.2.html#NOTES
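
As a minimal sketch of the encoding the man page describes (not PostgreSQL code): the helper name huge_page_size_to_mmap_flag is hypothetical, and the fallback value 26 for MAP_HUGE_SHIFT is the value Linux uses, supplied here only so the sketch compiles without the Linux headers.

```c
#include <stddef.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26       /* Linux value; normally comes from <sys/mman.h> */
#endif

/*
 * Hypothetical helper: encode a huge page size (in bytes) as the flag bits
 * to be OR'ed with MAP_HUGETLB.  Returns 0 on success and stores the bits
 * in *flag; returns -1 if page_size is not a power of two.
 */
static int
huge_page_size_to_mmap_flag(size_t page_size, unsigned int *flag)
{
    unsigned int log2 = 0;

    /* Reject zero and anything that is not an exact power of two. */
    if (page_size == 0 || (page_size & (page_size - 1)) != 0)
        return -1;

    /* Compute the base-2 logarithm by counting shifts. */
    while ((page_size >> log2) > 1)
        log2++;

    /* Place the logarithm in the six-bit field at MAP_HUGE_SHIFT. */
    *flag = log2 << MAP_HUGE_SHIFT;
    return 0;
}
```

For 2 MB this yields 21 << MAP_HUGE_SHIFT and for 1 GB it yields 30 << MAP_HUGE_SHIFT, matching the two #defines quoted above. The caller would OR the result into the mmap() flags alongside MAP_HUGETLB.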

ISTM all this silliness is pretty much unique to Linux anyway.
Instead of reading the filesystem, what about doing a test map and test
unmap? I think we could zero in on the default page size by probing
known possible values.
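
A rough sketch of what such a probe might look like, under several assumptions: it relies on the behavior documented in mmap(2) that the kernel rounds the length of a MAP_HUGETLB mapping up to a huge page multiple, while munmap() requires the length to be a multiple of the huge page size. The function name probe_default_huge_page_size and the 64 kB to 16 GB probing range are made up for illustration, and the probe needs at least one free huge page in the pool to get anywhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not PostgreSQL code.  Returns the default huge page
 * size in bytes, or 0 if it could not be determined.
 */
static size_t
probe_default_huge_page_size(void)
{
#if defined(MAP_HUGETLB) && defined(MAP_ANONYMOUS)
    const unsigned min_shift = 16;  /* 64 kB: assumed lower bound */
    const unsigned max_shift = 34;  /* 16 GB: assumed upper bound */
    unsigned    shift;
    void       *p;

    /* Test map: the kernel rounds this length up to a whole huge page. */
    p = mmap(NULL, (size_t) 1 << min_shift, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED)
        return 0;               /* no huge pages available, or unsupported */

    /*
     * Test unmap: try successive powers of two.  munmap() is expected to
     * fail until the length is a multiple of the huge page size, so the
     * first length that succeeds is taken to be the page size itself.
     */
    for (shift = min_shift;
         shift <= max_shift && shift < sizeof(size_t) * 8;
         shift++)
    {
        size_t      sz = (size_t) 1 << shift;

        if (munmap(p, sz) == 0)
            return sz;
    }
    /* Gave up: the test mapping is left in place (leaked) in this sketch. */
#endif
    return 0;
}
```

Trying every power of two in order, rather than only a few known sizes, is deliberate: skipping over the real size could make munmap() succeed with a larger multiple and unmap memory beyond the test mapping.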

merlin
