Re: munmap() failure due to sloppy handling of hugepage size

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: munmap() failure due to sloppy handling of hugepage size
Date: 2016-10-12 20:28:38
Message-ID: 25C01331-B1AE-45A6-BD1F-D8AE2DE40F86@anarazel.de
Lists: pgsql-hackers

On October 12, 2016 1:25:54 PM PDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>If any of you were following the thread at
>https://www.postgresql.org/message-id/flat/CAOan6TnQeSGcu_627NXQ2Z%2BWyhUzBjhERBm5RN9D0QFWmk7PoQ%40mail.gmail.com
>I spent quite a bit of time following a bogus theory, but the problem
>turns out to be very simple: on Linux, munmap() is pickier than mmap()
>about the length of a hugepage allocation. The comments in sysv_shmem.c
>mention that on older kernels mmap() with MAP_HUGETLB will fail if given
>a length request that's not a multiple of the hugepage size. Well, the
>behavior they replaced that with is little better: mmap() succeeds, but
>it gives you back a region that's been silently enlarged to the next
>hugepage boundary, and then munmap() will fail if you specify the region
>size you asked for rather than the region size you were given.
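A minimal C sketch of the workaround implied here, assuming Linux, a default huge page size already known in `hugepagesize`, and hypothetical helper names `round_up_to_hugepage()` and `map_hugepage_region()` (they are not from sysv_shmem.c): round the request up yourself and remember the rounded length, so that munmap() later receives the size the kernel actually mapped.

```c
#define _GNU_SOURCE /* expose MAP_ANONYMOUS and MAP_HUGETLB in <sys/mman.h> */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: round size up to the next multiple of hugepagesize.
 * Overflow for sizes near SIZE_MAX is not handled in this sketch.
 */
static size_t round_up_to_hugepage(size_t size, size_t hugepagesize)
{
    assert(hugepagesize > 0);
    return ((size + hugepagesize - 1) / hugepagesize) * hugepagesize;
}

/*
 * Hypothetical helper: map an anonymous shared region backed by default-size
 * huge pages. On success, *mapped_len receives the rounded length, which is
 * the value that must later be passed to munmap(). Returns MAP_FAILED on
 * error, e.g. when no huge pages are reserved.
 */
static void *map_hugepage_region(size_t request, size_t hugepagesize,
                                 size_t *mapped_len)
{
    size_t len = round_up_to_hugepage(request, hugepagesize);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    if (p != MAP_FAILED)
        *mapped_len = len; /* keep the rounded size, not the request */
    return p;
}
```

A caller would then release the region with `munmap(p, mapped_len)` rather than with the original request size; this only helps if `hugepagesize` matches the kernel's default huge page size.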
>
>Since AFAICS there is no way to inquire what region size you were given,
>this API is astonishingly brain-dead IMO. But that seems to be what
>we've got. Chris Richards reported it against a 3.16.7 kernel, and
>I can replicate the behavior on RHEL6 (2.6.32) by asking for an odd-size
>huge page region.
>
>We've mostly masked this by rounding up to a 2MB boundary, which is what
>the hugepage size typically is. But that assumption is wrong on some
>hardware, and it's not likely to get less wrong as time passes.
>
>A little bit of research suggests that on Linux the thing to do would be
>to get the actual default hugepage size by reading /proc/meminfo and
>looking for a line like "Hugepagesize: 2048 kB". I don't know
>of any more-portable API, so this does nothing for non-Linux kernels.
>But we have not heard of similar misbehavior on other platforms, even
>though IA64 and PPC64 can both have hugepages larger than 2MB, so it's
>reasonable to hope that other implementations of munmap() don't have
>the same gotcha.
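Reading /proc/meminfo as suggested could look like the following sketch. The function names are hypothetical, and the parser assumes the line format quoted above (a decimal value followed by the unit `kB`). It returns 0 when the file or the line is absent, which a caller could treat as "fall back to the old 2MB guess".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one /proc/meminfo line of the form
 * "Hugepagesize:       2048 kB" into a size in bytes.
 * Returns false for any other line.
 */
static bool parse_hugepagesize_line(const char *line, size_t *bytes)
{
    size_t kb;
    char unit[8];

    assert(line != NULL && bytes != NULL);
    if (sscanf(line, "Hugepagesize: %zu %7s", &kb, unit) != 2 ||
        strcmp(unit, "kB") != 0 || kb == 0)
        return false;
    *bytes = kb * 1024;
    return true;
}

/*
 * Hypothetical helper: return the default huge page size in bytes,
 * or 0 if /proc/meminfo is missing or has no Hugepagesize line.
 */
static size_t get_default_hugepagesize(void)
{
    FILE *fp = fopen("/proc/meminfo", "r");
    char line[256];
    size_t bytes = 0;

    if (fp == NULL)
        return 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (parse_hugepagesize_line(line, &bytes))
            break;
    }
    fclose(fp);
    return bytes;
}
```

Parsing is kept separate from file access so the line format can be checked without a Linux system.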

We had that, but Heikki ripped it out when merging... I think you're supposed to use /sys to get the available size.
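For the /sys route, one reading (an assumption about what is meant here) is the per-size directories Linux exposes under /sys/kernel/mm/hugepages, named like `hugepages-2048kB`. A minimal sketch with hypothetical helper names that lists every supported size follows. Note that it reports all available sizes, whereas mmap() with MAP_HUGETLB and no explicit size flag uses the default size, which is the one /proc/meminfo reports.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a sysfs entry name such as "hugepages-2048kB"
 * into a size in bytes. Returns false for any other name (".", "..", etc.).
 */
static bool parse_sysfs_hugepage_dirname(const char *name, size_t *bytes)
{
    static const char prefix[] = "hugepages-";
    const size_t plen = sizeof(prefix) - 1;
    char *end;
    unsigned long long kb;

    assert(name != NULL && bytes != NULL);
    if (strncmp(name, prefix, plen) != 0 ||
        !isdigit((unsigned char) name[plen]))
        return false;
    errno = 0;
    kb = strtoull(name + plen, &end, 10);
    if (errno != 0 || strcmp(end, "kB") != 0 || kb == 0 ||
        kb > SIZE_MAX / 1024)
        return false;
    *bytes = (size_t) kb * 1024;
    return true;
}

/*
 * Hypothetical helper: print every huge page size (in bytes) listed under
 * /sys/kernel/mm/hugepages. Returns the number of sizes found, or -1 if
 * the directory cannot be opened (non-Linux, or no hugetlb support).
 */
static int list_sysfs_hugepage_sizes(void)
{
    DIR *dir = opendir("/sys/kernel/mm/hugepages");
    struct dirent *de;
    int count = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        size_t bytes;

        if (parse_sysfs_hugepage_dirname(de->d_name, &bytes)) {
            printf("%zu\n", bytes);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```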

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
