Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Use MAP_HUGETLB where supported (v3)
Date: 2013-11-15 13:17:32
Message-ID: 52861EEC.2090702@vmware.com
Lists: pgsql-hackers

On 30.10.2013 19:11, Andres Freund wrote:
> On 2013-10-30 22:39:20 +0530, Abhijit Menon-Sen wrote:
>> At 2013-10-30 11:04:36 -0400, tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
>>>
>>>> As a compromise, perhaps we can unconditionally round the size up to be
>>>> a multiple of 2MB? […]
>>>
>>> That sounds reasonably painless to me.
>>
>> Here's a patch that does that and adds a DEBUG1 log message when we try
>> with MAP_HUGETLB and fail and fallback to ordinary mmap.
>
> But it's in no way guaranteed that the smallest hugepage size is
> 2MB. It'll be on current x86 hardware, but not on any other platform...

Sure, but there's no big harm done. We're just trying to avoid hitting a
kernel bug, and as a bonus, we avoid wasting some memory that would
otherwise be lost due to the kernel rounding the allocation. If the
smallest hugepage size is smaller than 2MB, we round up the allocation
unnecessarily, but that doesn't seem serious.
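
To make the rounding concrete, here is a minimal sketch in C. It assumes a fixed 2MB huge page size, as discussed above. The names ASSUMED_HUGE_PAGE_SIZE and round_up_to_huge_page are made up for illustration and are not taken from the patch.

```c
#include <stddef.h>

/* Assumption: 2MB huge pages, as on current x86 hardware. */
#define ASSUMED_HUGE_PAGE_SIZE ((size_t) 2 * 1024 * 1024)

/*
 * Round the requested size up to the next multiple of the assumed huge
 * page size. A size that is already a multiple is returned unchanged.
 */
static size_t
round_up_to_huge_page(size_t size)
{
    size_t      rem = size % ASSUMED_HUGE_PAGE_SIZE;

    if (rem != 0)
        size += ASSUMED_HUGE_PAGE_SIZE - rem;
    return size;
}
```

For example, a request of 189390848 bytes would be rounded up to 190840832 bytes, which is 91 pages of 2MB.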

I spent some time whacking this around; a new patch version is attached.
I moved the mmap() code into a new function, which leaves
PGSharedMemoryCreate more readable.

I modified the patch so that it throws an error if you set
huge_tlb_pages=on and the platform doesn't support MAP_HUGETLB (i.e.,
non-Linux, or EXEC_BACKEND). 'try' is the default, so this only affects
you if you explicitly set it to 'on'. I think that's the right behavior;
if you explicitly ask for it, and you don't get it, that should be an
error. But I'm not wedded to the idea if someone objects; a log message
might also be reasonable: "LOG: huge TLB pages are not supported on this
platform, but huge_tlb_pages was 'on'"
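
The on/try/off behavior described above could look roughly like the following. This is only a sketch, not the code from the attached patch: HugePagesMode and create_anon_segment are hypothetical names, the caller is assumed to do the actual error reporting, and the EXEC_BACKEND case is left out.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the huge_tlb_pages setting. */
typedef enum { HUGE_OFF, HUGE_TRY, HUGE_ON } HugePagesMode;

/*
 * Map an anonymous shared segment of 'size' bytes. Returns the address,
 * or MAP_FAILED with errno set. *used_huge is set to 1 only if the
 * MAP_HUGETLB mapping succeeded.
 */
static void *
create_anon_segment(size_t size, HugePagesMode mode, int *used_huge)
{
    void       *ptr;

    *used_huge = 0;

#ifdef MAP_HUGETLB
    if (mode != HUGE_OFF)
    {
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (ptr != MAP_FAILED)
        {
            *used_huge = 1;
            return ptr;
        }
        /* Explicit 'on': do not fall back, let the caller raise an error. */
        if (mode == HUGE_ON)
            return MAP_FAILED;
        /* 'try': this is where a DEBUG1 message would go before falling back. */
    }
#else
    /* Platform without MAP_HUGETLB: 'on' is an error, 'try' falls through. */
    if (mode == HUGE_ON)
    {
        errno = ENOSYS;
        return MAP_FAILED;
    }
#endif

    /* Ordinary mapping without huge pages. */
    return mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}
```

With 'try', a caller gets a usable segment either way and can check used_huge to see which path was taken. With 'on', a MAP_FAILED result means the huge page request itself failed.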

The error message on failed allocation, if huge_tlb_pages=on, needs
updating:

$ bin/postmaster -D data
FATAL: could not map anonymous shared memory: Cannot allocate memory
HINT: This error usually means that PostgreSQL's request for a shared
memory segment exceeded available memory or swap space. To reduce the
request size (currently 189390848 bytes), reduce PostgreSQL's shared
memory usage, perhaps by reducing shared_buffers or max_connections.

The allocation failed in this case because I used huge_tlb_pages=on but
had not configured the kernel for huge pages. The hint is quite
misleading in that case; it should advise the user to configure the
kernel or to turn off huge_tlb_pages.
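
One way to pick the hint, sketched in C. The function name shmem_failure_hint and both hint strings are hypothetical placeholders; the wording still needs to be decided. It assumes the caller saved errno from the failed mmap() and knows whether that attempt used MAP_HUGETLB.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Choose a hint for a failed shared memory mapping. Returns NULL when
 * the failure was not ENOMEM ("Cannot allocate memory"), in which case
 * no hint is offered. Both strings are placeholder wording.
 */
static const char *
shmem_failure_hint(int saved_errno, int tried_huge_pages)
{
    if (saved_errno != ENOMEM)
        return NULL;

    if (tried_huge_pages)
        return "The request used huge pages. Configure the kernel for "
               "huge pages, or turn off huge_tlb_pages.";

    return "Reduce PostgreSQL's shared memory usage, perhaps by reducing "
           "shared_buffers or max_connections.";
}
```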

The documentation needs some work. I think it's pretty user-unfriendly
to link to https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt.
It gives a lot of details, and although it explains stuff that is
relevant, like setting the nr_hugepages sysctl, it also contains a lot
of stuff that is not relevant to us, like how to mount hugetlbfs. Can we
do better than that? Is there a better guide somewhere on how to configure
the kernel settings? If not, we should include step-by-step instructions in
our manual.
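
As a starting point for such instructions, the number of huge pages needed can be estimated from the request size and the kernel's huge page size. The sketch below assumes Linux, where /proc/meminfo has a "Hugepagesize:" line reported in kB. The function names are made up for illustration.

```c
#include <stdio.h>

/*
 * Scan meminfo-style text for the "Hugepagesize:" line and return the
 * huge page size in bytes, or 0 if no such line is found.
 */
static unsigned long
huge_page_size_from_meminfo(FILE *fp)
{
    char        line[256];
    unsigned long kb;

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        if (sscanf(line, "Hugepagesize: %lu kB", &kb) == 1)
            return kb * 1024UL;
    }
    return 0;
}

/* Number of huge pages needed to cover request_bytes, rounding up. */
static unsigned long
huge_pages_needed(unsigned long request_bytes, unsigned long page_bytes)
{
    return (request_bytes + page_bytes - 1) / page_bytes;
}
```

For the 189390848-byte request shown above and a 2MB huge page size, this gives 91 pages, so the nr_hugepages sysctl would need to be at least that large. In practice one would open /proc/meminfo with fopen() and pass the stream to huge_page_size_from_meminfo.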

The "Managing Kernel Resources" section in the user manual should also
be updated to mention how to enable huge pages.

Also, now that I changed huge_tlb_pages='on' to fail on platforms where
it's not supported at all, the docs need to be updated to reflect it.

- Heikki

Attachment Content-Type Size
hugepages-v5.patch text/x-diff 11.7 KB
