Re: huge tlb support

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: huge tlb support
Date: 2012-07-03 11:30:35
Message-ID: 201207031330.36372.andres@2ndquadrant.com
Lists: pgsql-hackers

On Tuesday, July 03, 2012 05:18:04 AM Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the
> >> mmap'ed memory.
> >
> > So, considering that there is required setup, it seems that the
> > obvious thing to do here is add a GUC: huge_tlb_pages (boolean).
We also need some logic to figure out how big the huge page size is.
/sys/kernel/mm/hugepages/ contains a directory for each supported size, a bit
unfortunately named like "hugepages-2048kB", so we need to parse that name.
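
A minimal sketch of that parsing, in Python for brevity rather than the C a
real patch would use. The function names are hypothetical; only the directory
layout and the "hugepages-2048kB" naming come from the text above.

```python
import os
import re

# Directory names look like "hugepages-2048kB"; capture the size in kB.
_HUGEPAGE_DIR = re.compile(r"^hugepages-(\d+)kB$")


def parse_hugepage_dirname(name):
    """Return the page size in kB encoded in a directory name, or None."""
    match = _HUGEPAGE_DIR.match(name)
    return int(match.group(1)) if match else None


def supported_hugepage_sizes_kb(base="/sys/kernel/mm/hugepages"):
    """List the huge page sizes (in kB) found under `base`, smallest first."""
    try:
        names = os.listdir(base)
    except OSError:
        return []  # directory missing: no hugepage support, or not Linux
    sizes = (parse_hugepage_dirname(n) for n in names)
    return sorted(s for s in sizes if s is not None)
```

An empty list would then mean there is nothing to size the mapping against,
so huge pages should not be attempted at all.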

> > The other alternative is to try with MAP_HUGETLB and, if it fails, try
> > again without MAP_HUGETLB.
> +1 for not making people configure this manually.
I don't think that's going to fly very well. You need to specifically allocate
hugepages at boot or shortly thereafter. If postgres just grabs some of the
available space without asking, it might very well cause other applications
not to be able to start. We're not allocating half of the system memory
without asking either...
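
To make the two options concrete, here is a rough sketch (Python, not the
actual patch) in which huge pages are only attempted when explicitly
requested, with a fallback to a normal mapping if the kernel refuses. The
`want_huge` argument stands in for the proposed huge_tlb_pages setting, the
function name is hypothetical, and the MAP_HUGETLB value is an assumption
taken from the Linux x86-64 headers.

```python
import mmap

# Assumption: numeric value of MAP_HUGETLB on Linux x86-64.
MAP_HUGETLB = 0x40000


def map_anonymous(size, want_huge):
    """Map `size` bytes of anonymous shared memory.

    Returns (mapping, used_huge). Huge pages are only tried when the
    caller opted in; `size` should then be a multiple of the huge page size.
    """
    flags = mmap.MAP_SHARED | mmap.MAP_ANONYMOUS
    if want_huge:
        try:
            return mmap.mmap(-1, size, flags=flags | MAP_HUGETLB), True
        except OSError:
            pass  # e.g. no huge pages reserved; fall back to normal pages
    return mmap.mmap(-1, size, flags=flags), False
```

With `want_huge` false, the reserved huge page pool is never touched, which
is the behaviour argued for above.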

> Also, I was under the impression that recent Linux kernels use hugepages
> automatically if they can, so I wonder exactly what Andres was testing
> on ...
When I ran the test I was on a moderately new kernel:

andres(at)awork2:~$ uname -a
Linux awork2 3.4.3-andres #138 SMP Mon Jun 19 12:46:32 CEST 2012 x86_64 GNU/Linux
andres(at)awork2:~$ zcat /proc/config.gz |grep HUGE
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

So, transparent hugepages are enabled by default.
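
The build-time config only sets the default; the mode in effect can also be
read at runtime. A small sketch, assuming the usual sysfs file
/sys/kernel/mm/transparent_hugepage/enabled, whose content looks like
"[always] madvise never" with the active choice in brackets. The helper names
are hypothetical.

```python
def active_thp_mode(text):
    """Return the bracketed choice from e.g. "[always] madvise never"."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Read the active transparent hugepage mode, or None if unavailable."""
    try:
        with open(path) as f:
            return active_thp_mode(f.read())
    except OSError:
        return None  # file missing: THP not built in, or not Linux
```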

The problem is that the kernel needs 2MB of contiguous physical memory to map
to 2MB of contiguous virtual memory. In on-demand, copy-on-write virtual
memory systems that just doesn't happen all the time unless you're doing file
mmap while triggering massive readaheads. This is especially true if the
system has been running for some time, because the memory just gets too
fragmented to have lots of contiguous physical memory around.
There was/is talk about moving physical memory around to make room for more
huge pages, but that's not there yet and the patches I have seen incurred
quite some overhead.
Btw, the introduction of transparent hugepages itself argued that there are
still benefits in manual hugepage setups.

Btw, should anybody want to test this, you can allocate huge pages
during runtime:
echo 3000 > /proc/sys/vm/nr_hugepages
or at boot by adding a kernel parameter:
hugepages=3000
(allocates roughly 6GB of huge pages on x86-64 with 2MB pages)

The runtime one might take quite a while until it has found enough pages, or
even fall short.
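
For picking a value for nr_hugepages, the arithmetic is simple enough to
sketch. The function names are hypothetical and the 2048 kB default is the
x86-64 size mentioned above: 3000 pages of 2048 kB come to 6000 MiB, i.e. a
bit under 6 GiB.

```python
def reserved_bytes(nr_hugepages, page_kb=2048):
    """Bytes set aside by reserving `nr_hugepages` pages of `page_kb` kB."""
    return nr_hugepages * page_kb * 1024


def hugepages_needed(size_bytes, page_kb=2048):
    """Smallest number of huge pages that covers `size_bytes`."""
    page_bytes = page_kb * 1024
    return -(-size_bytes // page_bytes)  # ceiling division


MIB = 1024 * 1024
print(reserved_bytes(3000) // MIB)     # 6000 (MiB)
print(hugepages_needed(6000 * MIB))    # 3000
```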

You can see the huge page status with:
andres(at)awork2:~$ cat /proc/meminfo |grep Huge
AnonHugePages: 591872 kB
HugePages_Total: 3000
HugePages_Free: 3000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
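
Checking the same fields from a program could look roughly like this. It is a
sketch with a hypothetical function name; the sample text is the output shown
above.

```python
def parse_hugepage_meminfo(text):
    """Extract the huge-page-related counters from /proc/meminfo text."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or "huge" not in key.lower():
            continue  # not a "Key: value" line about huge pages
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])  # drop a trailing "kB" unit
    return info


SAMPLE = """AnonHugePages: 591872 kB
HugePages_Total: 3000
HugePages_Free: 3000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
"""

info = parse_hugepage_meminfo(SAMPLE)
free_kb = info["HugePages_Free"] * info["Hugepagesize"]
```

Comparing `free_kb` with the intended mapping size before asking for huge
pages would show up front whether the reserved pool is large enough.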

Greetings,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
