Re: Report: Linux huge pages with Postgres

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Report: Linux huge pages with Postgres
Date: 2010-11-29 00:12:51
Message-ID: 27373.1290989571@sss.pgh.pa.us
Lists: pgsql-hackers

Kenneth Marshall <ktm(at)rice(dot)edu> writes:
> On Sat, Nov 27, 2010 at 02:27:12PM -0500, Tom Lane wrote:
>> ... A bigger problem is that the shmem request size must be a
>> multiple of the system's hugepage size, which is *not* a constant
>> even though the test patch just uses 2MB as the assumed value. For a
>> production-grade patch we'd have to scrounge the active value out of
>> someplace in the /proc filesystem (ick).

> I would expect that you can just iterate through the size possibilities
> pretty quickly and just use the first one that works -- no /proc
> groveling.

It's not really that easy, because (at least on the kernel version I
tested) it's not the shmget that fails, it's the later shmat. Releasing
and reacquiring the shm segment would require significant code
restructuring, and at least on some platforms could produce weird
failure cases --- I seem to recall having heard of kernels where the
release isn't instantaneous, so that you could run up against SHMMAX
for no apparent reason. Really you do want to scrape the value.

>> 2. You have to manually allocate some huge pages --- there doesn't
>> seem to be any setting that says "just give them out on demand".
>> I did this:
>> sudo sh -c "echo 600 >/proc/sys/vm/nr_hugepages"
>> which gave me a bit over 1GB of space reserved as huge pages.
>> Again, this'd have to be done over again at each system boot.

> Same.

The fact that hugepages have to be manually managed, and that any
reserved pages that go unused represent completely wasted RAM, seems
like a pretty large PITA to me. I don't see anybody buying into that
for gains measured in single-digit percentages.

> 1GB of shared buffers would not be enough to cause TLB thrashing with
> most processors.

Well, bigger cases would be useful to try, although Simon was claiming
that the TLB starts to fall over at 4MB of working set. I don't have a
large enough machine to try the sort of test you're suggesting, so if
anyone thinks this is worth pursuing, there's the patch ... go test it.

regards, tom lane
