Re: Report: Linux huge pages with Postgres

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Report: Linux huge pages with Postgres
Date: 2010-11-28 22:30:38
Message-ID: 20101128223038.GA13313@aart.is.rice.edu
Lists: pgsql-hackers

On Sat, Nov 27, 2010 at 02:27:12PM -0500, Tom Lane wrote:
> We've gotten a few inquiries about whether Postgres can use "huge pages"
> under Linux. In principle that should be more efficient for large shmem
> regions, since fewer TLB entries are needed to support the address
> space. I spent a bit of time today looking into what that would take.
> My testing was done with current Fedora 13, kernel version
> 2.6.34.7-61.fc13.x86_64 --- it's possible some of these details vary
> across other kernel versions.
>
> You can test this with fairly minimal code changes, as illustrated in
> the attached not-production-grade patch. To select huge pages we have
> to include SHM_HUGETLB in the flags for shmget(), and we have to be
> prepared for failure (due to permissions or lack of allocated
> hugepages). I made the code just fall back to a normal shmget on
> failure. A bigger problem is that the shmem request size must be a
> multiple of the system's hugepage size, which is *not* a constant
> even though the test patch just uses 2MB as the assumed value. For a
> production-grade patch we'd have to scrounge the active value out of
> someplace in the /proc filesystem (ick).
>

I would expect that you could just iterate through the possible
hugepage sizes pretty quickly and use the first one that works, with
no /proc groveling required.
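
A minimal sketch of that idea, not taken from the attached patch. The
names round_up_to(), shmget_try_huge() and the candidate_sizes list are
made up for illustration, and the list of sizes is only a guess at what
common platforms use. It assumes the kernel rejects a SHM_HUGETLB
request whose size is not a multiple of the real hugepage size, as
described above, so the first candidate that succeeds is taken and a
normal shmget() is the fallback:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000   /* Linux flag value, used only if the headers omit it */
#endif

/* ASSUMPTION: plausible hugepage sizes, smallest first; not an exhaustive list. */
static const size_t candidate_sizes[] = {
    (size_t) 2 << 20,   /* 2MB */
    (size_t) 4 << 20,   /* 4MB */
    (size_t) 16 << 20,  /* 16MB */
    (size_t) 1 << 30    /* 1GB */
};

/* Round size up to the next multiple of align. */
size_t
round_up_to(size_t size, size_t align)
{
    assert(align > 0);
    return ((size + align - 1) / align) * align;
}

/*
 * Hypothetical helper: try each candidate hugepage size in turn and fall
 * back to an ordinary shmget() if none of the hugepage requests succeeds.
 * On return, *actual_size holds the size that was actually requested.
 */
int
shmget_try_huge(key_t key, size_t size, int flags, size_t *actual_size)
{
    size_t n = sizeof(candidate_sizes) / sizeof(candidate_sizes[0]);

    for (size_t i = 0; i < n; i++)
    {
        size_t rounded = round_up_to(size, candidate_sizes[i]);
        int    id = shmget(key, rounded, flags | SHM_HUGETLB);

        if (id >= 0)
        {
            *actual_size = rounded;
            return id;
        }
    }

    /* No hugepage request worked (permissions, no pages, wrong size). */
    *actual_size = size;
    return shmget(key, size, flags);
}
```

A caller would use it roughly as
shmget_try_huge(IPC_PRIVATE, size, IPC_CREAT | 0600, &actual). The
sketch does not look at errno, so a permissions failure costs a few
extra failed calls before the fallback.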

> In addition to the code changes there are a couple of sysadmin
> requirements to make huge pages available to Postgres:
>
> 1. You have to configure the Postgres user as a member of the group
> that's permitted to allocate hugepage shared memory. I did this:
> sudo sh -c "id -g postgres >/proc/sys/vm/hugetlb_shm_group"
> For production use you'd need to put this in the PG initscript,
> probably, to ensure it gets re-set after every reboot and before PG
> is started.
>
Since Postgres would take advantage of huge pages automatically
whenever they are available, this would be just a normal DBA/admin task.

> 2. You have to manually allocate some huge pages --- there doesn't
> seem to be any setting that says "just give them out on demand".
> I did this:
> sudo sh -c "echo 600 >/proc/sys/vm/nr_hugepages"
> which gave me a bit over 1GB of space reserved as huge pages.
> Again, this'd have to be done over again at each system boot.
>
Same.

> For testing purposes, I figured that what I wanted to stress was
> postgres process swapping and shmem access. I built current git HEAD
> with --enable-debug and no other options, and tested with these
> non-default settings:
> shared_buffers 1GB
> checkpoint_segments 50
> fsync off
> (fsync intentionally off since I'm not trying to measure disk speed).
> The test machine has two dual-core Nehalem CPUs. Test case is pgbench
> at -s 25; I ran several iterations of "pgbench -c 10 -T 60 bench"
> in each configuration.
>
> And the bottom line is: if there's any performance benefit at all,
> it's on the order of 1%. The best result I got was about 3200 TPS
> with hugepages, and about 3160 without. The noise in these numbers
> is more than 1% though.
>
> This is discouraging; it certainly doesn't make me want to expend the
> effort to develop a production patch. However, perhaps someone else
> can try to show a greater benefit under some other test conditions.
>
> regards, tom lane
>
I would not really expect to see much benefit in the region that the
normal TLB page size would cover with the typical number of TLB entries.
1GB of shared buffers would not be enough to cause TLB thrashing with
most processors. Bump it to 8-32GB or more and, if the queries also use
up TLB entries with local work_mem, you should see some more value in
the patch.
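
To put rough numbers on the sizes involved, here is a small sketch that
counts how many pages it takes to map a region. pages_needed() is a
made-up helper, and the 4kB base page and 2MB hugepage sizes are
assumptions (2MB is the value the test patch assumed). It says nothing
about how many TLB entries any particular processor has:

```c
#include <assert.h>

/*
 * Hypothetical helper: number of pages of page_bytes each needed to map
 * region_bytes, rounding up for a partial last page.
 */
unsigned long long
pages_needed(unsigned long long region_bytes, unsigned long long page_bytes)
{
    assert(page_bytes > 0);
    return (region_bytes + page_bytes - 1) / page_bytes;
}

/*
 * With the assumed page sizes:
 *   1GB  at 4kB pages:  262144 pages;   at 2MB pages:   512 pages
 *   8GB  at 4kB pages: 2097152 pages;   at 2MB pages:  4096 pages
 *   32GB at 4kB pages: 8388608 pages;   at 2MB pages: 16384 pages
 */
```

The same arithmetic explains the reservation quoted above: 600 pages of
2MB each come to 1200MB, a bit over 1GB.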

Regards,
Ken
