Re: Linux kernel impact on PostgreSQL performance

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Mel Gorman <mgorman(at)suse(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>
Subject: Re: Linux kernel impact on PostgreSQL performance
Date: 2014-01-13 23:24:38
Message-ID: 52D475B6.3020305@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/13/2014 02:26 PM, Mel Gorman wrote:
> Really?
>
> zone_reclaim_mode is often a complete disaster unless the workload is
> partitioned to fit within NUMA nodes. On older kernels enabling it would
> sometimes cause massive stalls. I'm actually very surprised to hear it
> fixes anything and would be interested in hearing more about what sort
> of circumstnaces would convince you to enable that thing.

So the problem with the default setting is that it pretty much isolates
all FS cache for PostgreSQL to whichever socket the postmaster is
running on, and makes the other FS cache unavailable. This means that,
for example, if you have two memory banks, then only one of them is
available for PostgreSQL filesystem caching ... essentially cutting your
available cache in half.

And however slow moving cached pages between memory banks is, it's an
order of magnitude faster than moving them from disk. But this isn't
how the NUMA stuff is configured; it seems to assume that it's less
expensive to get pages from disk than to move them between banks, so
whatever you've got cached on the other bank, it flushes it to disk as
fast as possible. I understand the goal was to make memory usage local
to the processors stuff was running on, but that includes an implicit
assumption that no individual process will ever want more than one
memory bank worth of cache.

So disabling all of the NUMA optimizations is the way to go for any
workload I personally deal with.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2014-01-13 23:33:52 Re: Disallow arrays with non-standard lower bounds
Previous Message Florian Pflug 2014-01-13 23:19:41 Re: Standalone synchronous master