Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Dave Chinner <david(at)fromorbit(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Mel Gorman <mgorman(at)suse(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-14 01:30:56
Message-ID: 20140114013056.GB3431@dastard
Lists: pgsql-hackers

On Mon, Jan 13, 2014 at 03:24:38PM -0800, Josh Berkus wrote:
> On 01/13/2014 02:26 PM, Mel Gorman wrote:
> > Really?
> >
> > zone_reclaim_mode is often a complete disaster unless the workload is
> > partitioned to fit within NUMA nodes. On older kernels enabling it would
> > sometimes cause massive stalls. I'm actually very surprised to hear it
> > fixes anything and would be interested in hearing more about what sort
> > of circumstances would convince you to enable that thing.
>
> So the problem with the default setting is that it pretty much isolates
> all FS cache for PostgreSQL to whichever socket the postmaster is
> running on, and makes the other FS cache unavailable. This means that,
> for example, if you have two memory banks, then only one of them is
> available for PostgreSQL filesystem caching ... essentially cutting your
> available cache in half.

No matter what default NUMA allocation policy we set, there will be
an application for which that behaviour is wrong. As such, we've had
tools for setting application-specific NUMA policies for quite a few
years now, e.g.:

$ man 8 numactl
....
    --interleave=nodes, -i nodes
        Set a memory interleave policy. Memory will be
        allocated using round robin on nodes. When memory
        cannot be allocated on the current interleave target
        fall back to other nodes. Multiple nodes may be
        specified on --interleave, --membind and
        --cpunodebind.
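As a rough sketch of how that applies here, assuming numactl is installed and that the postmaster is started through pg_ctl: the interleave policy is inherited by child processes, so setting it when the postmaster starts covers the backends it forks as well. The helper name `run_interleaved` is hypothetical, and it falls back to running the command unchanged when the kernel or container does not allow an interleave policy.

```shell
# run_interleaved: hypothetical helper, not part of numactl or PostgreSQL.
# Runs "$@" with memory interleaved round robin across all NUMA nodes,
# or runs it unchanged if an interleave policy cannot be applied here.
run_interleaved() {
    # Probe first: numactl exits non-zero when the kernel has no NUMA
    # support or the set_mempolicy() call is not permitted.
    if command -v numactl >/dev/null 2>&1 &&
       numactl --interleave=all true 2>/dev/null; then
        numactl --interleave=all "$@"
    else
        echo "run_interleaved: NUMA interleave unavailable; running without it" >&2
        "$@"
    fi
}
```

Usage, assuming `$PGDATA` points at the data directory: run `numactl --hardware` to see how many nodes the machine has, then `run_interleaved pg_ctl -D "$PGDATA" start`. Whether interleaving is the right policy for a particular workload is exactly the per-application choice described above.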

Cheers,

Dave.
--
Dave Chinner
david(at)fromorbit(dot)com
