Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Dave Chinner <david(at)fromorbit(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Kara <jack(at)suse(dot)cz>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Trond Myklebust <trondmy(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, James Bottomley <James(dot)Bottomley(at)hansenpartnership(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-14 22:23:52
Message-ID: 20140114222352.GF3431@dastard
Lists: pgsql-hackers

On Tue, Jan 14, 2014 at 11:40:38AM -0800, Kevin Grittner wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Jan Kara <jack(at)suse(dot)cz> wrote:
> >
> >> Just to get some idea about the sizes - how large are the
> >> checkpoints we are talking about that cause IO stalls?
> >
> > Big.
>
> To quantify that, in a production setting we were seeing pauses of
> up to two minutes with shared_buffers set to 8GB and default dirty
                                                       ^^^^^^^^^^^^^
> page settings for Linux, on a machine with 256GB RAM and 512MB
  ^^^^^^^^^^^^^
There's your problem.

By default, background writeback doesn't start until 10% of memory
is dirtied, and on your machine that's 25GB of RAM. That's way too
high for your workload.
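
As a rough check of that figure, here is a minimal sketch of the
arithmetic. It assumes the threshold is simply a percentage of total
RAM; the kernel actually applies the ratio to available (free plus
reclaimable) memory, so the real number is somewhat lower. The helper
name is made up for illustration.

```python
GIB = 1024 ** 3

def writeback_threshold_bytes(ram_bytes, ratio_percent):
    """Approximate dirty-page threshold as a percentage of total RAM.

    Simplification: the kernel uses available memory, not total RAM.
    """
    return ram_bytes * ratio_percent // 100

ram = 256 * GIB
background = writeback_threshold_bytes(ram, 10)  # 10% background ratio, as stated above
print(f"background writeback starts at about {background / GIB:.1f} GiB dirty")
```

With 512MB of controller cache behind it, that is a lot of dirty
data to flush in one go.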

It appears to me that we are seeing large memory machines much more
commonly in data centers - a couple of years ago 256GB RAM was only
seen in supercomputers. Hence machines of this size are moving from
the "tweaking settings for supercomputers is OK" class to the
"tweaking settings for enterprise servers is not OK" class....

Perhaps what we need to do is deprecate dirty_ratio and
dirty_background_ratio as the default values and move to the
byte-based values as the defaults, capped appropriately, e.g.
10/20% of RAM for small machines down to a couple of GB for large
machines....
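
One way such a capped default could look, purely as a sketch: keep
the 10%/20% ratios but clamp the result to a fixed byte limit. The
cap values below (1 GiB background, 2 GiB hard limit) and the
function name are assumptions for illustration, not anything the
kernel does today.

```python
GIB = 1024 ** 3

# Hypothetical caps; "a couple of GB" is the only guidance given above.
BACKGROUND_CAP = 1 * GIB
HARD_CAP = 2 * GIB

def capped_thresholds(ram_bytes):
    """Return (background_bytes, dirty_bytes): 10%/20% of RAM, clamped to the caps."""
    background = min(ram_bytes * 10 // 100, BACKGROUND_CAP)
    hard = min(ram_bytes * 20 // 100, HARD_CAP)
    return background, hard

for ram_gib in (4, 16, 256):
    bg, hard = capped_thresholds(ram_gib * GIB)
    print(f"{ram_gib:>4} GiB RAM: background {bg / GIB:.2f} GiB, limit {hard / GIB:.2f} GiB")
```

Small machines keep the ratio-based behaviour; anything above 10 GiB
of RAM hits the caps under these assumed values.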

> non-volatile cache on the RAID controller.  To eliminate stalls we
> had to drop shared_buffers to 2GB (to limit how many dirty pages
> could build up out-of-sight from the OS), spread checkpoints to 90%
> of allowed time (almost no gap between finishing one checkpoint and
> starting the next) and crank up the background writer so that no
> dirty page sat unwritten in PostgreSQL shared_buffers for more than
> 4 seconds. Less aggressive pushing to the OS resulted in the
> avalanche of writes I previously described, with the corresponding
> I/O stalls.  We approached that incrementally, and that's the point
> where stalls stopped occurring.  We did not adjust the OS
> thresholds for writing dirty pages, although I know of others who
> have had to do so.

Essentially, changing dirty_background_bytes, dirty_bytes and
dirty_expire_centisecs to be much smaller should make the kernel
start writeback much sooner, so you shouldn't have to limit the
number of buffers the application has to prevent major
fsync-triggered stalls...
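
A small sketch of what that could look like as sysctl.conf entries.
The sysctl names are the real vm.* knobs; the specific values
(256 MiB background, 1 GiB hard limit, 10 second expiry) are
illustrative assumptions only and would need tuning against the
actual storage. Note that writing the *_bytes knobs makes the
corresponding *_ratio knobs read back as 0.

```python
MIB = 1024 ** 2

def sysctl_lines(background_bytes, dirty_bytes, expire_centisecs):
    """Format sysctl.conf lines for byte-based dirty page thresholds."""
    # Sanity check: background writeback should kick in before the hard limit.
    if background_bytes >= dirty_bytes:
        raise ValueError("background threshold must be below the hard limit")
    return [
        f"vm.dirty_background_bytes = {background_bytes}",
        f"vm.dirty_bytes = {dirty_bytes}",
        f"vm.dirty_expire_centisecs = {expire_centisecs}",
    ]

# Illustrative values only; the kernel default expiry is 3000 (30 seconds).
for line in sysctl_lines(256 * MIB, 1024 * MIB, 1000):
    print(line)
```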

Cheers,

Dave.
--
Dave Chinner
david(at)fromorbit(dot)com
