From: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
To: Mel Gorman <mgorman(at)suse(dot)de>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-23 21:11:20
Message-ID: 52E18578.9000700@gmail.com
Lists: pgsql-hackers

On 1/20/14 9:46 AM, Mel Gorman wrote:
> They could potentially be used to evaluate any IO scheduler changes.
> For example -- deadline scheduler with these parameters has X
> transactions/sec throughput with average latency of Y milliseconds
> and a maximum fsync latency of Z seconds. Evaluate how well the
> out-of-box behaviour compares against it with and without some set of
> patches. At the very least it would be useful for tracking historical
> kernel performance over time and bisecting any regressions that got
> introduced. I think many kernel developers (me at least) can run
> automated bisections once a test case exists.

That's the long term goal. What we used to get out of pgbench were
things like >60 second latencies when a checkpoint hit with GBs of dirty
memory. That does happen in the real world, but that's not a realistic
case you can tune for very well. In fact, tuning for it can easily
degrade performance on more realistic workloads.
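
To make that concrete, here's a rough sketch of pulling the latency
distribution out of a pgbench per-transaction log (the kind written with
pgbench -l). It assumes the version-dependent log format where the third
field on each line is the transaction's elapsed time in microseconds;
check your pgbench release and adjust the column index if it logs
differently.

#!/usr/bin/env python3
# Summarize per-transaction latency from a pgbench -l log file.
# Assumes field 3 of each line is elapsed time in microseconds
# (version-dependent; verify against your pgbench documentation).
import sys

def load_latencies_ms(path):
    samples = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3:
                samples.append(int(fields[2]) / 1000.0)  # us -> ms
    return sorted(samples)

def percentile(sorted_samples, pct):
    idx = min(len(sorted_samples) - 1,
              int(len(sorted_samples) * pct / 100.0))
    return sorted_samples[idx]

if __name__ == "__main__":
    lat = load_latencies_ms(sys.argv[1])   # e.g. pgbench_log.12345
    print("transactions: %d" % len(lat))
    print("median: %8.1f ms" % percentile(lat, 50))
    print("99th:   %8.1f ms" % percentile(lat, 99))
    print("max:    %8.1f ms" % lat[-1])

A >60 second checkpoint stall shows up there as a max in the tens of
thousands of milliseconds while the median stays tiny, which is exactly
the shape a single min/max pair hides.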

The main complexity I don't have a clear view of yet is how much
unavoidable storage level latency there is in all of the common
deployment types. For example, I can take a server with a 256MB
battery-backed write cache and set dirty_background_bytes to be smaller
than that. So checkpoint spikes go away, right? No. Eventually you
will see dirty_background_bytes of data going into an already full 256MB
cache. And when that happens, the latency will be based on how long it
takes to write the cached 256MB out to the disks. If you have a single
disk or RAID-1 pair, that random I/O could easily happen at 5MB/s or
less, and that makes for a 51 second cache clearing time. This is a lot
better now than it used to be because fsync hasn't flushed the whole
cache in many years. (Only RHEL5 systems still in the field suffer much
from that era of code.) But you do need to look at the distribution of
latency a bit because of how the cache impacts things; you can't just
consider min/max values.
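
The arithmetic behind that 51 second figure is just cache size divided by
sustained random-write throughput. A trivial sketch, using the 256MB and
5MB/s numbers from the example above rather than anything measured:

# Worst-case time to drain an already-full write cache out to the disks.
def worst_case_drain_seconds(cache_bytes, random_write_bytes_per_sec):
    return cache_bytes / float(random_write_bytes_per_sec)

MB = 1024 * 1024
print(worst_case_drain_seconds(256 * MB, 5 * MB))   # ~51.2 seconds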

Take the BBWC out of the equation, and you'll see latency proportional
to how long it takes to clear the disk's cache out. It's fun "upgrading"
from a disk with 32MB of cache to 64MB only to watch worst-case latency
double. At least the kernel does the right thing now, using that cache
when it can while forcing data out when fsync calls arrive. (That's
another important kernel optimization we'll never be able to teach the
database.)
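
If you want to watch that behaviour directly, a toy fsync latency probe
(very much in the spirit of pg_test_fsync, not a replacement for it; the
file name, block size, and iteration count below are arbitrary choices)
looks something like this:

#!/usr/bin/env python3
# Time individual write+fsync pairs to see how the drive/controller cache
# shapes the latency distribution. Point PATH at the device you care about.
import os, time

PATH = "fsync_probe.dat"   # scratch file on the filesystem under test
BLOCK = b"\0" * 8192       # one 8kB block per write, roughly a PostgreSQL page
ITERATIONS = 1000

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT, 0o600)
samples = []
for _ in range(ITERATIONS):
    os.lseek(fd, 0, os.SEEK_SET)
    start = time.perf_counter()
    os.write(fd, BLOCK)
    os.fsync(fd)
    samples.append((time.perf_counter() - start) * 1000.0)   # milliseconds
os.close(fd)
os.unlink(PATH)

samples.sort()
print("median %.2f ms, 99th %.2f ms, max %.2f ms" %
      (samples[len(samples) // 2],
       samples[int(len(samples) * 0.99)],
       samples[-1]))

Run the same probe before and after the 32MB-to-64MB cache "upgrade"
mentioned above and it's a quick way to see the worst-case number move.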

--
Greg Smith greg(dot)smith(at)crunchydatasolutions(dot)com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
