Re: Initial 9.2 pgbench write results

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial 9.2 pgbench write results
Date: 2012-02-28 16:36:41
Message-ID: CAMkU=1zTJP7Uo8YjacmC0e=2+_zsam7mPSCmUMgB=2Vx42Pskg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 23, 2012 at 3:17 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> I think an even bigger factor now is that the BGW writes can disturb write
> ordering/combining done at the kernel and storage levels.  It's painfully
> obvious now how much PostgreSQL relies on that to get good performance.  All
> sorts of things break badly if we aren't getting random writes scheduled to
> optimize seek times, in as many contexts as possible.  It doesn't seem
> unreasonable that background writer writes can introduce some delay into the
> checkpoint writes, just by adding more random components to what is already
> a difficult to handle write/sync series.  That's what I think what these
> results are showing is that background writer writes can deoptimize other
> forms of write.

How hard would it be to dummy up a bgwriter which, every time it wakes
up, forks off a child process to actually do the write, while the real
one just waits for the child to exit? If it didn't have to correctly
handle signals, SINVAL, and such, it should be just a few lines of
code, but I don't know how much we can ignore signals and such even
just for testing purposes. My thought here is that the kernel is
getting in a snit over one process doing all the writing on the
system, and is punishing that process in a way that ruins things for
everyone.
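
To make the fork-and-wait idea concrete, here is a minimal standalone
sketch in plain POSIX C. It is not PostgreSQL code: write_via_child() is
a hypothetical helper, it writes to an ordinary file rather than through
the buffer manager, and it ignores signals, SINVAL, and shared memory
entirely. It only shows the shape of "child does the write, parent waits
for it to exit".

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): perform one write() from
 * a short-lived child process, so the kernel sees the I/O coming from a
 * fresh PID each time. Returns 0 if the child wrote every byte, else -1.
 */
int
write_via_child(const char *path, const void *buf, size_t len)
{
    pid_t   pid;
    int     status;

    assert(path != NULL && buf != NULL);

    pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */

    if (pid == 0)
    {
        /* Child: open, write, and exit without running atexit handlers. */
        int     fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        ssize_t n;

        if (fd < 0)
            _exit(1);
        n = write(fd, buf, len);
        close(fd);
        _exit(n == (ssize_t) len ? 0 : 1);
    }

    /* Parent: block until the child exits, retrying if interrupted. */
    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A real test patch would have to do the write against shared buffers from
the child, which is where the signal and SINVAL handling questions come
in; none of that is attempted here.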

>
> A second fact that's visible from the TPS graphs over the test run, and
> obvious if you think about it, is that BGW writes force data to physical
> disk earlier than they otherwise might go there.

On a busy system like you are testing, the BGW should only be writing
out data a fraction of a second before the backends would otherwise be
doing it, unless the "2 minutes to circle the buffer pool" logic is in
control rather than the bgwriter_lru_multiplier and
bgwriter_lru_maxpages logic. (From the data reported, we can see how
many buffer allocations there are, but not how many circles of the pool
it took to find them.)
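
For a rough sense of when the "2 minutes to circle the buffer pool"
floor wins, here is a simplified arithmetic sketch. It assumes the floor
is the pool size divided by the number of bgwriter rounds in 120
seconds, and that the LRU estimate is recent allocations per round times
bgwriter_lru_multiplier. Both function names are hypothetical, and the
sketch leaves out the smoothing, the estimate of already-reusable
buffers, and the bgwriter_lru_maxpages cap on writes per round, so treat
it as an illustration rather than a copy of the server's logic.

```c
#include <assert.h>

/* Assumed constant: time allowed for one full pass over the pool. */
#define SCAN_WHOLE_POOL_MS 120000.0

/*
 * Hypothetical: minimum buffers to scan per bgwriter round so that the
 * whole pool is circled in about two minutes.
 */
int
min_scan_buffers_per_round(int nbuffers, int bgwriter_delay_ms)
{
    assert(nbuffers > 0 && bgwriter_delay_ms > 0);
    return (int) (nbuffers / (SCAN_WHOLE_POOL_MS / bgwriter_delay_ms));
}

/*
 * Hypothetical: returns 1 when the two-minute floor exceeds the
 * allocation-driven estimate (allocations per round * multiplier),
 * i.e. when the "circle the pool" logic is the one in control.
 */
int
circle_logic_in_control(int nbuffers, int bgwriter_delay_ms,
                        double allocs_per_round, double lru_multiplier)
{
    double  upcoming_alloc_est = allocs_per_round * lru_multiplier;

    return upcoming_alloc_est <
        min_scan_buffers_per_round(nbuffers, bgwriter_delay_ms);
}
```

Under these assumptions, a 16384-buffer pool with bgwriter_delay at
200 ms gives a floor of 27 buffers per round, so on a busy system
allocating hundreds of buffers per round the multiplier logic should
dominate.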

It doesn't seem likely that small shifts in timing are having that
effect, compared to the possible effect of who is doing the writing.
If the timing is truly the issue, lowering bgwriter_delay might smooth
the timing out and bring it closer to what the backends would do for
themselves.

Cheers,

Jeff
