Re: postgresql latency & bgwriter not doing its job

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgresql latency & bgwriter not doing its job
Date: 2014-08-27 10:28:43
Message-ID: 20140827102843.GE21544@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-08-27 11:19:22 +0200, Andres Freund wrote:
> On 2014-08-27 11:14:46 +0200, Andres Freund wrote:
> > On 2014-08-27 11:05:52 +0200, Fabien COELHO wrote:
> > > I can test a couple of patches. I already did one on someone's advice (make
> > > bgwriter round all stuff in 1s instead of 120s), without positive effect.
> >
> > I've quickly cobbled together the attached patch (which at least doesn't
> > seem to crash & burn). It tries to trigger pages being flushed out
> > during the paced phase of checkpoints instead of the fsync phase. The
> > sync_on_checkpoint_flush can be used to enable/disable that behaviour.
> >
> > I'd be interested to hear whether that improves your latency numbers. I
> > unfortunately don't have more time to spend on this right now :(.
>
> And actually attached. Note that it's linux only...
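
The patch itself is not reproduced in this message, so the following is only a rough sketch of the idea described above: start writeback while the checkpoint is still writing pages at its paced rate, so that the final fsync has little left to do. It assumes the Linux sync_file_range() call with SYNC_FILE_RANGE_WRITE, which is a guess about the mechanism and not taken from the patch. The names start_writeback, checkpoint_write and flush_every are hypothetical.

```python
import ctypes
import ctypes.util
import os

SYNC_FILE_RANGE_WRITE = 2  # from <linux/fs.h>: start writeback, don't wait for it


def start_writeback(fd, offset, nbytes):
    """Hint the kernel to begin writing back a range. Returns True if issued."""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        fn = libc.sync_file_range
    except (OSError, AttributeError):
        return False  # call not available on this platform: skip the hint
    fn.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64, ctypes.c_uint]
    fn.restype = ctypes.c_int
    return fn(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) == 0


def checkpoint_write(fd, pages, page_size=8192, flush_every=32, flush_early=True):
    """Write equally sized pages in order; optionally hint writeback in batches.

    Returns the number of writeback hints that were issued successfully.
    """
    hints = 0
    pending_start = 0  # offset of the first byte not yet covered by a hint
    for i, page in enumerate(pages):
        os.pwrite(fd, page, i * page_size)
        end = (i + 1) * page_size
        if flush_early and (i + 1) % flush_every == 0:
            if start_writeback(fd, pending_start, end - pending_start):
                hints += 1
            pending_start = end
        # a real checkpointer would sleep here to pace its writes
    os.fsync(fd)  # final sync phase; ideally little dirty data is left by now
    return hints
```

With flush_early=False this degenerates to the usual behaviour, where all dirty data is left for the final fsync.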

I got curious and ran a quick test:

config:
log_checkpoints=on
checkpoint_timeout=1min
checkpoint_completion_target=0.95
checkpoint_segments=100
synchronous_commit=on
fsync=on
huge_pages=on
max_connections=200
shared_buffers=6GB
wal_level=hot_standby

sync_on_checkpoint_flush=off:

$ pgbench -p 5440 -h /tmp postgres -M prepared -c 16 -j16 -T 120 -R 180 -L 200
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 150
query mode: prepared
number of clients: 16
number of threads: 16
duration: 120 s
number of transactions actually processed: 20189
latency average: 23.136 ms
latency stddev: 59.044 ms
rate limit schedule lag: avg 4.599 (max 199.975) ms
number of skipped transactions: 1345 (6.246 %)
tps = 167.664290 (including connections establishing)
tps = 167.675679 (excluding connections establishing)

LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 12754 buffers (1.6%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=56.928 s, sync=3.639 s, total=60.749 s; sync files=20, longest=2.741 s, average=0.181 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 12269 buffers (1.6%); 0 transaction log file(s) added, 0 removed, 6 recycled; write=20.701 s, sync=8.568 s, total=29.444 s; sync files=10, longest=3.568 s, average=0.856 s

sync_on_checkpoint_flush=on:

$ pgbench -p 5440 -h /tmp postgres -M prepared -c 16 -j16 -T 120 -R 180 -L 200
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 150
query mode: prepared
number of clients: 16
number of threads: 16
duration: 120 s
number of transactions actually processed: 21327
latency average: 20.735 ms
latency stddev: 14.643 ms
rate limit schedule lag: avg 4.965 (max 185.003) ms
number of skipped transactions: 1 (0.005 %)
tps = 177.214391 (including connections establishing)
tps = 177.225476 (excluding connections establishing)

LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 12217 buffers (1.6%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=57.022 s, sync=0.203 s, total=57.377 s; sync files=19, longest=0.033 s, average=0.010 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 13185 buffers (1.7%); 0 transaction log file(s) added, 0 removed, 6 recycled; write=56.628 s, sync=0.019 s, total=56.803 s; sync files=11, longest=0.017 s, average=0.001 s
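
To compare the sync phases of the two runs more directly, the timing part of each "checkpoint complete" line can be pulled out with a small script. This is just a convenience sketch; the strings are the timing tails of the four log lines above, and the helper name sync_seconds is made up for illustration.

```python
import re

# Matches the timing portion of a "checkpoint complete" log line.
TIMING = re.compile(r"write=([\d.]+) s, sync=([\d.]+) s, total=([\d.]+) s")


def sync_seconds(lines):
    """Return the sync= duration (in seconds) of every matching line."""
    return [float(m.group(2)) for m in map(TIMING.search, lines) if m]


off = [
    "write=56.928 s, sync=3.639 s, total=60.749 s; sync files=20, longest=2.741 s, average=0.181 s",
    "write=20.701 s, sync=8.568 s, total=29.444 s; sync files=10, longest=3.568 s, average=0.856 s",
]
on = [
    "write=57.022 s, sync=0.203 s, total=57.377 s; sync files=19, longest=0.033 s, average=0.010 s",
    "write=56.628 s, sync=0.019 s, total=56.803 s; sync files=11, longest=0.017 s, average=0.001 s",
]

total_off = sum(sync_seconds(off))
total_on = sum(sync_seconds(on))
print(f"sync total off: {total_off:.3f} s, on: {total_on:.3f} s")
# prints "sync total off: 12.207 s, on: 0.222 s"
```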

That machine is far from idle right now, so the noise is pretty
high. But rather nice initial results.
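
As a sanity check on the two pgbench summaries above, the skipped percentages can be reproduced from the raw counts, assuming the percentage is skipped / (processed + skipped). With -R 180 over -T 120 s roughly 180 * 120 = 21600 transactions get scheduled; the schedule is randomized, so that figure is only approximate. The helper name skipped_pct is hypothetical.

```python
def skipped_pct(processed, skipped):
    """Share of scheduled transactions that were skipped, in percent."""
    return 100.0 * skipped / (processed + skipped)


# Counts and stddev copied from the two pgbench runs above.
runs = {
    "off": {"processed": 20189, "skipped": 1345, "stddev_ms": 59.044},
    "on": {"processed": 21327, "skipped": 1, "stddev_ms": 14.643},
}

expected = 180 * 120  # -R 180 for -T 120 seconds, approximate

for name, r in runs.items():
    scheduled = r["processed"] + r["skipped"]
    pct = skipped_pct(r["processed"], r["skipped"])
    print(f"{name}: scheduled {scheduled} of ~{expected}, skipped {pct:.3f} %")
# off: scheduled 21534 of ~21600, skipped 6.246 %
# on: scheduled 21328 of ~21600, skipped 0.005 %
```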

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
