postgresql latency & bgwriter not doing its job

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: postgresql latency & bgwriter not doing its job
Date: 2014-08-25 20:23:40
Message-ID: alpine.DEB.2.10.1408251900211.11151@sto
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello pgdevs,

I've been playing with pg for some time now to try to reduce the maximum
latency of simple requests, to have a responsive server under small to
medium load.

On an old computer with a software RAID5 HDD attached, pgbench
simple update script run for a some time (scale 100, fillfactor 95)

pgbench -M prepared -N -c 2 -T 500 -P 1 ...

gives 300 tps. However this performance is really +1000 tps for a few
seconds followed by 16 seconds at about 0 tps for the checkpoint induced
IO storm. The server is totally unresponsive 75% of the time. That's
bandwidth optimization for you. Hmmm... why not.

Now, given this setup, if pgbench is throttled at 50 tps (1/6 the above
max):

pgbench -M prepared -N -c 2 -R 50.0 -T 500 -P 1 ...

The same thing more or less happens in a delayed fashion... You get 50 tps
for some time, followed by sections of 15 seconds at 0 tps for the
checkpoint when the segments are full... the server is unresponsive about
10% of the time (one in ten transaction is late by more than 200 ms).

It is not satisfying, pg should be able to handle that load easily.

The culprit I found is "bgwriter", which is basically doing nothing to
prevent the coming checkpoint IO storm, even though there would be ample
time to write the accumulating dirty pages so that checkpoint would find a
clean field and pass in a blink. Indeed, at the end of the 500 seconds
throttled test, "pg_stat_bgwriter" says:

buffers_checkpoint = 19046
buffers_clean = 2995

Which suggest that bgwriter took on him to write only 6 pages per second,
where at least 50 would have been needed for the load, and could have been
handled by the harware without any problem.

I have not found any mean to force bgwriter to send writes when it can.
(Well, I have: create a process which sends "CHECKPOINT" every 0.2
seconds... it works more or less, but this is not my point:-)

Bgwriter control parameters allow to control the maximum number of pages
(bgwriter_lru_maxpages) written per round (bgwriter_delay), and a
multiplier (bgwriter_lru_multiplier) which controls some heuristics to
estimate how many pages should be needed so as to make them available.
This may be nice in some settings, but is not adapted to the write
oriented OTPL load tested with pgbench.

The problem is that with the update load on a fitting in memory database
there is not that much need of "new" pages, even if pages are being
dirtied (about 50 per seconds), so it seems that the heuristics decides
not to write much. The net result of all this cleverness is that when the
checkpoint arrives, several thousand pages have to be written and the
server is offline for some time.

ISTM that bgwriter lacks at least some "min page" setting it could be
induced to free this many pages if it can. That would be a start.

A better feature would be that it adapts itself to take advantage of the
available IOPS, depending on the load induce by other tasks (vacuum,
queries...), in a preventive manner, so as to avoid delaying what can be
done right now under a small load, and thus avoid later IO storms. This
would suggest that some setting would provide the expected IOPS capability
of the underlying hardware, as some already suggest the expected available
memory.

Note that this preventive approach could also improve the "bandwith"
measure: currently when pgbench is running at maximum speed before the
checkpoint storm, nothing is written to disk but WAL, although it could
probably also write some dirty pages. When the checkpoints arrives, less
pages would need to be written, so the storm would be shorter.

Any thoughts on this latency issue? Am I missing something?

--
Fabien.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-08-25 20:30:41 Re: pg_upgrade: allow multiple -o/-O options
Previous Message Michael Paquier 2014-08-25 20:15:38 Re: Add CREATE support to event triggers