From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: measuring the impact of increasing WAL segment size
Date: 2017-08-15 01:27:00
Message-ID: 2c6ac18b-8094-7a42-114c-ad5e1708cd89@2ndquadrant.com

Hi,

A few months ago there was a discussion about increasing the default WAL
segment size [1]. For various reasons we ended up only allowing values
up to 1GB for --with-wal-segsize in configure, one of them being the
lack of sufficient data about the actual performance impact.
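
For context, the segment size is a compile-time option here, so each
tested size means a separate build, along these lines (a sketch, not
the exact build script used):

    # pick the segment size at build time (value in MB, must be a power
    # of 2; with the change discussed above, up to 1024, i.e. 1GB)
    ./configure --with-wal-segsize=64
    make && make install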

In that thread I promised to do some benchmarking to provide some hard
data, and that's what this post is about. The benchmarks I ended up
doing are not as extensive as I originally proposed, though.

What I have tested:

* 4 different hardware configurations (both spinning rust and flash)
* 2 workloads (tpcb-like and simple-update from pgbench)
* 3 scales (50, 300 and 2000)
* flushing enabled/disabled

I was interested in the impact of the flushing added in PostgreSQL 9.5,
so "flushing" here refers to the *_flush_after GUCs. "Enabled" means the
defaults, while "disabled" means everything set to 0.

For each combination of parameters I've done a single 4-hour pgbench
run, which means about 30 days of total runtime (about 22 days of
"clean" runtime, the rest being initialization etc.). This somewhat
explains why I scaled down the range of workloads to test.
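
Each combination then boils down to an initialization and a 4-hour
pgbench run, roughly like this (the client/thread counts are only an
illustration, not the exact values used):

    # initialize at the given scale (50, 300 or 2000)
    createdb bench
    pgbench -i -s 300 bench
    # tpcb-like workload (the default transaction), 4 hours
    pgbench -c 16 -j 8 -T 14400 bench
    # simple-update workload (the built-in -N variant), 4 hours
    pgbench -N -c 16 -j 8 -T 14400 bench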

I've been collecting various database/system metrics (sar, pg_stat_*,
...); in total it's about 10GB of compressed data, likely more than
100GB uncompressed. This post only presents a basic summary and some
charts, so let me know if you're interested in the raw data. Additional
charts are available at [2].
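
The collection itself is nothing fancy, essentially periodic sampling
along these lines (a rough sketch, not the actual scripts):

    # system-level metrics, 5-second samples, written to a binary sar file
    sar -o sar.data 5 > /dev/null 2>&1 &
    # database-level counters, dumped periodically from the stats views
    psql bench -c "COPY (SELECT now(), * FROM pg_stat_bgwriter) TO STDOUT" >> bgwriter.log
    psql bench -c "COPY (SELECT now(), * FROM pg_stat_database) TO STDOUT" >> database.log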

As mentioned, I've done the same tests on 4 different configurations,
so here are some basic details:

1) i5-2500k-ssd-raid

CPU: Intel i5-2500k (4 cores, released 2011)
RAM: 8GB
storage: 6 x Intel S3700 100GB SSD (RAID0, swraid)
kernel: 4.10

2) xeon-e5-2620v4-nvme

CPU: 2x Intel Xeon E5-2620 v4 (8/16 cores, released 2016)
RAM: 32GB
storage: Intel 750 SSD (NVMe, 400GB)
kernel: 4.10

3) xeon-e5-2620v4-sata-raid

CPU: 2x Intel Xeon E5-2620 v4 (8/16 cores, released 2016)
RAM: 32GB
storage: 3 x 7.2k SATA drives
kernel: 4.10

4) xeon-e5450-sas-raid

CPU: 2x Intel Xeon E5450 (4 cores, released 2006)
RAM: 16GB
storage: 6x 10k 146GB SAS (RAID 10, HP P400 with 512MB BBWC)
kernel: 4.10

For all configurations, everything (WAL, data) was placed on a single
filesystem - mostly for simplicity, but also because it shows the
"worst case" impact.

Most of the charts are uninteresting, as there is almost no impact from
either the WAL segment size or (disabling) the flushing. The changes in
average tps are typically within 1-2%, and if there's a trend, it
usually shows a tiny improvement for larger WAL segments.

This is true in particular for all the tests on configurations with
flash storage; simple-update-50-xeon-e5-2620v4-nvme.eps is a nice
example of such a boring chart.

Now, let's look at the interesting charts ...

1) scales 300/2000 on SAS RAID (RAID controller with 512MB write cache)

* simple-update-2000-xeon-e5450-sas-raid.eps
* simple-update-300-xeon-e5450-sas-raid.eps
* tpcb-like-2000-xeon-e5450-sas-raid.eps

I'm not sure what's happening in those charts, but I suspect it's
mostly a case of overloaded storage - the machine has only 16GB of RAM,
so scale 2000 does not fit into RAM (and the smaller scales do behave
much more reasonably on this hardware).

2) pretty much everything on the software SATA RAID

* simple-update-2000-xeon-e5-2620v4-sata-raid.eps
* simple-update-300-xeon-e5-2620v4-sata-raid.eps
* simple-update-50-xeon-e5-2620v4-sata-raid.eps
* tpcb-like-2000-xeon-e5-2620v4-sata-raid.eps
* tpcb-like-300-xeon-e5-2620v4-sata-raid.eps
* tpcb-like-50-xeon-e5-2620v4-sata-raid.eps

I don't dare to make absolute judgments based on just one pgbench run,
but the basic trends seem to be fairly clear:

a) The flushing has a significant impact on average tps, in some cases
reducing the throughput by 30-40%.

The primary reason is of course that the regular flushing significantly
increases the number of fsyncs, which has a serious impact on the SATA
RAID.
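
This is easy to see at the storage level by watching the device stats
during the run, e.g.:

    # per-device throughput and utilization, 5-second samples
    iostat -x 5
    # or the sar equivalent
    sar -d -p 5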

Granted - these charts do not show latency, so it's not a complete
picture. Also, if you care about raw OLTP performance you're probably
already running on flash, where this does not seem to be an issue. It's
also not an issue if you have a RAID controller with a write cache,
which can absorb those writes. And of course, those machines have
reasonable dirty_background_bytes values (like 64MB or less).

But it's something to be aware of and watch for, and perhaps a reason
to consider disabling the flushing (and instead tuning the page cache
eviction).
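
Tuning the page cache eviction here means the usual kernel writeback
knobs, something like this (the values are purely illustrative):

    # start background writeback early (64MB of dirty data), force
    # synchronous writeback at 512MB - illustrative values only
    sysctl -w vm.dirty_background_bytes=67108864
    sysctl -w vm.dirty_bytes=536870912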

b) The "flushing enabled" case seems to be much more sensitive to WAL
segment size increases. It seems the throughput drops a bit (by 10-20%),
for some segment sizes, and then recovers. The behavior seems to be
smooth (not just a sudden drop for one segment size) but the value
varies depending on the scale, test type (tpc-b /simple-update).

There is almost no such impact in the "flushing disabled" cases.

Similarly to the SAS RAID config (with 512MB write cache on the RAID
controller), the largest scale behaves a bit unpredictably. I assume the
reasons are the same - overloaded spinning rust.

regards

[1]
https://www.postgresql.org/message-id/flat/CA%2BTgmoZctR8Sqvgxp2-_fncsgvQSCaYZJ7e%2BoF8XnNLnJwOQ8Q%40mail.gmail.com

[2] https://github.com/tvondra/wal-segment-size-tests

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
simple-update-50-i5-2500k-ssd-raid.eps image/x-eps 24.7 KB
simple-update-50-xeon-e5-2620v4-nvme.eps image/x-eps 24.5 KB
simple-update-50-xeon-e5-2620v4-sata-raid.eps image/x-eps 24.7 KB
simple-update-50-xeon-e5450-sas-raid.eps image/x-eps 24.5 KB
simple-update-300-i5-2500k-ssd-raid.eps image/x-eps 24.5 KB
simple-update-300-xeon-e5-2620v4-nvme.eps image/x-eps 24.5 KB
simple-update-300-xeon-e5-2620v4-sata-raid.eps image/x-eps 24.7 KB
simple-update-300-xeon-e5450-sas-raid.eps image/x-eps 25.4 KB
simple-update-2000-i5-2500k-ssd-raid.eps image/x-eps 24.5 KB
simple-update-2000-xeon-e5-2620v4-nvme.eps image/x-eps 24.3 KB
simple-update-2000-xeon-e5-2620v4-sata-raid.eps image/x-eps 25.2 KB
simple-update-2000-xeon-e5450-sas-raid.eps image/x-eps 24.7 KB
tpcb-like-50-i5-2500k-ssd-raid.eps image/x-eps 25.2 KB
tpcb-like-50-xeon-e5-2620v4-nvme.eps image/x-eps 24.7 KB
tpcb-like-50-xeon-e5-2620v4-sata-raid.eps image/x-eps 25.2 KB
tpcb-like-50-xeon-e5450-sas-raid.eps image/x-eps 25.0 KB
tpcb-like-300-i5-2500k-ssd-raid.eps image/x-eps 25.2 KB
tpcb-like-300-xeon-e5-2620v4-nvme.eps image/x-eps 24.7 KB
tpcb-like-300-xeon-e5-2620v4-sata-raid.eps image/x-eps 24.5 KB
tpcb-like-300-xeon-e5450-sas-raid.eps image/x-eps 24.7 KB
tpcb-like-2000-i5-2500k-ssd-raid.eps image/x-eps 25.4 KB
tpcb-like-2000-xeon-e5-2620v4-nvme.eps image/x-eps 25.2 KB
tpcb-like-2000-xeon-e5-2620v4-sata-raid.eps image/x-eps 25.2 KB
tpcb-like-2000-xeon-e5450-sas-raid.eps image/x-eps 24.7 KB
