Re: patch to allow disable of WAL recycling

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jerry Jelinek <jerry(dot)jelinek(at)joyent(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch to allow disable of WAL recycling
Date: 2018-08-16 21:43:25
Message-ID: e9b34a5a-253d-ffea-679c-0edd963a7586@2ndquadrant.com
Lists: pgsql-hackers

On 07/22/2018 10:50 PM, Tomas Vondra wrote:
> On 07/21/2018 12:04 AM, Jerry Jelinek wrote:
>> Thomas,
>>
>> Thanks for your offer to run some tests on different OSes and
>> filesystems that you have. Anything you can provide here would be much
>> appreciated. I don't have anything other than our native SmartOS/ZFS
>> based configurations, but I might be able to setup some VMs and get
>> results that way. I should be able to setup a VM running FreeBSD. If you
>> have a chance to collect some data, just let me know the exact
>> benchmarks you ran and I'll run the same things on the FreeBSD VM.
>> Obviously you're under no obligation to do any of this, so if you don't
>> have time, just let me know and I'll see what I can do on my own.
>>
>
> Sounds good. I plan to start with the testing in a couple of days - the
> boxes are currently running some other tests at the moment. Once I have
> some numbers I'll share them here, along with the test scripts etc.
>

I do have initial results from one of the boxes. It's not complete, and
further tests are still running, but I suppose it's worth sharing what I
have at this point.

As usual, the full data and ugly scripts are available in a git repo:

https://bitbucket.org/tvondra/wal-recycle-test-xeon/src/master/

Considering that WAL recycling only kicks in after a while, I've decided
to do a single long (6-hour) pgbench run for each scale, instead of the
usual "multiple short runs" approach.
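
Each filesystem/scale combination gets two such runs, one with WAL
recycling and one with it disabled. The toggle between runs is roughly
this sketch, assuming the patch's boolean GUC is spelled wal_recycle
(adjust to whatever the applied revision calls it):

    psql -c "ALTER SYSTEM SET wal_recycle = off"
    pg_ctl -D "$PGDATA" restart   # restart, to be safe about when it takes effect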

So far I've tried these filesystems (a mount sketch for the ext4
variants follows the list):

* btrfs
* ext4 / delayed allocation enabled (default)
* ext4 / delayed allocation disabled
* xfs
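
The two ext4 variants differ only in a mount option - roughly this, with
the device and mount point being illustrative:

    mount -o defaults   /dev/nvme0n1 /mnt/pgdata   # delalloc is the ext4 default
    mount -o nodelalloc /dev/nvme0n1 /mnt/pgdata   # delayed allocation disabled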

The machine has 64GB of RAM, so I've chosen scales 200 (fits into
shared_buffers), 2000 (in RAM) and 8000 (exceeds RAM), to trigger
different I/O patterns. I've used the per-second aggregated logging,
with the raw data available in the git repo. The charts attached to this
message are per-minute tps averages, to demonstrate the overall impact
on throughput, which would otherwise be hidden in the jitter.
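
The shape of a single run is roughly the following (database name,
client and thread counts are illustrative - the real values are in the
scripts in the repo):

    createdb bench
    pgbench -i -s 8000 bench                  # ~120GB, exceeds 64GB RAM
    pgbench -T 21600 -c 16 -j 8 \
            -l --aggregate-interval=1 \       # per-second aggregated log
            bench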

All these tests are done on an Optane 900P 280GB SSD, which is pretty
nice storage, but the limited capacity is somewhat tight for the scale
8000 test.

For the traditional filesystems (ext4, xfs) the WAL recycling seems to
be clearly beneficial - for the in-memory datasets the difference seems
to be negligible, but for the largest scale it gives maybe +20% benefit.
The delalloc/nodelalloc choice on ext4 makes pretty much no difference,
and xfs and ext4 perform almost exactly the same here - the main
difference seems to be that on ext4 the largest scale ran out of disk
space while xfs managed to keep running. Clearly there's a difference in
free space management, but that's unrelated to this patch.

On BTRFS, the results on the two smaller scales show about the same
behavior (minimal difference between WAL recycling and not recycling),
except that the throughput is perhaps 25-50% of ext4/xfs. Fair enough -
it's a different type of filesystem, and LVM snapshots would likely have
a similar impact. But there's no clear win with recycling disabled. On
the largest
scale, the device ran out of space after 10-20 minutes, which makes it
impossible to draw any reasonable conclusions :-(

I plan to do some more tests with zfsonlinux, and LVM with snapshots. I
wonder if those will show some benefit of disabling the WAL recycling.
And then, if time permits, I'll redo some of those tests with a small
SATA-based RAID array (aka spinning rust). Mostly out of curiosity.
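
For the LVM case the idea is to take a snapshot of the data volume
before the run, so writes go through a CoW layer underneath the
filesystem - something like this, with volume names and snapshot size
being illustrative:

    lvcreate --snapshot --size 20G --name pgdata-snap vg0/pgdata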

FWIW I had planned to do these tests on another machine, but I ran
into some strange data corruption issues on it, and I spent quite a
bit of time investigating and trying to reproduce them, which delayed
these tests a bit. And of course, once I added elog(PANIC) to the right
place it stopped happening :-/

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
btrfs.pdf application/pdf 33.0 KB
ext4-delalloc.pdf application/pdf 35.3 KB
ext4-nodelalloc.pdf application/pdf 35.4 KB
xfs.pdf application/pdf 35.8 KB
