Re: patch to allow disable of WAL recycling

From: Jerry Jelinek <jerry(dot)jelinek(at)joyent(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch to allow disable of WAL recycling
Date: 2018-07-12 16:49:22
Message-ID: CACPQ5FrufdyMNB8a7qPxejzm+XqJfDdkUhHCDCu3XaUdKxd9Qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I was asked to perform two different tests:
1) A benchmarksql run with WAL recycling on and then off, for comparison
2) A test when the filesystem fills up

For #1, I did two 15 minute benchmarksql runs and here are the results.
wal_recycle=on
--------------
Term-00, Running Average tpmTOTAL: 299.84 Current tpmTOTAL: 29412
Memory U14:49:02,470 [Thread-1] INFO jTPCC : Term-00,
14:49:02,470 [Thread-1] INFO jTPCC : Term-00,
14:49:02,471 [Thread-1] INFO jTPCC : Term-00, Measured tpmC (NewOrders) =
136.49
14:49:02,471 [Thread-1] INFO jTPCC : Term-00, Measured tpmTOTAL = 299.78
14:49:02,471 [Thread-1] INFO jTPCC : Term-00, Session Start =
2018-07-12 14:34:02
14:49:02,471 [Thread-1] INFO jTPCC : Term-00, Session End =
2018-07-12 14:49:02
14:49:02,471 [Thread-1] INFO jTPCC : Term-00, Transaction Count = 4497

wal_recycle=off
---------------
Term-00, Running Average tpmTOTAL: 299.85 Current tpmTOTAL: 29520
Memory U15:10:15,712 [Thread-1] INFO jTPCC : Term-00,
15:10:15,712 [Thread-1] INFO jTPCC : Term-00,
15:10:15,713 [Thread-1] INFO jTPCC : Term-00, Measured tpmC (NewOrders) =
135.89
15:10:15,713 [Thread-1] INFO jTPCC : Term-00, Measured tpmTOTAL = 299.79
15:10:15,713 [Thread-1] INFO jTPCC : Term-00, Session Start =
2018-07-12 14:55:15
15:10:15,713 [Thread-1] INFO jTPCC : Term-00, Session End =
2018-07-12 15:10:15
15:10:15,713 [Thread-1] INFO jTPCC : Term-00, Transaction Count = 4497

As can be seen, disabling WAL recycling does not cause any performance
regression.

For #2, I ran the test with WAL recycling on (the current behavior as well
as the default with this patch) since the behavior of postgres is
orthogonal to WAL recycling when the filesystem fills up.

I capped the filesystem with 32MB of free space. I setup a configuration
with wal_keep_segments=50 and started a long benchmarksql run. I had 4 WAL
files already in existence when the run started.

As the filesystem fills up, the performance of postgres gets slower and
slower, as would be expected. This is due to the COW nature of the
filesystem and the fact that all writes need to find space.

When a new WAL file is created, this essentially consumes no space since it
is a zero-filled file, so no filesystem space is consumed, except for a
little metadata for the file. However, as writes occur to the WAL
file, space is being consumed. Eventually all space in the filesystem is
consumed. I could not tell if this occurred during a write to an existing
WAL file or a write to the database itself. As other people have observed,
WAL file creation in a COW filesystem is not the problematic operation when
the filesystem fills up. It is the writes to existing files that will fail.
When postgres core dumped there were 6 WAL files in the pg_wal directory
(well short of the 50 configured).

When the filesystem filled up, postgres core dumped and benchmarksql
emitted a bunch of java debug information which I could provide if anyone
is interested.

Here is some information for the postgres core dump. It looks like postgres
aborted itself, but since the filesystem is full, there is nothing in the
log file.
> ::status
debugging core file of postgres (64-bit) from
f6c22f98-38aa-eb51-80d2-811ed25bed6b
file: /zones/f6c22f98-38aa-eb51-80d2-811ed25bed6b/local/pgsql/bin/postgres
initial argv: /usr/local/pgsql/bin/postgres -D /home/postgres/data
threading model: native threads
status: process terminated by SIGABRT (Abort), pid=76019 uid=1001 code=-1
> $C
fffff9ffffdfa4b0 libc.so.1`_lwp_kill+0xa()
fffff9ffffdfa4e0 libc.so.1`raise+0x20(6)
fffff9ffffdfa530 libc.so.1`abort+0x98()
fffff9ffffdfa560 errfinish+0x230()
fffff9ffffdfa5e0 XLogWrite+0x294()
fffff9ffffdfa610 XLogBackgroundFlush+0x18d()
fffff9ffffdfaa50 WalWriterMain+0x1a8()
fffff9ffffdfaab0 AuxiliaryProcessMain+0x3ff()
fffff9ffffdfab40 0x7b5566()
fffff9ffffdfab90 reaper+0x60a()
fffff9ffffdfaba0 libc.so.1`__sighndlr+6()
fffff9ffffdfac30 libc.so.1`call_user_handler+0x1db(12, 0, fffff9ffffdfaca0)
fffff9ffffdfac80 libc.so.1`sigacthandler+0x116(12, 0, fffff9ffffdfaca0)
fffff9ffffdfb0f0 libc.so.1`__pollsys+0xa()
fffff9ffffdfb220 libc.so.1`pselect+0x26b(7, fffff9ffffdfdad0, 0, 0,
fffff9ffffdfb230, 0)
fffff9ffffdfb270 libc.so.1`select+0x5a(7, fffff9ffffdfdad0, 0, 0,
fffff9ffffdfb6c0)
fffff9ffffdffb00 ServerLoop+0x289()
fffff9ffffdffb70 PostmasterMain+0xcfa()
fffff9ffffdffba0 main+0x3cd()
fffff9ffffdffbd0 _start_crt+0x83()
fffff9ffffdffbe0 _start+0x18()

Let me know if there is any other information I could provide.

Thanks,
Jerry

On Tue, Jun 26, 2018 at 7:35 AM, Jerry Jelinek <jerry(dot)jelinek(at)joyent(dot)com>
wrote:

> Hello All,
>
> Attached is a patch to provide an option to disable WAL recycling. We have
> found that this can help performance by eliminating read-modify-write
> behavior on old WAL files that are no longer resident in the filesystem
> cache. The is a lot more detail on the background of the motivation for
> this in the following thread.
>
> https://www.postgresql.org/message-id/flat/CACukRjO7DJvub8e2AijOayj8BfKK3
> XXBTwu3KKARiTr67M3E3w%40mail.gmail.com#CACukRjO7DJvub8e2AijOayj8BfKK3
> XXBTwu3KKARiTr67M3E3w(at)mail(dot)gmail(dot)com
>
> A similar change has been tested against our 9.6 branch that we're
> currently running, but the attached patch is against master.
>
> Thanks,
> Jerry
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-07-12 16:53:58 Re: Fix error message when trying to alter statistics on included column
Previous Message Andres Freund 2018-07-12 16:44:44 Re: Concurrency bug in UPDATE of partition-key