Re: wal_buffers, redux

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: wal_buffers, redux
Date: 2012-03-12 14:16:23
Message-ID: CA+TgmoaXvKogH-hTQtNCRsyrC_7zJStesXDzhxZOtjJn4+UjAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 11, 2012 at 11:51 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sun, Mar 11, 2012 at 12:55 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I've finally been able to run some more tests of the effect of
>> adjusting wal_buffers to values higher than 16MB.  I ran the test on
>> the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
>> configuration settings:
>>
>> shared_buffers = 8GB
>> maintenance_work_mem = 1GB
>> synchronous_commit = off
>> checkpoint_segments = 300
>> checkpoint_timeout = 15min
>> checkpoint_completion_target = 0.9
>> wal_writer_delay = 20ms
>>
>> I ran three 30-minute tests at scale factor 300 with wal_buffers set
>> at various values from 16MB up to 160MB, in multiples of 16MB, using
>> pgbench with 32 clients and 32 threads in each case.  The short
>> version is that 32MB seems to be significantly better than 16MB, by
>> about 1000 tps, and after that it gets murky; full results are below.
>
> Currently the max of wal_buffers is 16MB (i.e., the size of one WAL file)
> when it's set to -1. Thanks to your result, we should increase the max to
> 32MB?

I think that might be a good idea, although I'm not entirely convinced
that we understand why increasing wal_buffers is helping as much as it
is. I stuck an elog() into AdvanceXLInsertBuffer() to complain in the
case that we were writing buffers while holding the insert lock.
Then, I reran 30-minute tests 32 clients, one with wal_buffers=16MB
and the other wal_buffers=32MB. On the 16MB test, the elog() fired 15
times in a single second shortly after the start of the test, and then
9 more times over the rest of the test. On the 32MB test, the elog()
fired a total 6 times over the course of the test. The first test got
14320 tps, while the second got 15026 tps. I find that quite
surprising, because although WAL buffer wraparound is certainly bad
(on this test, it probably brings the system completely to a halt
until fsync() finishes) it really shouldn't lock up the system for
multiple seconds at a time. And yet that's what it would need to be
doing to account for the tps discrepancy on this test, considering how
rarely it occurs.

Studying AdvanceXLInsertBuffer() a bit more, I'm wondering if the
problem isn't so much - or isn't only - that it's expensive to write
buffers while also holding WALInsertLock. Maybe it's too expensive
even to acquire WalWriteLock in the first place - that is, the real
problem isn't so much the wraparound condition itself, but the expense
of testing whether a possible wraparound has actually occurred. A
quick test suggests that we acquire WALWriteLock here much more often
than we actually write anything while holding it, and that we get a
big burst of WALWriteLock acquisitions here immediately after a
checkpoint. I don't have any proof that this is what's causing the
tps drop with smaller wal_buffers, but I think there has to be
something other than an actual wraparound condition causing problems
here, because that just doesn't seem to happen frequently enough to be
an issue.

Anyway, maybe none of that matters at the moment. Perhaps it's enough
to know that wal_buffers>16MB can help, and just bump up the maximum
auto-tuned value a bit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei KaiGai 2012-03-12 14:42:24 Re: [v9.2] Add GUC sepgsql.client_label
Previous Message Fujii Masao 2012-03-12 13:30:45 Re: Bug: walsender and high CPU usage