Re: pgbench vs. wait events

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench vs. wait events
Date: 2016-10-06 23:50:40
Message-ID: CAB7nPqR2_0fqRH94-sqZq0DKLQTh7yjWC3HmCkvxvKHQAKV8qw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 7, 2016 at 3:38 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I decided to do some testing on hydra (IBM-provided community
> resource, POWER, 16 cores/64 threads, kernel 3.2.6-3.fc16.ppc64) using
> the newly-enhanced wait event stuff to try to get an idea of what
> we're waiting for during pgbench. I did 30-minute pgbench runs with
> various configurations, but all had max_connections = 200,
> shared_buffers = 8GB, maintenance_work_mem = 4GB, synchronous_commit =
> off, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
> log_line_prefix = '%t [%p] ', max_wal_size = 40GB, log_checkpoints =
> on. During each run, I ran this psql script in another window and
> captured the output:

Nice. Thanks for sharing.

> \t
> select wait_event_type, wait_event from pg_stat_activity where pid !=
> pg_backend_pid()
> \watch 0.5
>
> Then, I used a little shell-scripting to count up the number of times
> each wait event occurred in the output.

Or an INSERT SELECT into an unlogged table? No need for extra maths then.
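
For example, something like this, where wait_samples is just a name picked
for the sketch:

create unlogged table wait_samples
  (wait_event_type text, wait_event text);
insert into wait_samples
  select wait_event_type, wait_event
    from pg_stat_activity where pid != pg_backend_pid()
\watch 0.5

Then the counting at the end of the run is a single query instead of shell
post-processing:

select wait_event_type, wait_event, count(*)
  from wait_samples
  group by 1, 2
  order by 3;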

> First, I tried scale factor
> 3000 with 32 clients and got these results:
>
> 1 LWLockTranche | buffer_mapping
> [...]
> 21014 LWLockNamed | WALWriteLock
> 28497 |
> 58279 LWLockTranche | wal_insert
>
> tps = 1150.803133 (including connections establishing)
>
> What jumps out here, at least to me, is that there is furious
> contention on both the wal_insert locks and on WALWriteLock.
> Apparently, the system simply can't get WAL on disk fast enough to
> keep up with this workload. Relation extension locks and
> buffer_content locks are also pretty common, both ahead of
> ClientRead, a relatively uncommon wait event on this test. The load
> average on the system was only about 3 during this test, indicating
> that most processes are in fact spending most of their time off-CPU.

Would increasing the number of WAL insertion locks help? In all your
tests this stays at its hard-coded value of 8 (NUM_XLOGINSERT_LOCKS).
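
If you do retry with that bumped, narrowing the sampling to just those two
waits should show quickly whether the pressure simply moves from wal_insert
to WALWriteLock, for instance:

select wait_event, count(*)
  from pg_stat_activity
  where wait_event in ('wal_insert', 'WALWriteLock')
    and pid != pg_backend_pid()
  group by wait_event
\watch 0.5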

> The first thing I tried was switching to unlogged tables, which
> produces these results:
>
> 1 BufferPin | BufferPin
> 1 LWLockTranche | lock_manager
> 2 LWLockTranche | buffer_mapping
> 8 LWLockNamed | ProcArrayLock
> 9 LWLockNamed | CheckpointerCommLock
> 9 LWLockNamed | CLogControlLock
> 11 LWLockTranche | buffer_content
> 37 LWLockTranche | clog
> 153 Lock | tuple
> 388 LWLockNamed | XidGenLock
> 827 Lock | transactionid
> 1267 Client | ClientRead
> 20631 Lock | extend
> 91767 |
>
> tps = 1223.239416 (including connections establishing)
>
> If you don't look at the TPS number, these results look like a vast
> improvement. The overall amount of time spent not waiting for
> anything is now much higher, and the problematic locks have largely
> disappeared from the picture. However, the load average now shoots up
> to about 30, because most of the time that the backends are "not
> waiting for anything" they are in fact in kernel wait state D; that
> is, they're stuck doing I/O. This suggests that we might want to
> consider advertising a wait state when a backend is doing I/O, so we
> can measure this sort of thing.

Maybe something in fd.c. It may be a good idea to have a look at perf
output on Linux to see where this contention is happening, use that to
decide where to place a wait point, and then check whether other OSes
share a similar pattern.
--
Michael
