Re: Further reduction of bufmgr lock contention

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org, Gavin Hamill <gdh(at)acentral(dot)co(dot)uk>
Subject: Re: Further reduction of bufmgr lock contention
Date: 2006-04-21 20:27:48
Message-ID: 1145651268.3112.95.camel@localhost.localdomain
Lists: pgsql-hackers

On Fri, 2006-04-21 at 13:01 -0400, Tom Lane wrote:
> I've been looking into Gavin Hamill's recent report of poor performance
> with PG 8.1 on an 8-way IBM PPC64 box.

Ah good.

> Instrumenting LWLockAcquire (with a patch I had developed last fall,
> but just now got around to cleaning up and committing to CVS) shows
> that the contention is practically all for the BufMappingLock:

> $ grep ^PID postmaster.log | sort +9nr | head -20
> PID 23820 lwlock 0: shacq 2446470 exacq 6154 blk 12755
> PID 23823 lwlock 0: shacq 2387597 exacq 4297 blk 9255
> PID 23824 lwlock 0: shacq 1678694 exacq 4433 blk 8692
> PID 23826 lwlock 0: shacq 1221221 exacq 3224 blk 5893
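
To put the counters above into proportion, here is a minimal sketch in
Python that parses lines in that format and works out what share of
acquisitions blocked and what share were exclusive. The function names are
hypothetical, and the field meanings (shacq = shared acquisitions, exacq =
exclusive acquisitions, blk = times the acquirer had to block) are an
assumption read off the output rather than something the quoted text spells
out.

```python
import re

# Matches lines such as:
#   PID 23820 lwlock 0: shacq 2446470 exacq 6154 blk 12755
LINE_RE = re.compile(
    r"^PID (\d+) lwlock (\d+): shacq (\d+) exacq (\d+) blk (\d+)$"
)


def parse_lwlock_line(line):
    """Return the counters on one stats line as a dict, or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    pid, lock, shacq, exacq, blk = (int(g) for g in m.groups())
    return {"pid": pid, "lock": lock, "shacq": shacq, "exacq": exacq, "blk": blk}


def summarize(stats):
    """Express blocked and exclusive acquisitions as fractions of the total."""
    total = stats["shacq"] + stats["exacq"]
    if total == 0:
        return {"blocked_fraction": 0.0, "exclusive_fraction": 0.0}
    return {
        "blocked_fraction": stats["blk"] / total,
        "exclusive_fraction": stats["exacq"] / total,
    }


# The four lines quoted above, without the "> " quoting prefix.
SAMPLE = [
    "PID 23820 lwlock 0: shacq 2446470 exacq 6154 blk 12755",
    "PID 23823 lwlock 0: shacq 2387597 exacq 4297 blk 9255",
    "PID 23824 lwlock 0: shacq 1678694 exacq 4433 blk 8692",
    "PID 23826 lwlock 0: shacq 1221221 exacq 3224 blk 5893",
]

if __name__ == "__main__":
    for line in SAMPLE:
        stats = parse_lwlock_line(line)
        summary = summarize(stats)
        print(
            stats["pid"],
            f"blocked {summary['blocked_fraction']:.4%}",
            f"exclusive {summary['exclusive_fraction']:.4%}",
        )
```

On the four quoted lines both fractions come out well under 1% per backend,
so the absolute number of blocks matters here, not the ratio.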

BufMappingLock contention can be made worse by a poorly tuned bgwriter
or if the cache hit rate is low. Perhaps in this case, increasing
shared_buffers (again) might be enough to further reduce the contention?

When we discussed this before
(http://archives.postgresql.org/pgsql-hackers/2005-02/msg00702.php)
it seemed to me that a low shared_buffers cache hit rate combined with a
high OS cache hit rate would cause high contention in an SMP environment.

> These numbers show that our rewrite of the bufmgr has done a great job
> of cutting down the amount of potential contention --- most of the
> traffic on this lock is shared rather than exclusive acquisitions ---
> but it seems that if you have enough CPUs it's still not good enough.
> (My best theory as to why Gavin is seeing better performance from a
> dual Opteron is simply that 2 processors will have 1/4th as much
> contention as 8 processors.)

Jonah mentions some 16-way CPU testing we have just begun. There are
some interesting effects to decode, but most importantly all the CPUs do
stay at 100% for much of the time (when other tuning has been done). So
my feeling is that the BufMappingLock contention seen by Gavin is much
worse than what we see. (I had been planning to investigate that point
further with him, though I have only just arrived back in the UK.)

Another difference is the read/write mix. My understanding is that
Gavin's workload is mostly read-only, which will greatly increase the
buffer request rate, since backends will spend proportionally more time
consuming data and less time in xlog (etc).

My understanding is that contention increases geometrically with the
number of potential lock holders (i.e. CPUs).

> I have an idea about how to improve matters: I think we could break the
> buffer tag to buffer mapping hashtable into multiple partitions based on
> some hash value of the buffer tags, and protect each partition under a
> separate LWLock, similar to what we did with the lmgr lock table not
> long ago. Anyone have a comment on this strategy, or a better idea?

I think this is the right way to go
(http://archives.postgresql.org/pgsql-hackers/2005-02/msg00240.php),
though it was right to do the work for 8.1 first.

The earlier lmgr lock partitioning had a hard-coded number of
partitions, which was sensible because adding partitions beyond a
certain number is unlikely to help further. That doesn't follow
here since the BufMappingLock contention will vary with the size of
shared_buffers and with the number of CPUs in use (for a given
workload). I'd like to see the partitioning calculated at server startup
either directly from shared_buffers or via a parameter. We may not be
restricted to a hash function, as we were with lmgr; perhaps a simple
range partitioning would do.
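
A minimal sketch of what that could look like, in Python for brevity
rather than the C the server is written in. Everything named here is an
assumption made for illustration: the function names, the
buffers_per_partition and max_partitions defaults, the simplified
(relation id, block number) buffer tag, the use of CRC32 as the hash, and
the reading of "range partitioning" as runs of consecutive blocks sharing a
partition. None of it is an existing PostgreSQL API or a settled design.

```python
import struct
import zlib


def choose_num_partitions(shared_buffers, buffers_per_partition=1024,
                          max_partitions=128):
    """Pick a power-of-two partition count from the buffer pool size.

    Computed once at startup; both defaults are arbitrary placeholders.
    """
    wanted = max(1, shared_buffers // buffers_per_partition)
    n = 1
    while n * 2 <= wanted and n * 2 <= max_partitions:
        n *= 2
    return n


def hash_partition(tag, num_partitions):
    """Map a (relation_id, block_number) tag to a partition by hashing.

    num_partitions must be a power of two so the mask below is valid.
    """
    relation_id, block_number = tag
    digest = zlib.crc32(struct.pack("<II", relation_id, block_number))
    return digest & (num_partitions - 1)


def range_partition(tag, num_partitions, blocks_per_range=64):
    """Map a tag to a partition by block range instead of by hash.

    Each run of blocks_per_range consecutive blocks shares one partition,
    and successive runs rotate through the partitions.
    """
    _relation_id, block_number = tag
    return (block_number // blocks_per_range) % num_partitions


if __name__ == "__main__":
    n = choose_num_partitions(shared_buffers=50000)
    tag = (16384, 70)  # hypothetical relation id and block number
    print(n, hash_partition(tag, n), range_partition(tag, n))
```

With hashing, neighbouring blocks of one relation scatter across the
partition locks; with the range variant a sequential scan stays on one lock
for blocks_per_range blocks at a time, which may or may not be desirable
depending on how many backends scan the same relation concurrently.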

Test-wise: I may be able to trial something next week, though system
access is not yet confirmed and I'm not sure we'll see an improvement on
the workload we're currently testing. I'll have a think about a pure
test that we can run on both systems to measure the contention.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com/
