Further reduction of bufmgr lock contention

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: "Gavin Hamill" <gdh(at)acentral(dot)co(dot)uk>
Subject: Further reduction of bufmgr lock contention
Date: 2006-04-21 17:01:36
Message-ID: 12051.1145638896@sss.pgh.pa.us
Lists: pgsql-hackers

I've been looking into Gavin Hamill's recent report of poor performance
with PG 8.1 on an 8-way IBM PPC64 box. strace'ing backends shows a lot
of semop() calls, indicating blocking at the LWLock or lmgr-lock levels,
but not a lot of select() delays, suggesting we don't have too much of a
problem at the hardware spinlock level. A typical breakdown of
different kernel call types is

   566 _llseek
    10 brk
    10 gettimeofday
     4 mmap
     4 munmap
   562 read
     4 recv
     8 select
  3014 semop
    12 send
     1 time
     3 write

(I'm hoping to get some oprofile results to confirm there's nothing
strange going on at the hardware level, but no luck yet on getting
oprofile to work on Debian/PPC64 ... anyone know anything about suitable
kernels to use for that?)

Instrumenting LWLockAcquire (with a patch I had developed last fall,
but just now got around to cleaning up and committing to CVS) shows
that the contention is practically all for the BufMappingLock:

$ grep ^PID postmaster.log | sort +9nr | head -20
PID 23820 lwlock 0: shacq 2446470 exacq 6154 blk 12755
PID 23823 lwlock 0: shacq 2387597 exacq 4297 blk 9255
PID 23824 lwlock 0: shacq 1678694 exacq 4433 blk 8692
PID 23826 lwlock 0: shacq 1221221 exacq 3224 blk 5893
PID 23821 lwlock 0: shacq 1892453 exacq 1665 blk 5766
PID 23835 lwlock 0: shacq 2390685 exacq 1453 blk 5511
PID 23822 lwlock 0: shacq 1669419 exacq 1615 blk 4926
PID 23830 lwlock 0: shacq 1039468 exacq 1248 blk 2946
PID 23832 lwlock 0: shacq 698622 exacq 397 blk 1818
PID 23836 lwlock 0: shacq 544472 exacq 530 blk 1300
PID 23839 lwlock 0: shacq 497505 exacq 46 blk 885
PID 23842 lwlock 0: shacq 305281 exacq 1 blk 720
PID 23840 lwlock 0: shacq 317554 exacq 226 blk 355
PID 23840 lwlock 2: shacq 0 exacq 2872 blk 7
PID 23835 lwlock 2: shacq 0 exacq 3434 blk 6
PID 23835 lwlock 1: shacq 0 exacq 1452 blk 4
PID 23822 lwlock 1: shacq 0 exacq 1614 blk 3
PID 23820 lwlock 2: shacq 0 exacq 3582 blk 2
PID 23821 lwlock 1: shacq 0 exacq 1664 blk 2
PID 23830 lwlock 1: shacq 0 exacq 1247 blk 2
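
The per-backend lines above are easier to compare once they are summed per lock. The following is only a rough sketch of such a summary, assuming the log lines have exactly the "PID n lwlock n: shacq n exacq n blk n" shape shown above; the function name summarize_lwlock_stats and the sample list are made up for illustration and are not part of the instrumentation patch.

```python
import re
from collections import defaultdict

# Assumed line shape, copied from the log excerpt above:
#   PID 23820 lwlock 0: shacq 2446470 exacq 6154 blk 12755
LINE_RE = re.compile(
    r"^PID (\d+) lwlock (\d+): shacq (\d+) exacq (\d+) blk (\d+)$"
)


def summarize_lwlock_stats(lines):
    """Sum shared acquisitions, exclusive acquisitions and blocks per lock id.

    Hypothetical helper: lines that do not match the assumed format are skipped.
    """
    totals = defaultdict(lambda: {"shacq": 0, "exacq": 0, "blk": 0})
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        _pid, lock_id, shacq, exacq, blk = (int(g) for g in match.groups())
        totals[lock_id]["shacq"] += shacq
        totals[lock_id]["exacq"] += exacq
        totals[lock_id]["blk"] += blk
    return dict(totals)


# Three lines taken from the excerpt above, used only as sample input.
sample = [
    "PID 23820 lwlock 0: shacq 2446470 exacq 6154 blk 12755",
    "PID 23823 lwlock 0: shacq 2387597 exacq 4297 blk 9255",
    "PID 23840 lwlock 2: shacq 0 exacq 2872 blk 7",
]

if __name__ == "__main__":
    for lock_id, stats in sorted(summarize_lwlock_stats(sample).items()):
        print(lock_id, stats)
```

Fed the whole log rather than three lines, a summary like this should make the dominance of lwlock 0 in the blk column plain at a glance.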

These numbers show that our rewrite of the bufmgr has done a great job
of cutting down the amount of potential contention --- most of the
traffic on this lock is shared rather than exclusive acquisitions ---
but it seems that if you have enough CPUs it's still not good enough.
(My best theory as to why Gavin is seeing better performance from a
dual Opteron is simply that 2 processors will have 1/4th as much
contention as 8 processors.)

I have an idea about how to improve matters: I think we could break the
buffer tag to buffer mapping hashtable into multiple partitions based on
some hash value of the buffer tags, and protect each partition under a
separate LWLock, similar to what we did with the lmgr lock table not
long ago. Anyone have a comment on this strategy, or a better idea?
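
To make the partitioning idea concrete, here is a toy sketch in Python, not PostgreSQL code. Everything in it is an assumption made for illustration: the BufferTag fields, the partition count of 16, the class name PartitionedBufTable, and the use of a plain mutex per partition (Python's threading.Lock has no shared mode, whereas an LWLock can be taken shared or exclusive). The only point it demonstrates is that a tag's hash selects one partition, so backends touching tags in different partitions never wait on the same lock.

```python
import threading
from typing import NamedTuple, Optional

NUM_PARTITIONS = 16  # hypothetical value, chosen only for the example


class BufferTag(NamedTuple):
    """Stand-in for a buffer tag; the real tag has more fields."""
    rel_id: int
    block_num: int


class PartitionedBufTable:
    """Tag -> buffer id mapping split into independently locked partitions."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        # One lock and one hash table per partition.
        self._locks = [threading.Lock() for _ in range(num_partitions)]
        self._maps = [{} for _ in range(num_partitions)]

    def partition_of(self, tag: BufferTag) -> int:
        # The hash of the tag decides which partition (and lock) covers it.
        return hash(tag) % len(self._locks)

    def lookup(self, tag: BufferTag) -> Optional[int]:
        p = self.partition_of(tag)
        with self._locks[p]:  # would be a shared acquisition with an LWLock
            return self._maps[p].get(tag)

    def insert(self, tag: BufferTag, buf_id: int) -> int:
        """Insert unless the tag is already mapped; return the buffer id in use."""
        p = self.partition_of(tag)
        with self._locks[p]:  # would be an exclusive acquisition
            return self._maps[p].setdefault(tag, buf_id)

    def delete(self, tag: BufferTag) -> None:
        p = self.partition_of(tag)
        with self._locks[p]:
            self._maps[p].pop(tag, None)


if __name__ == "__main__":
    table = PartitionedBufTable()
    tag = BufferTag(rel_id=1, block_num=42)
    table.insert(tag, 7)
    print(table.partition_of(tag), table.lookup(tag))
```

One thing the sketch glosses over is any operation that involves two tags at once, such as replacing a buffer's old mapping with a new one. If the two tags hash to different partitions, both locks are involved, and that case would need some care in a real design.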

regards, tom lane
