Reducing contention for the LockMgrLock

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Reducing contention for the LockMgrLock
Date: 2005-12-07 21:59:14
Message-ID: 4037.1133992754@sss.pgh.pa.us
Lists: pgsql-hackers

We've suspected for a while that once we'd fixed the buffer manager's use
of a single global BufMgrLock, the next contention hotspot would be the
lock manager's LockMgrLock. I've now seen actual evidence of that in
profiling pgbench: using a modified backend that counts LWLock-related
wait operations, the LockMgrLock is responsible for an order of magnitude
more blockages than the next highest LWLock:

PID 12971 lwlock LockMgrLock: shacq 0 exacq 50630 blk 3354
PID 12979 lwlock LockMgrLock: shacq 0 exacq 49706 blk 3323
PID 12976 lwlock LockMgrLock: shacq 0 exacq 50567 blk 3304
PID 12962 lwlock LockMgrLock: shacq 0 exacq 50635 blk 3278
PID 12974 lwlock LockMgrLock: shacq 0 exacq 50599 blk 3251
PID 12972 lwlock LockMgrLock: shacq 0 exacq 50204 blk 3243
PID 12973 lwlock LockMgrLock: shacq 0 exacq 50321 blk 3200
PID 12978 lwlock LockMgrLock: shacq 0 exacq 50266 blk 3177
PID 12977 lwlock LockMgrLock: shacq 0 exacq 50379 blk 3148
PID 12975 lwlock LockMgrLock: shacq 0 exacq 49790 blk 3124
PID 12971 lwlock WALInsertLock: shacq 0 exacq 24022 blk 408
PID 12972 lwlock WALInsertLock: shacq 0 exacq 24021 blk 393
PID 12976 lwlock WALInsertLock: shacq 0 exacq 24017 blk 390
PID 12977 lwlock WALInsertLock: shacq 0 exacq 24021 blk 388
PID 12973 lwlock WALInsertLock: shacq 0 exacq 24018 blk 379
PID 12962 lwlock WALInsertLock: shacq 0 exacq 24024 blk 377
PID 12974 lwlock WALInsertLock: shacq 0 exacq 24016 blk 367
PID 12975 lwlock WALInsertLock: shacq 0 exacq 24021 blk 366
PID 12978 lwlock WALInsertLock: shacq 0 exacq 24023 blk 354
PID 12979 lwlock WALInsertLock: shacq 0 exacq 24033 blk 321
PID 12973 lwlock ProcArrayLock: shacq 45214 exacq 6003 blk 241
PID 12971 lwlock ProcArrayLock: shacq 45355 exacq 6003 blk 225
(etc)

We had also seen evidence to this effect from OSDL:
http://archives.postgresql.org/pgsql-patches/2003-12/msg00365.php

So it seems it's time to start thinking about how to reduce contention
for the LockMgrLock. There are no interesting read-only operations on the
shared lock table, so there doesn't seem to be any traction to be gained
by making some operations take just shared access to the LockMgrLock.

The best idea I've come up with after a bit of thought is to replace the
shared lock table with N independent tables representing partitions of the
lock space. Each lock would be assigned to one of these partitions based
on, say, a hash of its LOCKTAG. I'm envisioning N of 16 or so to achieve
(hopefully) about an order-of-magnitude reduction of contention. There
would be a separate LWLock guarding each partition; the LWLock for a given
partition would be considered to protect the LOCK objects assigned to that
partition, all the PROCLOCK objects associated with each such LOCK, and
the shared-memory hash tables holding these objects (each partition would
need its own hash tables). A PGPROC's lock-related fields are only
interesting when it is waiting for a lock, so we could say that the
LWLock for the partition containing the lock it is waiting for must be
held to examine/change these fields.
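
To make that concrete, here is a minimal sketch of the data structures and
the tag-to-partition mapping I have in mind. NUM_LOCK_PARTITIONS, the
struct layout, and locktag_hash() are illustrative assumptions rather than
working code, and the partition array would really live in shared memory:

    #define NUM_LOCK_PARTITIONS 16

    typedef struct LockPartition
    {
        LWLockId    lwlock;        /* guards all LOCKs/PROCLOCKs assigned here */
        HTAB       *lockHash;      /* this partition's LOCK hash table */
        HTAB       *proclockHash;  /* this partition's PROCLOCK hash table */
    } LockPartition;

    static LockPartition LockPartitions[NUM_LOCK_PARTITIONS];

    /* Route a lock to its partition by hashing its LOCKTAG */
    static int
    LockTagToPartition(const LOCKTAG *locktag)
    {
        return locktag_hash(locktag) % NUM_LOCK_PARTITIONS;
    }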

The per-PGPROC list of all PROCLOCKs belonging to that PGPROC is a bit
tricky to handle since it necessarily spans across partitions. We might
be able to deal with this with suitable rules about when the list can be
touched, but I've not worked this out in detail. Another possibility is
to break this list apart into N lists, one per partition, but that would
bloat the PGPROC array a bit, especially if we wanted larger N.
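
For reference, the split-list alternative would amount to something like
this in PGPROC (the field name is made up; list k would be protected by
partition k's LWLock):

    typedef struct PGPROC
    {
        /* ... existing fields ... */

        /* one PROCLOCK list head per partition, replacing the single
           procLocks list */
        SHM_QUEUE   myProcLocks[NUM_LOCK_PARTITIONS];
    } PGPROC;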

The basic LockAcquire and LockRelease operations would only need to
acquire the LWLock for the partition containing the lock they are
interested in; this is what gives us the contention reduction.
LockReleaseAll is also interesting from a performance point of view,
since it executes at every transaction exit. If we divide PGPROC's
PROCLOCK list into N lists then it will be very easy for LockReleaseAll
to take only the partition locks it actually needs to release these locks;
if not, we might have to resort to scanning the list N times, once while
we hold the LWLock for each partition.
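
Assuming the split lists, LockReleaseAll might look roughly like this
(LockPartitions[] is the hypothetical partition array from the earlier
sketch, and the actual list-walking is elided):

    static void
    LockReleaseAll(PGPROC *proc)
    {
        int     i;

        for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
        {
            /* quick test; assumes only this backend links PROCLOCKs
               into its own lists */
            if (SHMQueueEmpty(&proc->myProcLocks[i]))
                continue;

            LWLockAcquire(LockPartitions[i].lwlock, LW_EXCLUSIVE);
            /* ... walk myProcLocks[i], releasing each PROCLOCK ... */
            LWLockRelease(LockPartitions[i].lwlock);
        }
    }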

I think that CheckDeadLock will probably require taking all the partition
LWLocks (as long as it does this in a predetermined order there is no risk
of deadlock on the partition LWLocks). But one hopes this is not a
performance-critical operation. Ditto for GetLockStatusData.
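
The take-everything case is then just a loop in a fixed order, e.g.
(again using the hypothetical LockPartitions[] array):

    int     i;

    /* always acquire in ascending partition order */
    for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
        LWLockAcquire(LockPartitions[i].lwlock, LW_EXCLUSIVE);

    /* ... examine the entire lock state ... */

    for (i = NUM_LOCK_PARTITIONS - 1; i >= 0; i--)
        LWLockRelease(LockPartitions[i].lwlock);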

One objection I can see to this idea is that having N lock hash tables
instead of one will eat a larger amount of shared memory in hashtable
overhead. But the lock hashtables are fairly small relative to the
shared buffer array (given typical configuration parameters) so this
doesn't seem like a major problem.

Another objection is that LockReleaseAll will get slower (since it will
certainly call LWLockAcquire/Release more times) and in situations that
aren't heavily concurrent there won't be any compensating gain. I think
this won't be a significant effect, but there's probably no way to tell
for sure without actually writing the code and testing it.

While at it, I'm inclined to get rid of the current assumption that there
are logically separate hash tables for different LockMethodIds. AFAICS all
that's doing for us is creating a level of confusion; there's nothing on
the horizon suggesting we'd ever actually make use of the flexibility.

Thoughts, better ideas?

regards, tom lane
