From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-patches(at)postgreSQL(dot)org
Subject: Rewrite sinval messaging to reduce contention
Date: 2008-06-18 18:19:34
Message-ID: 373.1213813174@sss.pgh.pa.us
Lists: pgsql-patches

In this thread back in January
http://archives.postgresql.org/pgsql-performance/2008-01/msg00001.php
we saw evidence that contention for SInvalLock was a problem,
and we (or at least I) concluded that the sinval messaging system
needed a nontrivial rewrite to get rid of its habit of sending everyone
to read the queue at the same time. Here is the proposed rewrite.

The major changes are:

* When the queue overflows, we only issue a cache reset to the specific
backend or backends that still haven't read the oldest message, rather
than resetting everyone as in the original coding. The reset-everyone
approach forced unnecessary cache-rebuilding work in backends that
weren't actually behind.
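A minimal sketch of the targeted-reset idea, using a simplified model of the per-backend state (the struct layout and names here are illustrative, not the actual sinvaladt.c definitions):

```c
#include <stdbool.h>

/* Hypothetical, simplified per-backend sinval state. */
typedef struct
{
    int  nextMsgNum;   /* next queue message this backend would read */
    bool resetState;   /* true => backend must rebuild its caches */
} ProcState;

/*
 * On queue overflow, flag only the backends that still haven't read
 * the oldest message being discarded, instead of resetting everyone.
 */
static void
reset_lagging_backends(ProcState *procs, int n, int minRetainedMsgNum)
{
    for (int i = 0; i < n; i++)
    {
        if (procs[i].nextMsgNum < minRetainedMsgNum)
            procs[i].resetState = true;
    }
}
```

Backends that have already consumed the discarded messages keep their caches intact, which is the whole point of the change.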

* When we observe backend(s) falling well behind, we signal SIGUSR1
to only one backend, the one that is furthest behind and doesn't
already have a signal outstanding for it. When it finishes catching
up, it will "pass on" the SIGUSR1 to the next-furthest-back guy,
if there is one that is far enough behind to justify a signal.
This changes the former behavior where every backend in the system
might try to hit the queue at the same time into a "daisy chain"
where only one backend is catching up at a time; furthermore,
backends that aren't well behind aren't forced to catch up too.
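The daisy-chain selection can be sketched as follows; the field names and the threshold parameter are assumptions for illustration, not the patch's actual code:

```c
#include <stdbool.h>

/* Hypothetical per-backend state. */
typedef struct
{
    int  nextMsgNum;  /* next queue message this backend would read */
    bool signaled;    /* a SIGUSR1 catchup signal is already pending */
} ProcState;

/*
 * Pick at most one backend to signal: the furthest-behind backend that
 * is at least lagThreshold messages behind and has no signal already
 * outstanding.  Returns its index, or -1 if nobody qualifies.  When the
 * chosen backend finishes catching up, it repeats this selection,
 * passing the signal down the chain.
 */
static int
pick_backend_to_signal(const ProcState *procs, int n,
                       int maxMsgNum, int lagThreshold)
{
    int best = -1;
    int bestLag = lagThreshold - 1;

    for (int i = 0; i < n; i++)
    {
        int lag = maxMsgNum - procs[i].nextMsgNum;

        if (!procs[i].signaled && lag > bestLag)
        {
            best = i;
            bestLag = lag;
        }
    }
    return best;
}
```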

* We don't attempt to clean out dead messages after every
message-receipt operation; rather, we do it on the insertion side,
and only when the queue fullness passes certain thresholds.
(As presented, the thresholds are 8/16, 9/16, 10/16, etc, but this
could be adjusted by changing a couple of #defines.) This greatly
reduces the amount of contention for exclusive hold on SInvalLock.
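One way to model the "passes certain thresholds" test on the insertion side, assuming the writer checks after each message it adds; the #define names and the capacity are illustrative stand-ins for the tunables in sinvaladt.c:

```c
#include <stdbool.h>

/* Illustrative constants; the patch's actual #defines may differ. */
#define MAXNUMMESSAGES   4096
#define CLEANUP_MIN      (MAXNUMMESSAGES / 2)   /* first cleanup at 8/16 full */
#define CLEANUP_QUANTUM  (MAXNUMMESSAGES / 16)  /* then every further 1/16 */

/*
 * Writer-side check: clean out dead messages only when the queue has
 * just grown to 8/16, 9/16, 10/16, ... of capacity, rather than after
 * every message receipt.
 */
static bool
needs_cleanup(int numMsgs)
{
    return numMsgs >= CLEANUP_MIN && numMsgs % CLEANUP_QUANTUM == 0;
}
```

Because cleanup needs exclusive hold on SInvalLock, firing it only at these fullness steps is what cuts the exclusive-lock traffic.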

* Readers attempt to pull multiple messages (as presented, up to 32)
out of the queue for each acquisition of SInvalLock. Although they
take the lock in shared mode and thus aren't contending at the
LWLock level, I thought this was important to do to reduce contention
for the lock's spinlock --- the prevalence of s_lock() in the "before"
oprofile convinces me that there was just too much traffic through
the lock itself.
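The batched read can be sketched like this; the struct and buffer here are simplified stand-ins, and in the real code the copy loop would run while holding SInvalLock in shared mode, with the messages processed only after the lock is released:

```c
#define MAXNUMMESSAGES        4096  /* circular-buffer capacity (illustrative) */
#define MSGS_PER_ACQUISITION  32    /* max messages pulled per lock acquisition */

/* Stand-in for the real shared-invalidation message struct. */
typedef struct { int id; } SharedInvalidationMessage;

/*
 * Copy up to MSGS_PER_ACQUISITION messages from the shared circular
 * buffer into a backend-local array, advancing the backend's read
 * pointer.  Touching the lock (and its internal spinlock) once per
 * batch instead of once per message is what reduces s_lock() traffic.
 */
static int
copy_messages(const SharedInvalidationMessage *buf, int maxMsgNum,
              int *nextMsgNum, SharedInvalidationMessage *local)
{
    int n = 0;

    while (*nextMsgNum < maxMsgNum && n < MSGS_PER_ACQUISITION)
        local[n++] = buf[(*nextMsgNum)++ % MAXNUMMESSAGES];
    return n;
}
```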

Note: PMSIGNAL_WAKEN_CHILDREN is now unused and should be removed,
but I didn't include that change in the patch.

The test case I'm using looks like this:

$ cat createdrop.sql
create temp table register_reqs(full_key integer, register_index integer) WITHOUT OIDS ON COMMIT DROP;
$ pgbench -n -f createdrop.sql -c 40 -t 1000 bench

On my machine CVS HEAD gets about 485 transactions/sec on this; with the
patch that improves to 700 tps.

oprofile says that the routines above 1% of CPU in CVS HEAD are

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % image name symbol name
363116 11.2433 postgres CatalogCacheIdInvalidate
316608 9.8032 postgres s_lock
305881 9.4711 postgres LWLockAcquire
272717 8.4442 postgres LWLockRelease
234812 7.2706 postgres CatalogCacheFlushRelation
215445 6.6709 postgres hash_search_with_hash_value
78544 2.4320 postgres _bt_compare
62007 1.9199 postgres SIGetDataEntry
48614 1.5052 postgres AllocSetAlloc
48033 1.4873 postgres XLogInsert
41280 1.2782 postgres PinBuffer
40659 1.2589 postgres LocalExecuteInvalidationMessage
36906 1.1427 postgres ResetCatalogCache
35708 1.1056 postgres hash_any
34151 1.0574 postgres PrepareToInvalidateCacheTuple

and with the patch

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % image name symbol name
550342 13.5424 postgres CatalogCacheIdInvalidate
362684 8.9246 postgres hash_search_with_hash_value
199291 4.9040 postgres LWLockAcquire
198277 4.8790 postgres CatalogCacheFlushRelation
149716 3.6841 postgres _bt_compare
114006 2.8054 postgres LWLockRelease
103138 2.5379 postgres XLogInsert
84471 2.0786 postgres PrepareToInvalidateCacheTuple
78464 1.9308 postgres LocalExecuteInvalidationMessage
75336 1.8538 postgres PinBuffer
74837 1.8415 postgres AllocSetAlloc
63764 1.5691 postgres SIGetDataEntries
60468 1.4879 postgres hash_any
51662 1.2713 libc-2.7.so memcpy
43999 1.0827 libc-2.7.so memcmp
43407 1.0681 postgres _bt_checkpage

so this is quite successful at eliminating spinlock-level contention
and does a reasonable job at reducing the amount of LWLock processing.

Comments? Does anyone have more realistic test cases for this problem,
or want to fiddle with the tuning #defines in sinvaladt.c?

regards, tom lane

Attachment Content-Type Size
sinval-rewrite-1.patch.gz application/octet-stream 8.8 KB
