I am currently trying to understand what looks like really bad scalability of
9.1.3 on a 64-core, 512GB RAM system: the system runs OK at 30% usr, but only
a marginal amount of additional load seems to push it to 70%, at which point the
application becomes highly unresponsive.
My current understanding basically matches the issues being addressed by various
9.2 improvements, well summarized in
An additional aspect is that, in order to address the latent risk of data loss &
corruption with write-back caches (WBCs) and async replication, we have
deliberately moved the DB from a similar system with WB-cached storage to
SSD-based storage without a WBC, which, by design, has (compared to the best WBC
case) approx. 100x higher latencies, but much higher sustained throughput.
On the new system, even at the "acceptable" 30% user load, oprofile shows
significant lock contention:
opreport --symbols --merge tgid -l /mnt/db1/hdd/pgsql-9.1/bin/postgres
Profiling through timer interrupt
samples  %        image name    symbol name
30240    27.9720  postgres      s_lock
5069      4.6888  postgres      GetSnapshotData
3743      3.4623  postgres      AllocSetAlloc
3167      2.9295  libc-2.12.so  strcoll_l
2662      2.4624  postgres      SearchCatCache
2495      2.3079  postgres      hash_search_with_hash_value
2143      1.9823  postgres      nocachegetattr
1860      1.7205  postgres      LWLockAcquire
1642      1.5189  postgres      base_yyparse
1604      1.4837  libc-2.12.so  __strcmp_sse42
1543      1.4273  libc-2.12.so  __strlen_sse42
1156      1.0693  libc-2.12.so  memcpy
Unfortunately I don't have profiling data for the high-load / contention
condition yet, but I fear the picture will be worse and point in the same
direction.
In particular, the _impression_ is that lock contention could also be related to
I/O latencies, making me fear that cases could exist where spinlocks are being
held while blocking on I/O.
Looking at the code, it appears to me that the roll-your-own s_lock code cannot
handle a couple of cases: for instance, it will also spin when the lock holder is
not running at all or is blocking on I/O (which could even be implicit, e.g. for
a page flush). These issues have long been addressed by adaptive mutexes and
futexes.
Also, the s_lock code tries to be somewhat adaptive using spins_per_delay (after
having spun for a long time without blocking, spin even longer in the future),
which appears to me to have the potential of becoming highly counter-productive.
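
To make the concern concrete, here is a minimal C sketch of a spin-then-sleep
lock with a spins_per_delay style heuristic. It is not the PostgreSQL s_lock
code: the names (sketch_lock, sketch_acquire, sketch_release), the constants and
the backoff policy are all assumptions chosen for illustration. What it is meant
to show is that the waiter only ever sees a failed test-and-set, so it cannot
tell a holder that is running from one that is preempted or blocked on I/O.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict -std=c11 */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Illustrative constants, NOT taken from the PostgreSQL source. */
enum { MIN_SPINS = 10, MAX_SPINS = 1000, START_SPINS = 100 };

/* Per-process tuning state: how often to spin before sleeping. */
static int spins_per_delay = START_SPINS;

typedef struct { atomic_flag held; } sketch_lock;

static void sketch_sleep_us(long us)
{
    struct timespec ts = { .tv_sec = us / 1000000L,
                           .tv_nsec = (us % 1000000L) * 1000L };
    nanosleep(&ts, NULL);
}

/* Acquire the lock; returns how many times we had to sleep. */
static int sketch_acquire(sketch_lock *l)
{
    int  spins = 0, sleeps = 0;
    long delay_us = 1000;

    /* The waiter cannot see WHY the flag is still set: the holder may be
     * running, descheduled, or blocked on I/O. It spins in all three cases. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        if (++spins >= spins_per_delay) {
            sketch_sleep_us(delay_us);
            sleeps++;
            spins = 0;
            if (delay_us < 1000000L)
                delay_us *= 2;            /* simple exponential backoff */
        }
    }

    /* Adaptation: spinning alone succeeded -> spin more next time;
     * we had to sleep -> spin slightly less next time. */
    if (sleeps == 0)
        spins_per_delay = spins_per_delay + 100 > MAX_SPINS
                              ? MAX_SPINS : spins_per_delay + 100;
    else
        spins_per_delay = spins_per_delay - 1 < MIN_SPINS
                              ? MIN_SPINS : spins_per_delay - 1;
    assert(spins_per_delay >= MIN_SPINS && spins_per_delay <= MAX_SPINS);
    return sleeps;
}

static void sketch_release(sketch_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock would be declared as `sketch_lock l = { ATOMIC_FLAG_INIT };`. In this
sketch, every acquisition that succeeds by spinning alone raises the spin budget
by a large step, while one that had to sleep lowers it only slightly, which is
the asymmetry I am worried about.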
Now that the scene is set, here's the simple question: Why all this? Why not
simply use POSIX mutexes which, on modern platforms, will map to efficient
implementations like adaptive mutexes or futexes?
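
For illustration, a minimal sketch of what I have in mind, assuming Linux with
glibc. Since the lock would have to live in memory shared between processes, the
mutex needs the PTHREAD_PROCESS_SHARED attribute. The helper names
(shared_mutex_create, shared_mutex_destroy) are hypothetical, the anonymous
mmap() merely stands in for a real shared memory segment, and
PTHREAD_MUTEX_ADAPTIVE_NP is a non-portable glibc extension.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE               /* MAP_ANONYMOUS under strict -std=c11 */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: returns a mutex that stays usable across fork(),
 * or NULL on failure. */
static pthread_mutex_t *shared_mutex_create(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    if (pthread_mutexattr_init(&attr) != 0)
        goto fail;
    /* Required for a mutex shared between processes, not just threads. */
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0)
        goto fail_attr;
#ifdef __GLIBC__
    /* glibc extension: spin briefly before sleeping in the kernel.
     * Best effort only, so the return value is ignored. */
    (void) pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
#endif
    if (pthread_mutex_init(m, &attr) != 0)
        goto fail_attr;

    pthread_mutexattr_destroy(&attr);
    return m;

fail_attr:
    pthread_mutexattr_destroy(&attr);
fail:
    munmap(m, sizeof *m);
    return NULL;
}

/* Hypothetical helper: tear down a mutex created above. */
static void shared_mutex_destroy(pthread_mutex_t *m)
{
    assert(m != NULL);
    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
}
```

Callers would then just use pthread_mutex_lock() and pthread_mutex_unlock() on
the returned pointer, and a waiter that cannot get the lock quickly is put to
sleep by the kernel instead of spinning on a holder that is not running.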