why roll-your-own s_lock? / improving scalability

From: Nils Goroll <slink(at)schokola(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: why roll-your-own s_lock? / improving scalability
Date: 2012-06-26 17:02:31
Lists: pgsql-hackers

I am currently trying to understand what looks like really bad scalability of
9.1.3 on a 64-core, 512 GB RAM system: the system runs OK at 30% usr, but only
marginal amounts of additional load seem to push it to 70%, and the application
becomes highly unresponsive.

My current understanding basically matches the issues being addressed by various
9.2 improvements, well summarized in

An additional aspect is that, in order to address the latent risk of data loss &
corruption with write-back caches (WBCs) and async replication, we have
deliberately moved the db from a similar system with WB-cached storage to
SSD-based storage without a WBC, which, by design, has (compared to the best WBC
case) approx. 100x higher latencies, but much higher sustained throughput.

On the new system, even at an "acceptable" 30% user load, oprofile shows
significant lock contention:

opreport --symbols --merge tgid -l /mnt/db1/hdd/pgsql-9.1/bin/postgres

Profiling through timer interrupt
samples  %        image name               symbol name
30240    27.9720  postgres                 s_lock
5069      4.6888  postgres                 GetSnapshotData
3743      3.4623  postgres                 AllocSetAlloc
3167      2.9295             strcoll_l
2662      2.4624  postgres                 SearchCatCache
2495      2.3079  postgres                 hash_search_with_hash_value
2143      1.9823  postgres                 nocachegetattr
1860      1.7205  postgres                 LWLockAcquire
1642      1.5189  postgres                 base_yyparse
1604      1.4837             __strcmp_sse42
1543      1.4273             __strlen_sse42
1156      1.0693             memcpy

Unfortunately I don't have profiling data for the high-load / contention
condition yet, but I fear the picture will be worse and pointing in the same
direction.

<pure speculation>
In particular, the _impression_ is that lock contention could also be related to
I/O latencies, making me fear that cases could exist where spin locks are being
held while blocking on IO.
</pure speculation>

Looking at the code, it appears to me that the roll-your-own s_lock code cannot
handle a couple of cases, for instance it will also spin when the lock holder is
not running at all or blocking on IO (which could even be implicit, e.g. for a
page flush). These issues have long been addressed by adaptive mutexes and futexes.
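
To illustrate the concern, here is a minimal sketch of a plain test-and-set
spinlock. It is NOT the PostgreSQL s_lock code; the names toy_spinlock,
toy_spin_lock and friends are hypothetical. The point is only that the waiter
sees a single flag and has no way to tell whether the holder is running,
descheduled, or blocked on IO, so it spins in all three cases.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy spinlock for illustration; not PostgreSQL's s_lock. */
typedef struct { atomic_flag held; } toy_spinlock;

static void toy_spin_init(toy_spinlock *l)
{
    atomic_flag_clear(&l->held);
}

/* One test-and-set attempt; returns true if we now own the lock. */
static bool toy_spin_try(toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/*
 * Spin until acquired or until max_spins failed attempts.
 * Returns the number of failed attempts before success, or -1 on giving up.
 * Note that nothing here can observe the state of the lock holder.
 */
static long toy_spin_lock(toy_spinlock *l, long max_spins)
{
    long spins = 0;

    while (!toy_spin_try(l)) {
        if (++spins >= max_spins)
            return -1;
    }
    return spins;
}

static void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A futex-style or adaptive mutex differs exactly at the point where this loop
keeps retrying: it can hand the waiter to the kernel to sleep until the holder
releases the lock.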

Also, the s_lock code tries to be somewhat adaptive using spins_per_delay (after
having spun for long, but without having blocked, spin even longer in future),
which appears to me to have the potential of becoming highly counter-productive.
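
A rough sketch of the kind of feedback rule described above, as I read it. The
function name toy_adapt_spins, the bounds and the step sizes are placeholders
chosen for illustration, not values quoted from the source.

```c
/* Hypothetical bounds and step sizes, for illustration only. */
enum {
    TOY_MIN_SPINS = 10,
    TOY_MAX_SPINS = 1000,
    TOY_STEP_UP   = 100,
    TOY_STEP_DOWN = 1
};

/*
 * Return the spin budget to use for the next acquisition.
 * had_to_sleep == 0: the lock was obtained by spinning alone, so the
 *                    budget grows (spin even longer next time).
 * had_to_sleep != 0: spinning was not enough and we had to sleep, so the
 *                    budget shrinks.
 */
static int toy_adapt_spins(int spins_per_delay, int had_to_sleep)
{
    if (!had_to_sleep) {
        spins_per_delay += TOY_STEP_UP;
        if (spins_per_delay > TOY_MAX_SPINS)
            spins_per_delay = TOY_MAX_SPINS;
    } else {
        spins_per_delay -= TOY_STEP_DOWN;
        if (spins_per_delay < TOY_MIN_SPINS)
            spins_per_delay = TOY_MIN_SPINS;
    }
    return spins_per_delay;
}
```

With an asymmetric rule like this, a run of acquisitions that succeed only
after long spins raises the budget quickly, while each sleep lowers it only
slightly, which is the behaviour I suspect could backfire under contention.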

Now that the scene is set, here's the simple question: Why all this? Why not
simply use posix mutexes which, on modern platforms, will map to efficient
implementations like adaptive mutexes or futexes?
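
For concreteness, a minimal sketch of what "simply use posix mutexes" might
look like. The wrapper names (toy_mutex and friends) are hypothetical. Since
the backends are separate processes, I assume the mutex would have to live in
shared memory and be initialized as PTHREAD_PROCESS_SHARED; whether this is
available and efficient on every supported platform is exactly the open
question.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical wrapper; in a multi-process server this struct would be
 * placed in shared memory. */
typedef struct { pthread_mutex_t m; } toy_mutex;

/* Initialize as a process-shared mutex. Returns 0 or an errno-style code. */
static int toy_mutex_init(toy_mutex *l)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(&l->m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Blocks in the kernel if contended, instead of spinning indefinitely. */
static int toy_mutex_lock(toy_mutex *l)    { return pthread_mutex_lock(&l->m); }

/* Non-blocking attempt; returns EBUSY if the mutex is already held. */
static int toy_mutex_trylock(toy_mutex *l) { return pthread_mutex_trylock(&l->m); }

static int toy_mutex_unlock(toy_mutex *l)  { return pthread_mutex_unlock(&l->m); }
```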

Thanks, Nils
