Re: Is the unfair lwlock behavior intended?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Is the unfair lwlock behavior intended?
Date: 2016-05-24 22:50:15
Message-ID: 20160524225015.4zj5hkdizuzedcjj@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-05-24 15:34:31 -0700, Peter Geoghegan wrote:
> On Tue, May 24, 2016 at 1:38 PM, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee> wrote:
> >> I've already observed such behavior, see [1]. I think that now there is no
> >> consensus on how to fix that. For instance, Andres express opinion that
> >> this shouldn't be fixed from LWLock side [2].
> >> FYI, I'm planning to pickup work on CSN patch [3] for 10.0. CSN should fix
> >> various scalability issues including high ProcArrayLock contention.
> >
> > Some amount of non-fairness is ok, but degrading to the point of
> > complete denial of service is not very graceful. I don't think it's
> > realistic to hope that all lwlock contention issues will be fixed any
> > time soon. Some fallback mechanism would be extremely nice until then.
>
> Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:
>
> http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf
>
> I could believe that there's a case to be made for per-LWLock fairness
> characteristics, which may be roughly what Andres meant.

The problem is that half-way fair locks, which are frequently acquired
both in shared and exclusive mode, have really bad throughput
characteristics on modern multi-socket systems. We mostly get away with
fair locking at the object level (after considerable work on fast-path
locking), because nearly all accesses are non-conflicting. But
prohibiting any snapshot acquisitions whenever there's a single
LW_EXCLUSIVE ProcArrayLock waiter can reduce throughput dramatically.

I don't think randomly processing the wait queue - which is what the
quoted paper essentially describes - is really useful here. We
intentionally *ignore* the wait queue entirely if a lock request does
not conflict with the current holders, and that's what can prevent
exclusive lock requests from ever succeeding, because you essentially
can get repetitions of:

S1: acq(SHARED) -> shared = 1
S2: acq(EXCLUSIVE) -> shared = 1, waiters = 1 <block>
...
S3: acq(SHARED) -> shared = 2
S1: rel(SHARED) -> shared = 1
S1: acq(SHARED) -> shared = 2
S3: rel(SHARED) -> shared = 1
...
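
To make the sequence above concrete, here is a minimal toy simulation.
It is only a sketch and not PostgreSQL's lwlock.c: the ToyLock class,
its `fair` flag and the rounds_until_exclusive() helper are hypothetical
names invented for illustration. Each round models one more shared
acquirer arriving before an earlier shared holder releases, after which
the exclusive waiter retries.

```python
class ToyLock:
    """Toy shared/exclusive lock model (hypothetical, not lwlock.c)."""

    def __init__(self, fair):
        self.fair = fair          # assumption: a per-lock fairness switch
        self.shared = 0           # number of current shared holders
        self.exclusive = False    # True while an exclusive holder exists
        self.excl_waiters = 0     # number of queued exclusive requests

    def try_shared(self):
        if self.exclusive:
            return False
        # A fair lock refuses new shared holders while an exclusive
        # request is queued; an unfair lock ignores the wait queue.
        if self.fair and self.excl_waiters > 0:
            return False
        self.shared += 1
        return True

    def release_shared(self):
        self.shared -= 1

    def try_exclusive(self):
        if self.shared == 0 and not self.exclusive:
            self.exclusive = True
            return True
        return False


def rounds_until_exclusive(fair, max_rounds=1000):
    """Return the round in which the exclusive waiter gets the lock,
    or None if it is still waiting after max_rounds."""
    lock = ToyLock(fair)
    lock.try_shared()             # S1: acq(SHARED)    -> shared = 1
    lock.try_exclusive()          # S2: acq(EXCLUSIVE) -> fails, must wait
    lock.excl_waiters = 1
    for round_no in range(1, max_rounds + 1):
        lock.try_shared()         # another session tries acq(SHARED)
        lock.release_shared()     # an earlier holder does rel(SHARED)
        if lock.try_exclusive():  # the waiter retries
            lock.excl_waiters = 0
            return round_no
    return None


print("unfair:", rounds_until_exclusive(fair=False))
print("fair:  ", rounds_until_exclusive(fair=True))
```

In this model the unfair variant never lets the shared count drop to
zero, so the exclusive request is never granted. The fair variant grants
it as soon as the pre-existing holder releases, but only by turning away
every shared acquirer that arrives in the meantime, which is the
throughput cost described earlier.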

Now we potentially could mark individual lwlocks as being fair
locks. But which ones would those be? Certainly not ProcArrayLock; it's
way too heavily contended.

Regards,

Andres
