Re: Is the unfair lwlock behavior intended?

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Is the unfair lwlock behavior intended?
Date: 2016-05-25 01:49:51
Message-ID: 0A3221C70F24FB45833433255569204D1F579BFD@G01JPEXMBYT05

> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Alexander Korotkov
> I've already observed such behavior; see [1]. I think there is currently no consensus on how to fix it. For instance, Andres expressed the opinion that this shouldn't be fixed on the LWLock side [2].

Thank you for the helpful pointers. I understand now.

> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Ants Aasma
> 9.5 had significant LWLock scalability improvements. This might
> improve performance enough so that exclusive lockers don't get
> completely starved. It would be helpful if you could test if it's
> still possible to trigger starvation with the new code.

Unfortunately, we can no longer test this, because the customer's system is now in production. The heavy ProcArray contention was caused mainly by too many tuple visibility tests, which in turn were caused by unintended sequential scans. The customer then avoided the contention problem by adding an index and reducing the number of concurrent active sessions.

> From: Andres Freund [mailto:andres(at)anarazel(dot)de]
> Are you sure you're actually queued behind share locks, and not primarily
> behind the lwlock's spinlocks? The latter is what I've seen in similar cases.

I think so, because the stack traces showed the backends waiting in TransactionIdIsInProgress (or some function in the commit path) -> LWLockAcquire -> PGSemaphoreLock -> semop(), with no spinlock-related functions in the trace.
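
For what it's worth, here is a minimal, self-contained sketch of that wait path (the *_sketch names and stub bodies are my own illustration, not the actual PostgreSQL source). The point is that the backend sleeps on its process semaphore inside LWLockAcquire, which is why semop() sits at the bottom of the trace and no spinlock frames appear:

#include <stdbool.h>

static void
PGSemaphoreLock_sketch(void)
{
	/* the real function blocks in semop() until the lock holder wakes us */
}

static void
LWLockAcquire_sketch(void)
{
	/*
	 * Fast path elided; when ProcArrayLock is busy, the backend queues
	 * itself and sleeps on its own semaphore.
	 */
	PGSemaphoreLock_sketch();
}

static bool
TransactionIdIsInProgress_sketch(void)
{
	LWLockAcquire_sketch();		/* shared ProcArrayLock */
	/* ... scan the shared ProcArray for the xid ... */
	return false;
}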

> The problem is that half-way fair locks, which are frequently acquired both
> in shared and exclusive mode, have really bad throughput characteristics
> on modern multi-socket systems. We mostly get away with fair locking on
> object level (after considerable work re fast-path locking), because nearly
> all accesses are non-conflicting. But prohibiting any snapshot acquisitions
> when there's a single LW_EXCLUSIVE ProcArrayLock waiter, can reduce
> throughput dramatically.

Thanks; I understand that you chose total throughput over stable response time. I sympathize with that decision, and I think it's the way to go.
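
Just to check my understanding, here is a toy sketch of the two admission policies (ToyLock and the try_shared_* functions are made-up illustrations, not PostgreSQL's LWLock code):

#include <stdbool.h>

typedef struct
{
	int		shared_count;		/* current shared holders */
	bool	exclusive_held;		/* an exclusive holder exists */
	bool	exclusive_waiting;	/* an exclusive acquirer is queued */
} ToyLock;

/*
 * Unfair (roughly the current behavior): a new shared acquirer succeeds
 * whenever no exclusive holder exists, even if an exclusive waiter is
 * already queued, so the writer can starve.
 */
static bool
try_shared_unfair(ToyLock *lock)
{
	if (!lock->exclusive_held)
	{
		lock->shared_count++;
		return true;			/* the queued writer keeps waiting */
	}
	return false;				/* caller sleeps on the wait queue */
}

/*
 * Half-way fair alternative: a queued exclusive waiter blocks new shared
 * acquirers, bounding the writer's wait at some cost in throughput.
 */
static bool
try_shared_fair(ToyLock *lock)
{
	if (!lock->exclusive_held && !lock->exclusive_waiting)
	{
		lock->shared_count++;
		return true;
	}
	return false;
}

The real code of course has to make the check and the queueing atomic; this only shows the admission decision that makes the behavior unfair.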

OTOH, I might object if I were the pitiful waiter... I would walk out of Disneyland if the staff said, "Please stay in line as long as there are more efficient guests behind you. That benefits the whole park."

Regards
Takayuki Tsunakawa
