Re: Move PinBuffer and UnpinBuffer to atomics

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, YUriy Zhuravlev <u(dot)zhuravlev(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Move PinBuffer and UnpinBuffer to atomics
Date: 2015-12-11 12:56:46
Message-ID: CAPpHfdugN7DGKaMgfsKMJxyownyqiwvc94RYN8qEXw2itD35gA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 10, 2015 at 9:26 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Wed, Dec 9, 2015 at 2:17 PM, Alexander Korotkov <
> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
>> On Tue, Dec 8, 2015 at 6:00 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>>
>>> On Tue, Dec 8, 2015 at 3:56 PM, Alexander Korotkov <
>>> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>>>>
>>>> Agree. This patch needs to be carefully verified. Current experiments
>>>> just show that it is a promising direction for improvement. I'll come up
>>>> with a better version of this patch.
>>>>
>>>> Also, after testing on large machines I have another observation to
>>>> share. Currently, LWLock doesn't guarantee that an exclusive lock will
>>>> ever be acquired (assuming the duration of each shared lock is finite).
>>>> That is because when no exclusive lock is held, new shared locks aren't
>>>> queued and the LWLock state is changed directly. Thus, a process trying
>>>> to acquire an exclusive lock has to wait for a gap in the shared locks.
>>>>
>>>
>>> I think this has the potential to starve exclusive lockers in the worst
>>> case.
>>>
>>>
>>>> But with high concurrency for shared locks, that could happen very
>>>> rarely, or even never.
>>>>
>>>> We did see this on a big Intel machine in practice. pgbench -S takes the
>>>> shared ProcArrayLock very frequently. Once a certain number of
>>>> connections is reached, new connections hang on acquiring the exclusive
>>>> ProcArrayLock. I think we could work around this problem. For instance,
>>>> when an exclusive lock waiter times out, it could set a special bit
>>>> which prevents others from taking new shared locks.
>>>>
>>>>
>>> I think a timeout-based solution would give priority to exclusive
>>> lock waiters (assume a case where the exclusive lock waiters time out
>>> one after another) and make shared lockers wait, and a timer-based
>>> solution might turn out to be costly for general cases where the wait
>>> is not so long.
>>>
>>
>> Since all lwlock waiters are ordered in the queue, we can let only the
>> first waiter set this bit.
>>
>
> That's okay, but still, every time an Exclusive locker wakes up, the
> threshold time for its wait might already be over and it will set the
> bit. In theory that looks okay, but compared to the current algorithm
> it will cause more shared lockers to be added to the wait queue.
>
>
>> Anyway, once the bit is set, shared lockers would be added to the queue.
>> They would get the lock in queue order.
>>
>>
>
> Yes, that's right, but I think in general the solution to this problem
> should be: don't let any Exclusive locker starve, while still allowing
> as many shared lockers as possible. I think it is important here how
> we define starving; should it be based on time or something else? I
> find a timer-based solution somewhat less suitable, but maybe it is
> okay if there is no better way.
>

Yes, we probably should find something better.
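
To make the discussion concrete, here is a minimal sketch of how such a
bit could gate the shared-lock fast path. This is not the actual lwlock.c
code: the flag name LW_FLAG_BLOCK_NEW_SHARED and its bit position are
hypothetical, and the real fast path does more bookkeeping; only the
pg_atomic_* calls and the LWLock state field are as in the current sources.

#include "postgres.h"
#include "port/atomics.h"
#include "storage/lwlock.h"

/* Values as in lwlock.c; LW_FLAG_BLOCK_NEW_SHARED is hypothetical. */
#define LW_VAL_EXCLUSIVE         ((uint32) 1 << 24)
#define LW_VAL_SHARED            1
#define LW_FLAG_BLOCK_NEW_SHARED ((uint32) 1 << 28)  /* hypothetical bit */

/*
 * Try the shared-lock fast path.  Returns true if the shared lock was
 * taken directly; false means the caller must queue as a waiter.  The
 * hypothetical flag forces new shared lockers into the queue as soon as
 * the first queued exclusive waiter reports starvation.
 */
static bool
LWLockAttemptSharedFastPath(LWLock *lock)
{
    uint32 old_state = pg_atomic_read_u32(&lock->state);

    for (;;)
    {
        /* Exclusive holder present, or a starving exclusive waiter. */
        if (old_state & (LW_VAL_EXCLUSIVE | LW_FLAG_BLOCK_NEW_SHARED))
            return false;

        /* Add one shared holder without touching the wait queue. */
        if (pg_atomic_compare_exchange_u32(&lock->state, &old_state,
                                           old_state + LW_VAL_SHARED))
            return true;
        /* CAS failed; old_state has been refreshed, so just retry. */
    }
}

Once the bit is set, every new shared locker fails the fast path and
queues behind the exclusive waiter, so the existing shared holders drain
and the exclusive waiter eventually gets the lock in queue order.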

>>> Another way could be to check if the Exclusive locker needs to go for
>>> a repeated wait a couple of times; then we can set such a bit.
>>>
>>
>> I'm not sure what you mean by repeated wait. Do you mean the exclusive
>> locker was woken up twice by timeout?
>>
>
> I mean to say that once the Exclusive locker is woken up, it retries
> acquiring the lock as it does today, but if it finds that the number
> of retries is greater than a certain threshold (let us say 10), then
> we set the bit.
>

Yes, there is a retry loop in the LWLockAcquire function. A retry happens
when the waiter is woken up but someone else steals the lock before it. A
lock waiter is woken up by the lock releaser only when the lock becomes
free. But in the case of high concurrency for shared locks, the lock almost
never becomes free, so the exclusive locker would never be woken up. I'm
pretty sure this is what happens on the big Intel machine during our
benchmark, so relying on the number of retries wouldn't work in this case.
I'll run tests to verify whether retries happen in our case.
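
For reference, a simplified version of that loop is sketched below. The
real LWLockAcquire in lwlock.c also handles local-lock bookkeeping,
interrupts, and statistics; the retry counter, the threshold of 10, and
LW_FLAG_BLOCK_NEW_SHARED (from the earlier sketch) are the hypothetical
additions under discussion, not existing code.

/*
 * Simplified sketch of the retry loop in LWLockAcquire, with the
 * hypothetical retry counter bolted on.  LWLockAttemptLock,
 * LWLockQueueSelf, and LWLockDequeueSelf are the static helpers that
 * exist in lwlock.c today; error handling is omitted.
 */
static void
LWLockAcquireSketch(LWLock *lock, LWLockMode mode)
{
    int         retries = 0;

    for (;;)
    {
        /* Take the lock directly if its state allows it. */
        if (LWLockAttemptLock(lock, mode))
            return;

        /* Lock is busy: add ourselves to the wait queue ... */
        LWLockQueueSelf(lock, mode);

        /* ... and re-check, in case it was released meanwhile. */
        if (LWLockAttemptLock(lock, mode))
        {
            LWLockDequeueSelf(lock);
            return;
        }

        /*
         * Sleep until the releaser wakes us.  This is the crux: we are
         * woken only when the lock becomes free, and under a constant
         * stream of shared holders it never does, so an exclusive
         * waiter can sleep here forever and "retries" below is never
         * incremented -- counting retries alone cannot detect that.
         */
        PGSemaphoreLock(&MyProc->sem, false);

        /* Hypothetical: after enough futile wakeups, block new shared
         * lockers so the queue can drain. */
        if (mode == LW_EXCLUSIVE && ++retries >= 10)
            pg_atomic_fetch_or_u32(&lock->state,
                                   LW_FLAG_BLOCK_NEW_SHARED);
    }
}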

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
