Re: Condition variable live lock

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Condition variable live lock
Date: 2018-01-05 06:10:57
Message-ID: 21761.1515132657@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> On Fri, Jan 5, 2018 at 5:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Now, the limitation with this is that we can't be waiting for any *other*
>> condition variable, because then we'd be trashing our state about that
>> variable. As coded, we can't be waiting for the target CV either, but
>> that case could actually be handled if we needed to, as per the comment.
>> I do not know if this is likely to be a problematic limitation
>> ... discuss. (The patch does survive check-world, FWIW.)

> ... one way to lift the restriction would
> be to teach ConditionVariableBroadcast() to call
> ConditionVariableCancelSleep() if cv_sleep_target is non-NULL where
> you have the current assertion. Code that is still waiting for a CV
> must be in a loop that will eventually re-add it in
> ConditionVariableSleep(), and it won't miss any signals that it can't
> afford to miss because the first call to ConditionVariableSleep() will
> return immediately so the caller will recheck its condition.

Oh, of course, very simple.

I thought of another possible issue, though. In the situation where
someone else has removed our sentinel (presumably, by issuing
ConditionVariableSignal just before we were about to remove the
sentinel), my patch assumes we can just do nothing. But it seems
like that amounts to losing one signal. Whoever that someone else
was, they probably expected to awaken a waiter, and now that won't happen.
Should we rejigger the logic so that it awakens one additional waiter
(if there is one) after detecting that someone else has removed the
sentinel? Obviously, this trades a risk of loss of wakeup for a risk
of spurious wakeup, but presumably the latter is something we can
cope with.
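
To make the trade-off concrete, here is a toy, single-threaded model of that race. It is only a sketch under stated assumptions, not the patch: ProcQueue, broadcast_step and simulate_race are made-up names, the wait list is a plain array rather than the real proclist, and "waking" a process just records its id. The interleaving is scripted by hand. The broadcaster (id 99) appends itself as the sentinel, a late waiter (id 4) joins behind it, and another backend's signal then pops the sentinel instead of a real waiter.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 16
#define NO_PROC   (-1)

/* Toy stand-in for a CV wait list: a FIFO of process ids (hypothetical). */
typedef struct
{
    int ids[MAX_PROCS];
    int len;
} ProcQueue;

static void
push_tail(ProcQueue *q, int id)
{
    assert(q->len < MAX_PROCS);
    q->ids[q->len++] = id;
}

static int
pop_head(ProcQueue *q)
{
    int id;

    if (q->len == 0)
        return NO_PROC;
    id = q->ids[0];
    for (int i = 1; i < q->len; i++)
        q->ids[i - 1] = q->ids[i];
    q->len--;
    return id;
}

static bool
contains(const ProcQueue *q, int id)
{
    for (int i = 0; i < q->len; i++)
        if (q->ids[i] == id)
            return true;
    return false;
}

/*
 * One iteration of the broadcaster's loop: pop the head entry, then check
 * whether our sentinel is still queued.  The popped entry is woken unless
 * it is the sentinel itself, so if someone else already removed the
 * sentinel, exactly one additional waiter gets woken before we stop.
 * Returns true while the sentinel remains in the list.
 */
static bool
broadcast_step(ProcQueue *waiters, int self, ProcQueue *woken)
{
    int  id = pop_head(waiters);
    bool have_sentinel = contains(waiters, self);

    if (id != NO_PROC && id != self)
        push_tail(woken, id);       /* "wake" = record the id */
    return have_sentinel;
}

/* Scripted interleaving of the race; fills *woken with ids in wake order. */
void
simulate_race(ProcQueue *woken)
{
    ProcQueue waiters = {{0}, 0};
    const int self = 99;            /* broadcaster's id, used as sentinel */
    bool      more;

    woken->len = 0;
    push_tail(&waiters, 1);
    push_tail(&waiters, 2);
    push_tail(&waiters, 3);
    push_tail(&waiters, self);      /* broadcaster adds its sentinel */

    more = broadcast_step(&waiters, self, woken);   /* wakes 1 */
    push_tail(&waiters, 4);         /* late waiter queues behind sentinel */
    more = broadcast_step(&waiters, self, woken);   /* wakes 2 */
    more = broadcast_step(&waiters, self, woken);   /* wakes 3 */
    assert(more);                   /* sentinel still queued: [99, 4] */

    (void) pop_head(&waiters);      /* other backend's signal eats sentinel */

    more = broadcast_step(&waiters, self, woken);   /* wakes 4, the extra one */
    assert(!more);                  /* sentinel gone, broadcaster stops */
}
```

In this model, waiter 4 is the one the other backend's signal was presumably meant for. If broadcast_step instead did nothing once it noticed the sentinel was missing, 4 would stay asleep. The cost of the rejiggered logic is that the broadcaster may sometimes wake a process nobody intended to wake, which a caller that rechecks its condition in a loop can tolerate.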

regards, tom lane
