Re: Introduce timeout capability for ConditionVariableSleep

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Shawn Debnath <sdn(at)amazon(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Introduce timeout capability for ConditionVariableSleep
Date: 2019-07-13 03:02:25
Message-ID: CA+hUKGJxFAEg5A0B55JXOdRetYbP0Ng3vFL2An6mL0i0yAYJKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 12, 2019 at 6:08 PM Shawn Debnath <sdn(at)amazon(dot)com> wrote:
> On Tue, Jul 09, 2019 at 11:03:18PM +1200, Thomas Munro wrote:
> > On Sun, Jul 7, 2019 at 3:09 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > > + /* Timed out */
> > > + if (rc == 0)
> > > + return true;
> >
> > Here's a version without that bit, because I don't think we need it.
>
> This works. Agree that letting it fall through covers the first gap.

Pushed, like that (with the now unused rc variable also removed).
Thanks for the patch!

> > > That still leaves the danger that the CV can be signalled some time
> > > after ConditionVariableTimedSleep() returns. So now I'm wondering if
> > > ConditionVariableCancelSleep() should signal the CV if it discovers
> > > that this process is not in the proclist, on the basis that that must
> > > indicate that we've been signalled even though we're not interested in
> > > the message anymore, and yet some other process else might be
> > > interested, and that might have been the only signal that is ever
> > > going to be delivered by ConditionVariableSignal().
> >
> > And a separate patch to do that. Thoughts?
>
> I like it. This covers the gap all the way till cancel is invoked and it
> manipulates the list to remove itself or realizes that it needs to
> forward the signal to some other process.

I pushed this too. It's a separate commit, because I think there is
at least a theoretical argument that it should be back-patched. I'm
not going to do that today though, because I doubt anyone is relying
on ConditionVariableSignal() working that reliably yet, and it's
really with timeouts that it becomes a likely problem.

I thought about this edge case because I have long wanted to propose a
pair of functions that provide a simplified payloadless blocking
alternative to NOTIFY, that would allow for just the right number of
waiting sessions to wake up to handle SKIP LOCKED-style job queues.
Otherwise you sometimes get thundering herds of wakeups fighting over
crumbs. That made me think about the case where a worker session
decides to time out and shut down due to being idle for too long, but
eats a wakeup on its way out. Another question that comes up in that
use case is whether CV wakeup queues should be LIFO or FIFO. I think
the answer is LIFO, to support class worker pool designs that
stabilise at the right size using a simple idle timeout rule. They're
currently FIFO (proclist_pop_head_node() to wake up, but
proclist_push_tail() to sleep). I understand why Robert didn't care
about that last time I mentioned it: all our uses of CVs today are
"broadcast" wakeups. But a productised version of the "poke" hack I
showed earlier that supports poking just one waiter would care about
the thing this patch fixed, and also the wakeup queue order.

--
Thomas Munro
https://enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-07-13 04:54:06 Re: Tab completion for CREATE TYPE
Previous Message Tomas Vondra 2019-07-13 01:58:05 Re: Check-out mutable functions in check constraints