Re: Escaping from blocked send() reprised.

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Escaping from blocked send() reprised.
Date: 2014-10-30 13:27:13
Message-ID: 54523CB1.7030808@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/03/2014 06:29 PM, Heikki Linnakangas wrote:
> On 10/03/2014 05:26 PM, Andres Freund wrote:
>> On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
>>> On 09/28/2014 01:54 AM, Andres Freund wrote:
>>>> 0003 Sinval/notify processing got simplified further. There really isn't
>>>> any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>>>> anymore. Also begin_client_read/client_read_ended don't make much
>>>> sense anymore. Instead introduce ProcessClientReadInterrupt (which
>>>> wants a better name).
>>>> There's also a very WIP
>>>> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>>>> set. All of that happens via the latch mechanism, nothing happens
>>>> inside signal handlers. So I do think it's quite an improvement
>>>> over what's been discussed in this thread.
>>>> But it (and the other approaches) do noticeably increase the
>>>> likelihood of clients not getting the error message if the client
>>>> isn't actually dead. The likelihood of write() being blocked
>>>> *temporarily* due to normal bandwidth constraints is quite high
>>>> when you consider COPY FROM and similar. Right now we'll wait till
>>>> we can put the error message into the socket afaics.
>>>>
>>>> 1-3 need some serious comment work, but I think the approach is
>>>> basically sound. I'm much, much less sure about allowing send() to be
>>>> interrupted.
>>>
>>> Yeah, 1-3 seem sane.
>>
>> I think 3 also needs a careful look. Have you looked through it? While
>> imo much less complex than before, there's some complex interactions in
>> the touched code. And we have terrible coverage of both catchup
>> interrupts and notify stuff...
>
> I only looked at the .patch, I didn't apply it, so I didn't look at the
> context much. But I don't see any fundamental problem with it. I would
> like to have a closer look before it's committed, though.

About patch 3:

Looking closer, this design still looks OK to me. As you said yourself,
the comments need some work (e.g. the step 5. in the top comment in
async.c needs updating). And then there are a couple of FIXME and XXX
comments that need to be addressed.

The comment on PGPROC.procLatch in storage/proc.h says just this:

> Latch procLatch; /* generic latch for process */

This needs a lot more explaining. It's now used by signal handlers to
interrupt a read or write to the socket; that should be mentioned. What
else is it used for? (for waking up a backend in synchronous
replication, at least) What are the rules on when to arm it and when to
reset it?

Would it be more clear to use a separate, backend-private, latch, for
the signals? I guess that won't work, though, because sometimes we need
need to wait for a wakeup from a different process or from a signal at
the same time (SyncRepWaitForLSN() in particular). Not without adding a
variant of WaitLatch that can wait on two latches simultaneously, anyway.

The assumption in secure_raw_read that MyProc exists is pretty
surprising. I understand why it's that way, and there's a comment in
PostgresMain explaining why the socket cannot be put into non-blocking
mode earlier, but it's still a bit whacky. Not sure what to do about that.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-10-30 13:28:26 Re: Wait free LW_SHARED acquisition - v0.9
Previous Message Amit Kapila 2014-10-30 13:24:57 Re: Wait free LW_SHARED acquisition - v0.9