Re: Escaping from blocked send() reprised.

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Escaping from blocked send() reprised.
Date: 2014-10-03 15:29:23
Message-ID: 542EC0D3.6030209@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/03/2014 05:26 PM, Andres Freund wrote:
> On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
>> On 09/28/2014 01:54 AM, Andres Freund wrote:
>>> 0003 Sinval/notify processing got simplified further. There really isn't
>>> any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>>> anymore. Also begin_client_read/client_read_ended don't make much
>>> sense anymore. Instead introduce ProcessClientReadInterrupt (which
>>> wants a better name).
>>> There's also a very WIP
>>> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>>> set. All of that happens via the latch mechanism, nothing happens
>>> inside signal handlers. So I do think it's quite an improvement
>>> over what's been discussed in this thread.
>>> But it (and the other approaches) do noticeably increase the
>>> likelihood of clients not getting the error message if the client
>>> isn't actually dead. The likelihood of write() being blocked
>>> *temporarily* due to normal bandwidth constraints is quite high
>>> when you consider COPY FROM and similar. Right now we'll wait till
>>> we can put the error message into the socket afaics.
>>>
>>> 1-3 need some serious comment work, but I think the approach is
>>> basically sound. I'm much, much less sure about allowing send() to be
>>> interrupted.
>>
>> Yeah, 1-3 seem sane.
>
> I think 3 also needs a careful look. Have you looked through it? While
> imo much less complex than before, there's some complex interactions in
> the touched code. And we have terrible coverage of both catchup
> interrupts and notify stuff...

I only looked at the .patch, I didn't apply it, so I didn't look at the
context much. But I don't see any fundamental problem with it. I would
like to have a closer look before it's committed, though.

> There's also the concern that using a latch for client communication
> increases the number of syscalls for the same work. We should at least
> try to quantify that...

I'm not too concerned about that, since we only do extra syscalls when
the socket isn't immediately available for reading/writing, i.e. when we
have to sleep anyway.

>> 4 also looks OK to me at a quick glance. It basically
>> enables handling the "die" interrupt immediately, if we're blocked in a read
>> or write. It won't be handled in the signal handler, but within the
>> secure_read/write call anyway.
>
> What are you thinking about the concern that it'll reduce the likelihood
> of transferring the error message to the client? I tried to reduce that
> by only allowing errors when write() blocks, but that's not an
> infrequent event.

I'm not too concerned about that either. I mean, it's probably true that
it reduces the likelihood, but I don't particularly care myself. But if
we care, we could use a timeout there, so that if we receive a SIGTERM
while blocked on a send(), we wait for a few seconds to see if we can
send whatever we were sending, before terminating the backend.

What should we do with this patch in the commitfest? Are you planning to
clean up and commit these patches?

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marco Nenciarini 2014-10-03 15:31:45 [RFC] Incremental backup v2: add backup profile to base backup
Previous Message Alvaro Herrera 2014-10-03 15:07:23 Re: Log notice that checkpoint is to be written on shutdown