Re: Escaping from blocked send() reprised.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Escaping from blocked send() reprised.
Date: 2014-06-30 15:27:47
Message-ID: CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 30, 2014 at 4:13 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, I have received inquiries related to blocked communication
> several times in recent weeks, with different symptoms. Then I
> found this message in the archive,
>
> http://postgresql.1045698.n5.nabble.com/Escaping-a-blocked-sendto-syscall-without-causing-a-restart-td5740855.html
>
>> Subject: Escaping a blocked sendto() syscall without causing a restart
>
> Mr. Tom Lane commented in reply,
>
>> Offhand it looks to me like most signals would kick the backend off the
>> send() call ... but it would loop right back and try again. See
>> internal_flush() in pqcomm.c. (If you're using SSL, this diagnosis
>> may or may not apply.)
>>
>> We can't do anything except repeat the send attempt if the client
>> connection is to be kept in a sane state.
> (snipped)
>> And I'm not at all sure if we could get it to work in SSL mode...
>
> That's true for timeouts that should continue the connection,
> say, statement_timeout, but focusing on intentional backend
> termination, I think it does no harm to break it up abruptly,
> even if it was on SSL. On the other hand it seems still
> preferable to keep a connection when not blocked. The following
> expression would detect such a blocking state just before the
> next send(2), after the previous attempt was interrupted by a signal.
>
> (ProcDiePending && select(1, NULL, fd, NULL, '1 sec') == 0)
>
> Finally, pg_terminate_backend() works even when send is blocked
> for both SSL and non-SSL connections after 1 second delay with
> this patch (break_socket_blocking_on_termination_v1.patch).
>
> Nevertheless, of course statement_timeout cannot become effective
> by this method since it breaks the consistency of the client
> protocol. That would need a change in the client protocol to add an
> "out of band" mechanism or something similar, maybe.
>
> Any suggestions?

You should probably add your patch here, so it doesn't get forgotten about:

https://commitfest.postgresql.org/action/commitfest_view/open

We're focused on reviewing patches for the current CommitFest, so your
patch might not get attention right away. A couple of general
thoughts on this topic:

1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR, and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case. I'm not sure what, if anything, we can do about that.

2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time. But I'm unsure what "a
reasonable period of time" means. This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.
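
As a rough illustration of the check discussed above (this is not the
code from break_socket_blocking_on_termination_v1.patch, which is not
reproduced here), the quoted expression could be sketched in plain C
along these lines. The function names, the die_pending flag standing in
for ProcDiePending, and the timeout_ms parameter are all assumptions
made for this example. It covers only a plain socket, not SSL_write(),
and it relies on send() actually returning EINTR, which per point 1 is
not guaranteed on every platform.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if fd does not become writable within
 * timeout_ms, 0 if it is writable, and -1 if select() fails.
 */
int
send_is_blocked(int fd, int timeout_ms)
{
    fd_set          wfds;
    struct timeval  tv;
    int             rc;

    /* FD_SET is undefined for descriptors outside [0, FD_SETSIZE). */
    assert(fd >= 0 && fd < FD_SETSIZE);
    assert(timeout_ms >= 0);

    do
    {
        /* Rebuild the arguments each time; select() may modify them. */
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        /* Only the write set is watched; nfds is the highest fd plus one. */
        rc = select(fd + 1, NULL, &wfds, NULL, &tv);
    } while (rc < 0 && errno == EINTR);     /* restarts the full timeout */

    if (rc < 0)
        return -1;
    return rc == 0;             /* 0 ready descriptors means still blocked */
}

/* Give up only when termination was requested AND the socket is stuck. */
int
should_give_up(int die_pending, int fd, int timeout_ms)
{
    return die_pending && send_is_blocked(fd, timeout_ms) == 1;
}

/*
 * Send loop that retries after a signal, but abandons the connection
 * when should_give_up() says so. Returns bytes sent, or -1 on error
 * or when giving up. die_pending is an assumed stand-in for a flag
 * set by a signal handler.
 */
ssize_t
send_or_give_up(int fd, const char *buf, size_t len,
                const volatile int *die_pending, int timeout_ms)
{
    size_t      sent = 0;

    while (sent < len)
    {
        ssize_t     n = send(fd, buf + sent, len - sent, 0);

        if (n >= 0)
        {
            sent += (size_t) n;
            continue;
        }
        if (errno != EINTR)
            return -1;          /* hard error */
        if (should_give_up(*die_pending, fd, timeout_ms))
            return -1;          /* terminate without notifying the client */
        /* otherwise loop back and retry the send */
    }
    return (ssize_t) sent;
}
```

Passing the timeout as a parameter leaves open the question in point 2
of what "a reasonable period of time" should be; 1000 would correspond
to the one-second wait in the quoted expression.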

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
