Re: Escaping from blocked send() reprised.

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Escaping from blocked send() reprised.
Date: 2014-10-02 09:02:50
Message-ID: 20141002090250.GF7158@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-10-02 17:47:39 +0900, Kyotaro HORIGUCHI wrote:
> Hello,
>
> > > I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
> > > weaker form of ImmediateInterruptOK that only allows handling a pending
> > > die-signal in the signal handler.
> > >
> > > Robert, others, do you see a problem with this?
> >
> > Per se I don't have a problem with it. There does exist the problem that
> > the user doesn't get a error message in more cases though. On the other
> > hand it's bad if any user can prevent the database from restarting.
> >
> > > Over IM, Robert pointed out that it's not safe to jump out of a signal
> > > handler with siglongjmp, when we're inside library calls, like in a callback
> > > called by OpenSSL. But even with current master branch, that's exactly what
> > > we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
> > > that any incoming signal will be handled directly in the signal handler,
> > > which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
> > > if control never returns to the SSL_read() or SSL_write() function that
> > > called secure_raw_read().
> >
> > But this is imo prohibitive. Yes, we're doing it for a long while. But
> > no, that's not ok. It actually prompoted me into prototyping the latch
> > thing (in some other thread). I don't think existing practice justifies
> > expanding it further.
>
> I see, in that case, this approach seems basically
> applicable. But if I understand correctly, this patch seems not
> to return out of the openssl code even when latch was found to be
> set in secure_raw_write/read.

Correct. That's why I think it's the way forward. There's several
problems now where the inability to do real things while reading/writing
is a problem.

> I tried setting errno = ECONNRESET
> and it went well but seems a bad deed.

Where and why did you do that?

> secure_raw_write(Port *port, const void *ptr, size_t len)
> {
> n = send(port->sock, ptr, len, 0);
>
> if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
> {
> w = WaitLatchOrSocket(&MyProc->procLatch, ...
>
> if (w & WL_LATCH_SET)
> {
> ResetLatch(&MyProc->procLatch);
> /*
> * Force a return, so interrupts can be processed when not
> * (possibly) underneath a ssl library.
> */
> errno = EINTR;
> (return n; // n is negative)
>
>
> my_sock_write(BIO *h, const char *buf, int size)
> {
> res = secure_raw_write(((Port *) h->ptr), buf, size);
> BIO_clear_retry_flags(h);
> if (res <= 0)
> {
> if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
> {
> BIO_set_retry_write(h);

Hm, this seems, besides one comment, the code from the last patch in my
series. Do you have a particular question about it?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2014-10-02 09:30:02 Re: Yet another abort-early plan disaster on 9.3
Previous Message Heikki Linnakangas 2014-10-02 08:49:31 Re: Replication identifiers, take 3