From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Escaping from blocked send() reprised. |
Date: | 2014-10-02 09:02:50 |
Message-ID: | 20141002090250.GF7158@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2014-10-02 17:47:39 +0900, Kyotaro HORIGUCHI wrote:
> Hello,
>
> > > I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
> > > weaker form of ImmediateInterruptOK that only allows handling a pending
> > > die-signal in the signal handler.
> > >
> > > Robert, others, do you see a problem with this?
> >
> > Per se I don't have a problem with it. There does exist the problem that
> > the user doesn't get a error message in more cases though. On the other
> > hand it's bad if any user can prevent the database from restarting.
> >
> > > Over IM, Robert pointed out that it's not safe to jump out of a signal
> > > handler with siglongjmp, when we're inside library calls, like in a callback
> > > called by OpenSSL. But even with current master branch, that's exactly what
> > > we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
> > > that any incoming signal will be handled directly in the signal handler,
> > > which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
> > > if control never returns to the SSL_read() or SSL_write() function that
> > > called secure_raw_read().
> >
> > But this is imo prohibitive. Yes, we're doing it for a long while. But
> > no, that's not ok. It actually prompoted me into prototyping the latch
> > thing (in some other thread). I don't think existing practice justifies
> > expanding it further.
>
> I see, in that case, this approach seems basically
> applicable. But if I understand correctly, this patch seems not
> to return out of the openssl code even when latch was found to be
> set in secure_raw_write/read.
Correct. That's why I think it's the way forward. There's several
problems now where the inability to do real things while reading/writing
is a problem.
> I tried setting errno = ECONNRESET
> and it went well but seems a bad deed.
Where and why did you do that?
> secure_raw_write(Port *port, const void *ptr, size_t len)
> {
> n = send(port->sock, ptr, len, 0);
>
> if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
> {
> w = WaitLatchOrSocket(&MyProc->procLatch, ...
>
> if (w & WL_LATCH_SET)
> {
> ResetLatch(&MyProc->procLatch);
> /*
> * Force a return, so interrupts can be processed when not
> * (possibly) underneath a ssl library.
> */
> errno = EINTR;
> (return n; // n is negative)
>
>
> my_sock_write(BIO *h, const char *buf, int size)
> {
> res = secure_raw_write(((Port *) h->ptr), buf, size);
> BIO_clear_retry_flags(h);
> if (res <= 0)
> {
> if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
> {
> BIO_set_retry_write(h);
Hm, this seems, besides one comment, the code from the last patch in my
series. Do you have a particular question about it?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2014-10-02 09:30:02 | Re: Yet another abort-early plan disaster on 9.3 |
Previous Message | Heikki Linnakangas | 2014-10-02 08:49:31 | Re: Replication identifiers, take 3 |