Re: Hot Standby conflict resolution handling

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby conflict resolution handling
Date: 2013-01-17 16:03:19
Message-ID: 20130117160319.GA22844@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-17 10:19:23 -0500, Tom Lane wrote:
> Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
> >> cannot safely attempt to send an error message to the client either;
> >> but ereport(FATAL) will try exactly that.
>
> > I thought since FATAL will force the backend to exit, we don't care much
> > about corrupted OpenSSL state. I even thought that's why we raise ERROR to
> > FATAL so that the backend can start in a clean state. But clearly I'm
> > missing a point here because you don't think that way.
>
> If we were to simply exit(1), leaving the kernel to close the client
> socket, it'd be safe enough because control would never have returned to
> OpenSSL. But this code doesn't do that. What we're looking at is that
> we've interrupted OpenSSL at some arbitrary point, and now we're going
> to make fresh calls to it to try to pump the FATAL error message out to
> the client. It seems fairly unlikely that that's safe. I'm not sure
> I credit Andres' worry of arbitrary code execution, but I do fear that
> OpenSSL could get confused to the point of freezing up, or even more
> likely that it would transmit garbage to the client, which rather
> defeats the purpose.

I don't think it's likely either; I seem to remember it copying around
function pointers though, so it seems possible with some bad luck.

> Don't see a nice fix. The COMMERROR approach (ie, don't try to send
> anything to the client, only the log) is not nice at all since the
> client would get the impression that the server crashed. On the other
> hand, anything else requires waiting till we get control back from
> OpenSSL, which might be a long time, and meanwhile we're still holding
> locks that prevent WAL recovery from proceeding.

I think we can make OpenSSL return pretty much immediately if we assume
recv() can reliably be interrupted by signals, possibly by setting the
socket to nonblocking in the signal handler.
We just need to tell OpenSSL not to retry immediately and we should be
fine. Given that quite a few people use OpenSSL with nonblocking sockets,
that code path should be reasonably safe.

That still requires ugliness around saving the error and re-raising it
after returning from OpenSSL though...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
