Re: Hot Standby conflict resolution handling

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby conflict resolution handling
Date: 2013-01-17 15:19:23
Message-ID: 8463.1358435963@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
>> cannot safely attempt to send an error message to the client either;
>> but ereport(FATAL) will try exactly that.

> I thought since FATAL will force the backend to exit, we don't care much
> about corrupted OpenSSL state. I even thought that's why we raise ERROR to
> FATAL so that the backend can start in a clean state. But clearly I'm
> missing a point here because you don't think that way.

If we were to simply exit(1), leaving the kernel to close the client
socket, it'd be safe enough because control would never have returned to
OpenSSL. But this code doesn't do that. What we're looking at is that
we've interrupted OpenSSL at some arbitrary point, and now we're going
to make fresh calls to it to try to pump the FATAL error message out to
the client. It seems fairly unlikely that that's safe. I'm not sure
I credit Andres' worry of arbitrary code execution, but I do fear that
OpenSSL could get confused to the point of freezing up, or even more
likely that it would transmit garbage to the client, which rather
defeats the purpose.

Don't see a nice fix. The COMMERROR approach (ie, don't try to send
anything to the client, only the log) is not nice at all since the
client would get the impression that the server crashed. On the other
hand, anything else requires waiting till we get control back from
OpenSSL, which might be a long time, and meanwhile we're still holding
locks that prevent WAL recovery from proceeding.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-01-17 15:23:44 Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave
Previous Message Heikki Linnakangas 2013-01-17 15:18:14 Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave