Re: [INTERFACES] Coping with backend crash in libpq

From: Karl Denninger <karl(at)mcs(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, pgsql-interfaces(at)postgreSQL(dot)org
Subject: Re: [INTERFACES] Coping with backend crash in libpq
Date: 1998-07-28 17:44:59
Message-ID: 19980728124459.39631@mcs.net
Lists: pgsql-hackers pgsql-interfaces

On Tue, Jul 28, 1998 at 01:23:35PM -0400, Tom Lane wrote:
> I've just noticed that libpq doesn't cope very gracefully if the backend
> exits when not in the middle of a query (ie, because the postmaster told
> it to quit after some other BE crashed). The behavior in psql, for
> example, is that the next time you issue a query, psql just exits
> without printing anything at all. This is Not Friendly, especially
> considering that the BE sent a nice little notice message before it quit.
>
> The main problem is that if the next thing you do is to send a new query,
> send() sees that the connection has been closed and generates a SIGPIPE
> signal. By default that terminates the frontend process.
>
> We could cure this by having libpq disable SIGPIPE, but we would have
> to disable it before each send() and re-enable afterwards to avoid
> affecting the behavior of the rest of the frontend application.
> Two additional kernel calls per query sounds like a lot of overhead.
> (We do actually do this when trying to close the connection, but not
> during normal queries.)
>
> Perhaps a better answer is to have PQsendQuery check for fresh input
> from the backend before trying to send the query. This would have two
> side effects:
> 1. If a NOTICE message has arrived, we could print it.
> 2. If EOF is detected, we will reset the connection state to
> CONNECTION_BAD, which PQsendQuery can use to avoid trying to send.
>
> The minimum cost to do this is one kernel call (a select(), which
> unfortunately is probably a fairly expensive call) in the normal
> case where no new input has arrived. Another objection is that it's
> not 100% bulletproof --- if the backend closes the connection in the
> window between select() and send() then you can still get SIGPIPE'd.
> The odds of this seem pretty small however.
>
> I'm inclined to go with answer #2, because it seems to have less
> of a performance impact, and it will ensure that the backend's polite
> "The Postmaster has informed me that some other backend died abnormally
> and possibly corrupted shared memory." message gets displayed. With
> approach #1 we'd still have to go through some pushups to get the
> notice to come out.
>
> Does anyone have an objection, or a better idea?
>
> regards, tom lane
>
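For reference, approach #1 from the quoted message (turning SIGPIPE off around each send()) might look roughly like the sketch below. This is only an illustration under assumptions: the helper name send_ignoring_sigpipe is made up and is not a libpq function, and it uses plain POSIX sigaction() on a bare socket descriptor rather than anything from libpq's internals. The two sigaction() calls are the "two additional kernel calls per query" mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libpq): send() with SIGPIPE ignored for
 * the duration of the call, then restore whatever disposition the
 * application had installed.
 */
ssize_t send_ignoring_sigpipe(int sock, const void *buf, size_t len)
{
    struct sigaction ignore;
    struct sigaction saved;
    ssize_t n;
    int save_errno;

    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;

    /* Extra kernel call 1: ignore SIGPIPE, remembering the old setting. */
    if (sigaction(SIGPIPE, &ignore, &saved) < 0)
        return -1;

    /* If the peer has closed, this now fails with EPIPE instead of
     * killing the process. */
    n = send(sock, buf, len, 0);
    save_errno = errno;

    /* Extra kernel call 2: put the application's disposition back. */
    sigaction(SIGPIPE, &saved, NULL);
    errno = save_errno;
    return n;
}
```

A caller would treat a return of -1 with errno set to EPIPE as "the backend is gone" and mark the connection bad. As the quoted message points out, this alone does nothing to display a notice the backend sent before it exited.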

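Approach #2 from the quoted message (look for fresh input before sending) could be sketched as follows. Again this is an assumption-laden illustration, not the libpq code: check_peer_before_send and the peer_state enum are hypothetical names, and the sketch works on a raw socket with a zero-timeout select() followed by a one-byte recv() using MSG_PEEK to tell pending data apart from EOF.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for the pre-send check. */
enum peer_state {
    PEER_ERROR = -1,    /* select() or recv() failed */
    PEER_QUIET = 0,     /* nothing pending: the normal case */
    PEER_HAS_DATA = 1,  /* input waiting, e.g. a NOTICE to print */
    PEER_CLOSED = 2     /* EOF: the other end closed the connection */
};

/*
 * Hypothetical helper (not part of libpq): poll the socket without
 * blocking before attempting to send a query on it.
 */
enum peer_state check_peer_before_send(int sock)
{
    fd_set readable;
    struct timeval no_wait = {0, 0};  /* return immediately */
    char probe;
    ssize_t n;
    int ready;

    FD_ZERO(&readable);
    FD_SET(sock, &readable);

    /* The one kernel call paid in the common case. */
    ready = select(sock + 1, &readable, NULL, NULL, &no_wait);
    if (ready < 0)
        return PEER_ERROR;
    if (ready == 0)
        return PEER_QUIET;

    /* Readable: peek one byte to distinguish data from EOF
     * without consuming anything. */
    n = recv(sock, &probe, 1, MSG_PEEK);
    if (n > 0)
        return PEER_HAS_DATA;
    if (n == 0)
        return PEER_CLOSED;
    return PEER_ERROR;
}
```

On PEER_HAS_DATA the caller would read and process the input (printing any notice) and then check again, since EOF may follow the notice; on PEER_CLOSED it would mark the connection bad and skip the send(). The race noted in the quoted message remains: the peer can still close between this check and the send().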
Not really.

I've noticed this kind of problem where the backend will fault in some way,
and after it does so, the library gets "confused".

We have a couple of processes here that are NEVER supposed to exit. They
open a connection for each transaction, and close it at the end. If
something happens to the backend where it dies abnormally, these processes
will sometimes get into an odd state in the libpq library where all new
connection attempts fail immediately.

I've yet to find a foolproof way to code around this particular problem.

--
Karl Denninger (karl(at)MCS(dot)Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/ | T1's from $600 monthly / All Lines K56Flex/DOV
| NEW! Corporate ISDN Prices dropped by up to 50%!
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax: [+1 312 803-4929] | *SPAMBLOCK* Technology now included at no cost
