Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)
Date: 2011-12-03 16:37:14
Message-ID: 4EDA503A.9010209@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 19.10.2011 19:41, Greg Jaskiewicz wrote:
> On 19 Oct 2011, at 18:28, Florian Pflug wrote:
>> All the other flags which indicate cancellation reasons are set from signal handers, I believe. We could of course mark as ClientConnectionLostPending as volatile just to be consistent. Not sure whether that's a good idea, or not. It might prevent a mistake should we ever add code to detect lost connections asynchronously (i.e., from somewhere else than pq_flush). And the cost is probably negligible, because CHECK_FOR_INTERRUPTS tests for InterruptPending before calling ProcessInterrupts(), so we only pay the cost of volatile if there's actually an interrupt pending. But I still think it's better to add qualifies such a volatile only when really necessary. A comment about why it *isn't* volatile is probably in order, though, so I'll add that in the next version of the patch.
>>
> Makes sense.
>
> I had to ask, because it sticks out. And indeed there is a possibility that someone will one day will try to use from signal handler, etc.

Let's just mark it as volatile. It's not clear to me that we'll never
set or read the flag while in a signal handler. We probably don't, but
what if ImmediateInterruptOK is 'true', for example, just when we fail
to send a response and try to set the flags. In that case, we have to be
careful that ClientConnectionLost is set before InterruptPending (which
we can only guarantee if both are volatile, I believe), otherwise a
signal might arrive when we've just set InterruptPending, but not yet
ClientConnectionLost. ProcessInterrupts() would clear InterruptPending,
but not see ClientConnectionLost, so we would lose the interrupt.

That's not serious, and the window for that happening would be extremely
small, and I don't think it can actually happen as the code stands,
because we never try to send anything while ImmediateInterruptOK ==
true. But better safe than sorry.

One small change I'd like to make is to treat the loss of connection
more as a new "top-level" event, rather than as a new reason for query
cancellation. A lost connection is more like receiving SIGTERM, really.
Please take a look at the attached patch. I've been testing this by
doing "SELECT pg_sleep(1), a from generate_series(1,1000) a;", and
killing the connection with "killall -9 psql".

One remaining issue I have with this is the wording of the error
message. The closest existing message we have is in old-mode COPY:

ereport(FATAL,
(errcode(ERRCODE_CONNECTION_FAILURE),
errmsg("connection lost during COPY to stdout")));

In the patch, I made the new message just "connection lost", but does
anyone else have a better opinion on that? Perhaps it should be
"connection lost while sending response to client". Or "connection lost
during execution of statement", but that might not be accurate if we're
not executing a statement at the moment, but just sending a "sync"
message or similar.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
pg_discon_cancel.v1-heikki.patch text/x-diff 3.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2011-12-03 17:27:32 Re: psql line number reporting from stdin
Previous Message Heikki Linnakangas 2011-12-03 16:04:02 Re: backup_label during crash recovery: do we know how to solve it?