Skip site navigation (1) Skip section navigation (2)

Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)
Date: 2011-12-03 16:37:14
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
On 19.10.2011 19:41, Greg Jaskiewicz wrote:
> On 19 Oct 2011, at 18:28, Florian Pflug wrote:
>> All the other flags which indicate cancellation reasons are set from signal handers, I believe. We could of course mark as ClientConnectionLostPending as volatile just to be consistent. Not sure whether that's a good idea, or not. It might prevent a mistake should we ever add code to detect lost connections asynchronously (i.e., from somewhere else than pq_flush). And the cost is probably negligible, because CHECK_FOR_INTERRUPTS tests for InterruptPending before calling ProcessInterrupts(), so we only pay the cost of volatile if there's actually an interrupt pending. But I still think it's better to add qualifies such a volatile only when really necessary. A comment about why it *isn't* volatile is probably in order, though, so I'll add that in the next version of the patch.
> Makes sense.
> I had to ask, because it sticks out. And indeed there is a possibility that someone will one day will try to use from signal handler, etc.

Let's just mark it as volatile. It's not clear to me that we'll never 
set or read the flag while in a signal handler. We probably don't, but 
what if ImmediateInterruptOK is 'true', for example, just when we fail 
to send a response and try to set the flags. In that case, we have to be 
careful that ClientConnectionLost is set before InterruptPending (which 
we can only guarantee if both are volatile, I believe), otherwise a 
signal might arrive when we've just set InterruptPending, but not yet 
ClientConnectionLost. ProcessInterrupts() would clear InterruptPending, 
but not see ClientConnectionLost, so we would lose the interrupt.

That's not serious, and the window for that happening would be extremely 
small, and I don't think it can actually happen as the code stands, 
because we never try to send anything while ImmediateInterruptOK == 
true. But better safe than sorry.

One small change I'd like to make is to treat the loss of connection 
more as a new "top-level" event, rather than as a new reason for query 
cancellation. A lost connection is more like receiving SIGTERM, really. 
Please take a look at the attached patch. I've been testing this by 
doing "SELECT pg_sleep(1), a from generate_series(1,1000) a;", and 
killing the connection with "killall -9 psql".

One remaining issue I have with this is the wording of the error 
message. The closest existing message we have is in old-mode COPY:

	 errmsg("connection lost during COPY to stdout")));

In the patch, I made the new message just "connection lost", but does 
anyone else have a better opinion on that? Perhaps it should be 
"connection lost while sending response to client". Or "connection lost 
during execution of statement", but that might not be accurate if we're 
not executing a statement at the moment, but just sending a "sync" 
message or similar.

   Heikki Linnakangas

Attachment: pg_discon_cancel.v1-heikki.patch
Description: text/x-diff (3.2 KB)

In response to


pgsql-hackers by date

Next:From: Peter EisentrautDate: 2011-12-03 17:27:32
Subject: Re: psql line number reporting from stdin
Previous:From: Heikki LinnakangasDate: 2011-12-03 16:04:02
Subject: Re: backup_label during crash recovery: do we know how to solve it?

Privacy Policy | About PostgreSQL
Copyright © 1996-2018 The PostgreSQL Global Development Group