On 8/9/07, Peter Koczan <pjkoczan(at)gmail(dot)com> wrote:
> On 8/6/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > "Peter Koczan" <pjkoczan(at)gmail(dot)com> writes:
> > > Here's my theory (and feel free to tell me that I'm full of it)...somehow, a
> > > lot of notifies happened at once, or in a very short period of time, to the
> > > point where the app was still processing notifies when the timer clicked off
> > > another second. The connection (or app, or perl module) never marked those
> > > notifies as being processed, or never updated its timestamp of when it
> > > finished, so when the next notify came around, it tried to reprocess the old
> > > data (or data since the last time it finished), and yet again couldn't
> > > finish. Lather, rinse, repeat. In sum, it might be that trying to call
> > > pg_notifies while processing notifies tickles a race condition and tricks
> > > the connection into thinking it's in a bad state.
> > Hmm. Is the app trying to do this processing inside an interrupt
> > service routine (a/k/a signal handler)? If so, and if the ISR can
> > interrupt itself, then you've got a problem because you'll be doing
> > reentrant calls of libpq, which it doesn't support. You can only make
> > that work if the handler blocks further occurrences of its signal until
> > it finishes.
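Tom's point about blocking further occurrences of the signal can be illustrated outside libpq entirely. This is a hypothetical Python sketch (the actual app is Perl, and a C handler would get the same effect through sigaction's sa_mask); it shows that a blocked signal is held pending rather than reentering the handler:

```python
import os
import signal

fired = []
# The handler just records the delivery; in the real app this is where
# the notify processing would run.
signal.signal(signal.SIGALRM, lambda signum, frame: fired.append(signum))

# Block SIGALRM for the duration of the "critical section".
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGALRM})
os.kill(os.getpid(), signal.SIGALRM)  # signal arrives but is held pending
assert fired == []                    # handler has NOT run yet

# Unblocking delivers the pending signal exactly once.
signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGALRM})
assert fired == [signal.SIGALRM]
```

As long as the signal is blocked, no reentrant call into the handler (and thus into libpq) can happen.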
> I'm not entirely sure if this answers your question, but here's what I
> found out from the primary maintainer of the app. Note that
> update_reqs is the function calling pg_notifies. If there's more
> information I can provide or another test we can run, please let me know.
> ------- BEGIN MESSAGE -------
> I just checked and the timer won't interrupt update_reqs, so we'll
> have to look for another solution. Anyway, update_reqs doesn't do
> anything with the database except for checking for a notify, so I
> don't see where it can be interrupted to cause DB problems.
> ------- END MESSAGE -------
> I also found out that one notify gets sent per action (not per batch
> of actions), so if n requests get resolved at once, n notifies are
> sent, not 1. In theory, batching those into a single notify could
> mitigate the problem, but I don't know how easy that change would be
> at this point. Still, it doesn't explain how or
> why the client's recv-q isn't getting cleared.
> Hope this helps.
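Since one notify is sent per resolved request, the client sees n identical notifies for n requests. A rough client-side mitigation is to coalesce duplicates before processing. This is a hypothetical Python sketch (the real app is Perl/DBD::Pg, and `coalesce` is an invented helper, not part of any API):

```python
def coalesce(notifies):
    """Collapse duplicate notify names: n identical notifies -> one refresh.

    `notifies` is a list of (relname, backend_pid) pairs, the shape
    pg_notifies-style interfaces hand back.
    """
    seen = set()
    out = []
    for name, pid in notifies:
        if name not in seen:
            seen.add(name)
            out.append((name, pid))
    return out

# Three resolved requests produce three notifies with the same name;
# the client only needs to refresh once.
assert coalesce([("request_done", 101), ("request_done", 101),
                 ("request_done", 102)]) == [("request_done", 101)]
```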
On our end, we changed the code in the function calling
pg_notifies to use an if statement rather than a while (that way it
only updates once per second instead of continuously as long as there
are pending async notifies).
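The while-to-if change amounts to processing at most one batch per timer tick instead of looping until the queue drains. A minimal sketch of the idea, using an invented `FakeConn` in Python in place of the Perl/DBD::Pg handle:

```python
from collections import deque

class FakeConn:
    """Stand-in for the DB handle: pg_notifies() hands back every
    pending notify at once, or None if nothing is pending."""
    def __init__(self):
        self.pending = deque()

    def pg_notifies(self):
        if not self.pending:
            return None
        batch = list(self.pending)
        self.pending.clear()
        return batch

def update_reqs(conn):
    """One timer tick. Using 'if' (not 'while') bounds the work per tick,
    so a notify arriving mid-processing waits for the next tick instead
    of keeping this call spinning."""
    batch = conn.pg_notifies()
    if batch:
        return len(batch)   # process the batch; here we just count it
    return 0

conn = FakeConn()
conn.pending.extend(["req"] * 3)
assert update_reqs(conn) == 3   # first tick drains the current batch
assert update_reqs(conn) == 0   # nothing new, so the next tick is cheap
```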
I looked more closely at the docs for DBD::Pg, and the pg_notifies
call grabs *all* pending async notifies and returns them in a hash,
not just one at a time. So, what was happening before was that if a
new notify came through while processing the previous notifies, the
code would reprocess them. Lather, rinse, repeat. I think that if the
program is still waiting on pg_notifies when the timer fires again,
the client ends up calling pg_notifies while the previous call is
still outstanding, and that is what gets the listening connection into
a bad state.
In theory this change should mitigate the "notify interrupt" behavior
on our end, but, again, why the client's recv-q is filling up is as
much a mystery as ever.
P.S. In src/backend/commands/async.c, somewhere between lines 910 and
981 (set_ps_display calls) is where the code gets interrupted. How and
why, I don't know.