Re: BUG #3504: Some listening sessions never return from writing, problems ensue

From: "Peter Koczan" <pjkoczan(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3504: Some listening sessions never return from writing, problems ensue
Date: 2007-08-09 18:55:00
Message-ID: 4544e0330708091155r2db59ea4w6b2e34cbbc8d3ae3@mail.gmail.com
Lists: pgsql-bugs

On 8/6/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Peter Koczan" <pjkoczan(at)gmail(dot)com> writes:
> > Here's my theory (and feel free to tell me that I'm full of it)...somehow, a
> > lot of notifies happened at once, or in a very short period of time, to the
> > point where the app was still processing notifies when the timer clicked off
> > another second. The connection (or app, or perl module) never marked those
> > notifies as being processed, or never updated its timestamp of when it
> > finished, so when the next notify came around, it tried to reprocess the old
> > data (or data since the last time it finished), and yet again couldn't
> > finish. Lather, rinse, repeat. In sum, it might be that trying to call
> > pg_notifies while processing notifies tickles a race condition and tricks
> > the connection into thinking it's in a bad state.
>
> Hmm. Is the app trying to do this processing inside an interrupt
> service routine (a/k/a signal handler)? If so, and if the ISR can
> interrupt itself, then you've got a problem because you'll be doing
> reentrant calls of libpq, which it doesn't support. You can only make
> that work if the handler blocks further occurrences of its signal until
> it finishes.
>

I'm not entirely sure if this answers your question, but here's what I
found out from the primary maintainer of the app. Note that
update_reqs is the function calling pg_notifies. If there's more
information I can provide or another test we can run, please let me
know.

------- BEGIN MESSAGE -------
I just checked and the timer won't interrupt update_reqs, so we'll
have to look for another solution. Anyway, update_reqs doesn't do
anything with the database except for checking for a notify, so I
don't see where it can be interrupted to cause DB problems.
------- END MESSAGE -------

I also found out that one notify gets sent per action (not per batch
of actions), so if n requests get resolved at once, n notifies are
sent, not 1. In theory, sending one notify per batch instead could
mitigate this problem, but I don't know how easy that would be to change
at this point. Still, it doesn't explain how or
why the client's recv-q isn't getting cleared.
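A related client-side mitigation, which does not require changing how notifies are sent, is to drain every notification already queued and then refresh once. The sketch below is only an illustration under assumptions: a `deque` stands in for the connection's notification queue, and `drain_and_refresh` and the `refresh` callback are hypothetical names, not part of DBD::Pg or libpq.

```python
from collections import deque


def drain_and_refresh(pending, refresh):
    """Consume every queued notification, then call refresh() at most once.

    `pending` is a deque standing in for the connection's notification
    queue (hypothetical); `refresh` stands in for re-reading the request
    table and receives the set of channel names seen.
    Returns the number of notifications consumed.
    """
    count = 0
    channels = set()
    while pending:
        channels.add(pending.popleft())
        count += 1
    if count:
        refresh(channels)  # one refresh for the whole batch
    return count
```

With five `req_resolved` notifies queued by one batch of actions, this costs a single refresh instead of five.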

Hope this helps.

Peter
