Re: pg_listener entries deleted under heavy NOTIFY load only on Windows

From: "Marshall, Steve" <smarshall(at)wsi(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: pg_listener entries deleted under heavy NOTIFY load only on Windows
Date: 2009-01-28 18:11:26
Message-ID: 8536F69C1FCC294B859D07B179F0694411A4D4C8@EXCHANGE.ad.wsicorp.com
Lists: pgsql-bugs

Preserving the subscription is good because the current
behavior silently drops it without informing the
subscribing process in any way. This means applications waiting for
changes to database tables have the mistaken impression that no changes
have occurred.

My reasoning for updating the pg_listener tuple even if kill fails may
be based on a mistaken understanding of asynchronous notification. We
could easily avoid it, if it's not the right thing to do. My thought
was that a single byte was written to the named pipe to inform the
listener of the presence of new events, but that the knowledge of what
event(s) are pending is kept in the pg_listener table. Since this
problem only occurs under heavy notification load (i.e. when the pipe
already has data in it), the listener is going to get signalled about
the presence of new events even if one particular kill() call failed.
Let me know if I misunderstand the use of the named pipe and the
pg_listener table in event notification.

Given my understanding, it seems like a good idea to ensure the state of
pg_listener is updated with all events that were received. If we don't
do that, we risk the case that the one kill that fails is for an event
that was only issued that one time. In that case, we would fail to
alert the application that this event occurred.
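
To make the argument above concrete, here is a toy model in C. It is
only a sketch of the behavior as described in this message, not the
code in async.c: the ListenerRow struct, notify_event() and
process_wakeup() are hypothetical names, and the signal_ok flag simply
stands in for the result of kill(). The assumption being illustrated
is that the pending state lives in the table while the signal is only
a wakeup, so an event whose signal failed is still delivered on the
next wakeup that does get through.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one pg_listener row (hypothetical layout). */
typedef struct {
    const char *event_name;   /* name the backend LISTENs on */
    bool        pending;      /* true once a NOTIFY has been recorded */
} ListenerRow;

/*
 * Record a NOTIFY. The row is marked pending whether or not the
 * wakeup signal got through; signal_ok models the result of kill().
 * Returns true if this call woke the listener.
 */
static bool
notify_event(ListenerRow *row, bool signal_ok)
{
    row->pending = true;
    return signal_ok;
}

/*
 * What the listener does on any wakeup: scan all of its rows,
 * collect every pending event, and clear the flags as it goes.
 * Returns the number of events delivered.
 */
static int
process_wakeup(ListenerRow *rows, size_t nrows)
{
    int delivered = 0;

    for (size_t i = 0; i < nrows; i++) {
        if (rows[i].pending) {
            rows[i].pending = false;
            delivered++;
        }
    }
    return delivered;
}
```

In this model, an event whose notify_event() call had signal_ok set to
false is still picked up by process_wakeup() once any other
notification wakes the listener. If the row were deleted or left
unmarked instead, that event would be lost.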

I don't think a check for process existence is a bad idea, or even a
bandaid. The comment in the code block in async.c says it is removing
the entry in pg_listener because the backend process does not exist.
The code assumes the process is dead, but does not check it. I don't
think it is unreasonable to check this assumption, and behave
accordingly.
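
As an illustration of the kind of check meant here, the following is a
minimal sketch using the POSIX convention of sending signal 0, which
performs the existence and permission checks without delivering a
signal. The helper name process_exists() is hypothetical and this is
not taken from async.c. It also assumes a POSIX kill(); on Windows the
same question would have to be answered through a different mechanism.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Returns true if a process with the given pid appears to exist.
 * Hypothetical helper for illustration only.
 */
static bool
process_exists(pid_t pid)
{
    /* pid <= 0 addresses process groups, not a single process. */
    if (pid <= 0)
        return false;

    /* Signal 0: error checking is done, but no signal is sent. */
    if (kill(pid, 0) == 0)
        return true;

    /* EPERM means the process exists but we may not signal it. */
    return errno == EPERM;
}
```

With a check like this, the entry would only be removed when the
backend is really gone (kill() fails with ESRCH), and any other
failure could be reported or retried instead.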

If you have suggestions on how to get more information on the cause of
the error, I'd be willing to look into them. The Microsoft
documentation only describes a couple of error conditions for
CallNamedPipe (error 31 is not one of them), and the error code itself
is not very descriptive ("A device attached to the system is not
functioning"). I'm open to ideas on how to dig further.

Steve

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Wednesday, January 28, 2009 12:44 PM
To: Marshall, Steve
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] pg_listener entries deleted under heavy NOTIFY load
only on Windows

"Marshall, Steve" <smarshall(at)wsi(dot)com> writes:
> Essentially, the new error handling in async.c allows postgres to fail
> in its efforts to signal a process about the presence of a NOTIFY
> event without invalidating the subscription (LISTEN) for that event.

But why would that be a good idea? It seems to me to be a bandaid that
guarantees misbehavior in the infrequent-notification case.

The real question here, which this doesn't advance us towards solving,
is why is the CallNamedPipe call sometimes failing (or at least taking
longer than it seems it ought to).

regards, tom lane
