Re: pg_listener entries deleted under heavy NOTIFY load only on Windows

From: "Marshall, Steve" <smarshall(at)wsi(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: pg_listener entries deleted under heavy NOTIFY load only on Windows
Date: 2009-01-16 13:27:07
Message-ID: 8536F69C1FCC294B859D07B179F069441176B988@EXCHANGE.ad.wsicorp.com
Lists: pgsql-bugs

"Marshall, Steve" <smarshall(at)wsi(dot)com> writes:
> Under a heavy load of NOTIFY events, entries in the pg_listener table
> for some events are deleted, effectively acting as though UNLISTEN
> were called.

> I have only been able to make this occur on a PostgreSQL server
> running on Windows.

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> AFAICS the most likely explanation for this is that Send_Notify() gets
> an error from kill() and concludes that the listening process has died
> without removing its pg_listener entry; whereupon it removes it itself.
> Looking at pgkill(), that theory implies that CallNamedPipe() failed
> when under sufficient stress. I'm not sure what the "timeout" parameter
> we use with CallNamedPipe actually limits, but maybe it's too small?
> (Microsoft's doc suggests that the timeout only matters if the pipe
> doesn't already exist, so I'm not sure I believe this theory; though
> certainly the doc is vague enough that that reading could be wrong.)
>
> Theory B is that you've got some broken antivirus code on there that is
> arbitrarily interfering with the pipe access. The lack of any similar
> previous reports suggests that there's some local issue contributing ...
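
To make Theory A concrete, here is a small C model of the failure mode it describes. It is a hypothetical sketch, not PostgreSQL code: `signal_outcome`, `emulated_kill()`, `notifier_drops_entry()` and `listener_survives()` are invented names, and the model assumes, as the description above suggests, that the notifier treats any kill() failure as proof that the listener has exited. Under that assumption a single transient pipe failure is indistinguishable from a dead process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of one attempt to signal a listener through its
 * per-process pipe. Illustrative only; these are not Win32 error codes. */
typedef enum {
    SIGNAL_DELIVERED,  /* pipe call succeeded */
    SIGNAL_NO_PIPE,    /* no pipe: the listener process really is gone */
    SIGNAL_TRANSIENT   /* pipe exists, but the call failed under load */
} signal_outcome;

/* Hypothetical kill() emulation: 0 on success, -1 on any failure.
 * The caller sees the same -1 for NO_PIPE and TRANSIENT. */
static int emulated_kill(signal_outcome outcome)
{
    return outcome == SIGNAL_DELIVERED ? 0 : -1;
}

/* Notifier policy assumed by Theory A: any kill() error is taken to
 * mean the listener died, so its pg_listener entry is deleted. */
static bool notifier_drops_entry(signal_outcome outcome)
{
    return emulated_kill(outcome) < 0;
}

/* Replays a sequence of signal attempts against one listener that stays
 * alive the whole time; returns false once its entry would be deleted. */
static bool listener_survives(const signal_outcome *attempts, size_t n)
{
    assert(attempts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (notifier_drops_entry(attempts[i]))
            return false;  /* behaves like an unrequested UNLISTEN */
    }
    return true;
}
```

For example, with the sequence {SIGNAL_DELIVERED, SIGNAL_TRANSIENT, SIGNAL_DELIVERED}, listener_survives() returns false even though the listener never exited, which matches the symptom in the original report.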

To explore Theory B, I'll turn off all non-essential services on the
Windows server and rerun the test. I'll report back with what I find.

Any thoughts on how to confirm or deny Theory A?
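
One possible way to gather evidence would be to log errno whenever the notifier's signal fails, so the server log shows whether each lost pg_listener entry coincides with a failed kill() and which error was reported. The wrapper below is a hypothetical sketch: `kill_logged()` is an invented name, and it assumes it would be dropped in where Send_Notify() currently calls kill(), which on Windows builds is expected to route through pgkill(). It relies only on POSIX kill(), strerror() and fprintf().

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical instrumentation wrapper (not part of PostgreSQL).
 * Sends sig to pid exactly like kill(); on failure it logs the errno
 * value before the caller acts on it, then restores errno unchanged. */
static int kill_logged(pid_t pid, int sig, FILE *log)
{
    if (kill(pid, sig) == 0)
        return 0;

    int saved_errno = errno;  /* fprintf() may clobber errno */
    fprintf(log, "kill(%ld, %d) failed: errno=%d (%s)\n",
            (long) pid, sig, saved_errno, strerror(saved_errno));
    errno = saved_errno;      /* let the caller inspect the original error */
    return -1;
}
```

If the deletions line up with logged failures whose errno is something other than "no such process", that would point toward Theory A; if no failures are logged at all, it would argue against it.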
