Hang on NOTIFY

From: Mark Simonetti <marks(at)opalsoftware(dot)co(dot)uk>
To: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Hang on NOTIFY
Date: 2015-08-07 11:32:38
Message-ID: 55C49756.70505@opalsoftware.co.uk
Lists: pgsql-bugs

The system I am developing makes extensive use of PostgreSQL's asynchronous
NOTIFY/LISTEN mechanism.

I am currently experiencing a problem on 2 production servers:

Server 1:
Virtual Windows Server 2008 R2 (VMware)
PostgreSQL 9.3.5

Server 2:
Virtual Windows Server 2008 R2 (VMware)
PostgreSQL 9.4.2

After the system has been running for a while (sometimes a few days,
sometimes a few weeks), any call to NOTIFY will hang.

After in-depth investigation, it appears to happen when a listening
backend has been connected for some time (days).

Any other backend trying to signal that backend will hang in
CallNamedPipe() in pgkill (kill.c).

Here is a stack trace from the main thread of the hung SENDING backend:

ntdll.dll!_NtFsControlFile@40() + 0x15 bytes
ntdll.dll!_NtFsControlFile@40() + 0x15 bytes
kernel32.dll!_CallNamedPipeW@28() + 0xf4 bytes
postgres.exe!pgkill(int pid, int sig) Line 43 + 0x2b bytes C
postgres.exe!SendProcSignal(int pid, ProcSignalReason reason, int backendId) Line 198 + 0x10 bytes C
postgres.exe!SignalBackends() Line 1497 + 0xe bytes C
> postgres.exe!ProcessCompletedNotifies() Line 1092 C
postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname, const char * username) Line 3947 C
postgres.exe!BackendRun(Port * port) Line 4011 + 0x21 bytes C
postgres.exe!SubPostmasterMain(int argc, char * * argv) Line 4515 + 0x8 bytes C
postgres.exe!main(int argc, char * * argv) Line 203 + 0x7 bytes C
postgres.exe!__tmainCRTStartup() Line 555 + 0x17 bytes C
kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes
ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes
ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes
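
The frames above show the sender blocked inside CallNamedPipeW, called from pgkill. Assuming pgkill works as in src/port/kill.c (one byte holding the signal number is written to the recipient's signal pipe, and the recipient's signal-handling thread echoes the same byte back before the call returns), the round trip can be modelled as below. This is a hypothetical POSIX sketch, not PostgreSQL code: ordinary pipes stand in for the Windows named pipe, and send_request, serve_one and read_ack are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical POSIX model of the pgkill handshake; not PostgreSQL code.
 * Request: one byte carrying the signal number.
 * Reply:   the same byte echoed back by the recipient.
 */

/* Sender, phase 1: write the signal number to the recipient. */
static int send_request(int req_fd, unsigned char sig)
{
    return write(req_fd, &sig, 1) == 1 ? 0 : -1;
}

/* Recipient: read one request, record it as pending, echo it back. */
static int serve_one(int req_fd, int reply_fd, unsigned char *pending)
{
    unsigned char sig;

    assert(pending != NULL);
    if (read(req_fd, &sig, 1) != 1)
        return -1;
    *pending = sig;                 /* stands in for queueing the signal */
    return write(reply_fd, &sig, 1) == 1 ? 0 : -1;
}

/* Sender, phase 2: blocking read of the acknowledgement. There is no
 * timeout here: if the recipient never replies and keeps its end of the
 * reply pipe open, this read() blocks indefinitely. */
static int read_ack(int reply_fd, unsigned char sig)
{
    unsigned char ret;

    if (read(reply_fd, &ret, 1) != 1)
        return -1;                  /* EOF: the recipient closed its end */
    if (ret != sig) {
        errno = ESRCH;              /* unexpected reply; this sketch's choice */
        return -1;
    }
    return 0;
}
```

Called in order on two pipes, send_request, serve_one and read_ack complete one round trip. If serve_one never runs while the reply pipe stays open, read_ack never returns; once the recipient's end is closed, it fails instead, which resembles the sender only coming free when the recipient exits.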

Here is a stack trace from its signal thread (I know it's irrelevant, as
this thread handles incoming signals):

ntdll.dll!_NtFsControlFile@40() + 0x15 bytes
ntdll.dll!_NtFsControlFile@40() + 0x15 bytes
> postgres.exe!pg_signal_thread(void * param) Line 279 + 0x9 bytes C

Now for the RECIPIENT backend, main thread:

ntdll.dll!_ZwWaitForMultipleObjects@20() + 0x15 bytes
ntdll.dll!_ZwWaitForMultipleObjects@20() + 0x15 bytes
KERNELBASE.dll!_WaitForMultipleObjectsEx@20() + 0x36 bytes
kernel32.dll!_WaitForMultipleObjectsExImplementation@20() + 0x8e bytes
> postgres.exe!pgwin32_waitforsinglesocket(unsigned int s, int what, int timeout) Line 216 + 0x14 bytes C
postgres.exe!pgwin32_recv(unsigned int s, char * buf, int len, int f) Line 352 + 0xa bytes C
postgres.exe!secure_read(Port * port, void * ptr, unsigned int len) Line 304 + 0x12 bytes C
postgres.exe!pq_getbyte() Line 895 + 0x67 bytes C
postgres.exe!SocketBackend(StringInfoData * inBuf) Line 344 + 0x5 bytes C
postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname, const char * username) Line 3968 + 0x1c bytes C
postgres.exe!BackendRun(Port * port) Line 4011 + 0x21 bytes C
postgres.exe!SubPostmasterMain(int argc, char * * argv) Line 4515 + 0x8 bytes C
postgres.exe!main(int argc, char * * argv) Line 203 + 0x7 bytes C
postgres.exe!__tmainCRTStartup() Line 555 + 0x17 bytes C
kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes
ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes
ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes

This is the usual place for it to wait, so this seems okay.

And here is its signal thread:

ntdll.dll!_NtFsControlFile@40() + 0x15 bytes
ntdll.dll!_NtFsControlFile@40() + 0x15 bytes
> postgres.exe!pg_signal_thread(void * param) Line 279 + 0x9 bytes C

Also looks fine.

This seems like a possible Windows bug: the call to CallNamedPipe has a
timeout of 1000 milliseconds, but it is clearly not timing out. It only
seems to return once I close the backend it is trying to signal.
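
One detail may be relevant to why the 1000 ms timeout never fires: the Win32 documentation describes CallNamedPipe's nTimeOut parameter as the time to wait for the named pipe to become available, and describes the call as equivalent to CreateFile (or WaitNamedPipe), TransactNamedPipe and CloseHandle. If that applies here, the timeout would not limit the write-then-read exchange once a connection is made, so a recipient that accepts the connection but never replies would leave the sender blocked until the pipe closes. The sketch below is a hypothetical POSIX illustration (pipes and poll() stand in for the named pipe; await_ack_timeout is an illustrative name) of a wait whose deadline also covers the reply.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not PostgreSQL code: wait for the one-byte
 * acknowledgement with a deadline that also covers the reply phase.
 * Returns 0 on a matching ack, or -1 with errno set:
 *   ETIMEDOUT  no reply arrived within timeout_ms
 *   EPIPE      the recipient closed its end without replying
 *   ESRCH      a reply arrived but did not match (this sketch's choice)
 */
static int await_ack_timeout(int reply_fd, unsigned char sig, int timeout_ms)
{
    struct pollfd pfd = { .fd = reply_fd, .events = POLLIN };
    unsigned char ret;

    /* A negative timeout would make poll() wait forever, the very
     * behaviour this sketch is meant to avoid. */
    assert(timeout_ms >= 0);

    int n = poll(&pfd, 1, timeout_ms);
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (n < 0)
        return -1;                  /* poll() has already set errno */

    ssize_t got = read(reply_fd, &ret, 1);
    if (got == 0) {
        errno = EPIPE;              /* EOF: no writer left */
        return -1;
    }
    if (got < 0)
        return -1;
    if (ret != sig) {
        errno = ESRCH;
        return -1;
    }
    return 0;
}
```

With timeout_ms set to 1000, a recipient that never answers costs the sender about a second rather than blocking it until the recipient exits. This only bounds the wait; it says nothing about why the recipient stopped answering.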

NOTE: it is trying to signal many backends, but every stuck sender I
checked was stuck sending to the same recipient. Closing that particular
recipient DOES free everything up, and signals start flowing again.
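
This pattern fits the shape of the sending side: in the SENDING trace, SignalBackends() calls SendProcSignal(), and, assuming it behaves as in async.c, it does so for each listening backend in turn. A single call that never returns would then hold up every listener after it, and the session that issued the NOTIFY along with them. The following hypothetical model (signal_listeners, deliver_fn and the stub deliverers are illustrative names, not PostgreSQL's API) counts a failed delivery and moves on, so the loop can only stall on a delivery that blocks forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not PostgreSQL's SignalBackends(): each listening
 * backend gets a deliver function that returns 0 on success or -1 on
 * failure (for example, a timed-out acknowledgement). */
typedef int (*deliver_fn)(int pid);

struct signal_result {
    size_t delivered;
    size_t failed;
};

/* Signal each listener in turn. A failure is counted and the loop moves
 * on; only a deliver call that never returns can hold up the listeners
 * after it, and the caller along with them. */
static struct signal_result signal_listeners(const int *pids,
                                             const deliver_fn *deliver,
                                             size_t n)
{
    struct signal_result r = { 0, 0 };

    assert(n == 0 || (pids != NULL && deliver != NULL));
    for (size_t i = 0; i < n; i++) {
        if (deliver[i](pids[i]) == 0)
            r.delivered++;
        else
            r.failed++;
    }
    return r;
}

/* Stub deliverers for the usage example. */
static int deliver_ok(int pid)      { (void) pid; return 0; }
static int deliver_timeout(int pid) { (void) pid; return -1; }
```

For three listeners where the middle one times out, signal_listeners reports two deliveries and one failure; with an unbounded wait in place of deliver_timeout, the third listener would never be reached.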

I've searched around and cannot find a similar bug report. Could it be
something I'm doing wrong?

Thanks,
Mark.
--
