|From:||"Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>|
|Subject:||[bug fix] postgres.exe crashes with access violation on Windows while starting up|
We encountered a rare and hard-to-investigate problem on Windows, which one of our customers reported. Please find the attached patch to fix that. I'll add this to the next CF.
PostgreSQL sometimes crashes with the following messages. This is infrequent (but frequent for the customer); it occurred about 10 times in the past 5 months.
LOG: server process (PID 2712) was terminated by exception 0xC0000005
HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing
"server process" shows that a client backend crashed. The above messages indicate that the process was not running an SQL command.
PostgreSQL runs as a Windows service.
No crash dump was produced anywhere, despite the following facts:
- <PGDATA>/crashdumps folder exists and is writable by the PostgreSQL user account (which is the user postgres.exe runs as)
- The Windows registry configuration allows dumping the crash dump
We believe WSAStartup() in main.c failed. The only conceivable error is:
Too many processes.
A Windows Sockets implementation may have a limit on the number of applications that can use it simultaneously. WSAStartup may fail with this error if the limit has been reached.
But I couldn't find what the limit is or whether we can tune it. We couldn't reproduce the problem.
When I simulated a WSAStartup() failure while a client backend was starting up, I saw the same phenomenon as the customer. This problem only occurs when PostgreSQL runs as a Windows service.
The bug is in write_eventlog(). It calls pgwin32_message_to_utf16() which in turn calls palloc(), which requires the memory management system to be set up (CurrentMemoryContext != NULL).
The attached patch adds the check "CurrentMemoryContext != NULL" to write_eventlog(), as write_console() already does.
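To illustrate the idea (this is not the attached patch), here is a minimal, portable sketch of the guard. Everything prefixed with sketch_ is a hypothetical stand-in: sketch_palloc() plays the role of palloc(), sketch_convert_message() plays the role of pgwin32_message_to_utf16(), and sketch_write_eventlog() copies the message to a buffer instead of calling the Windows event log API. The real write_eventlog() has more conditions than shown here; the sketch only shows why the conversion path must be skipped while CurrentMemoryContext is still NULL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the memory context machinery. */
typedef struct MemoryContextData { int dummy; } MemoryContextData;
MemoryContextData *CurrentMemoryContext = NULL;

/*
 * Stand-in for palloc(): it needs a current memory context.  The sketch
 * aborts where the real code would dereference a NULL context and die
 * with an access violation.
 */
void *sketch_palloc(size_t size)
{
    if (CurrentMemoryContext == NULL)
        abort();
    return malloc(size);
}

/* Stand-in for pgwin32_message_to_utf16(): allocates a converted copy. */
char *sketch_convert_message(const char *msg)
{
    size_t len = strlen(msg) + 1;
    char *copy = sketch_palloc(len);

    memcpy(copy, msg, len);
    return copy;
}

typedef enum { LOGGED_RAW, LOGGED_CONVERTED } LogPath;

/*
 * Stand-in for write_eventlog(): take the allocating conversion path only
 * when the memory management system is set up; otherwise report the
 * message as-is without allocating.
 */
LogPath sketch_write_eventlog(const char *msg, char *out, size_t outsize)
{
    if (CurrentMemoryContext != NULL)
    {
        char *converted = sketch_convert_message(msg);

        snprintf(out, outsize, "%s", converted);
        free(converted);
        return LOGGED_CONVERTED;
    }

    /* Too early in startup for palloc(): fall back to the raw message. */
    snprintf(out, outsize, "%s", msg);
    return LOGGED_RAW;
}
```

Calling sketch_write_eventlog() before any context is assigned takes the raw path and returns LOGGED_RAW; without the guard, the same call would reach sketch_palloc() with a NULL context, which corresponds to the crash described above.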
The crash dump was not output for two reasons: a) the crash occurred before the Windows exception handler was installed (the pgwin32_install_crashdump_handler() call), and b) the effect of the following call in the postmaster is inherited by the child process.
/* In case of general protection fault, don't show GUI popup box */
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
But I'm not sure in what order we should call pgwin32_install_crashdump_handler(), startup_hacks() and the steps therein, and MemoryContextInit(). I think that's another patch.