[bug fix] postgres.exe crashes with access violation on Windows while starting up

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: [bug fix] postgres.exe crashes with access violation on Windows while starting up
Date: 2017-10-27 02:10:21
Message-ID: 0A3221C70F24FB45833433255569204D1F80CC73@G01JPEXMBYT05
Lists: pgsql-hackers

Hello,

We encountered a rare and hard-to-investigate problem on Windows, which one of our customers reported. Please find the attached patch to fix that. I'll add this to the next CF.

PROBLEM
==============================

PostgreSQL sometimes crashes with the following messages. This is infrequent (but frequent for the customer); it occurred about 10 times in the past 5 months.

LOG: server process (PID 2712) was terminated by exception 0xC0000005
HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing

"server process" shows that a client backend crashed. The above messages also indicate that the process was not running an SQL command, because there is no "DETAIL: Failed process was running: ..." line.

PostgreSQL runs as a Windows service.

No crash dump was produced anywhere, despite the following:
- The <PGDATA>/crashdumps folder exists and is writable by the PostgreSQL user account (the user postgres.exe runs as)
- The Windows registry configuration allows crash dumps to be written

CAUSE
==============================

We believe WSAStartup() in main.c failed. The only conceivable error is:

WSAEPROCLIM
10067
Too many processes.
A Windows Sockets implementation may have a limit on the number of applications that can use it simultaneously. WSAStartup may fail with this error if the limit has been reached.

But I couldn't find out what the limit is or whether we can tune it. We couldn't reproduce the problem.

When I simulated a WSAStartup() failure while a client backend was starting up, I saw the same symptom as the customer. This problem only occurs when PostgreSQL runs as a Windows service.

The bug is in write_eventlog(). It calls pgwin32_message_to_utf16() which in turn calls palloc(), which requires the memory management system to be set up (CurrentMemoryContext != NULL).

FIX
==============================

Add the check "CurrentMemoryContext != NULL" in write_eventlog() as in write_console().
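
To illustrate the idea, here is a minimal stand-alone sketch of such a guard. It is not the attached patch and not the real elog.c code: model_palloc(), model_write_eventlog(), the LogPath enum and the MemoryContextData struct are hypothetical stand-ins, and stderr takes the place of the Windows event log. It only shows that the allocating conversion path is taken when CurrentMemoryContext is set, and that the raw message is reported otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a memory context; not PostgreSQL's real struct. */
typedef struct MemoryContextData
{
    size_t      allocated;      /* bytes handed out through this context */
} MemoryContextData;
typedef MemoryContextData *MemoryContext;

/* Stays NULL until the memory management system has been set up. */
static MemoryContext CurrentMemoryContext = NULL;

/* Stand-in for palloc(): it needs a valid current context to work. */
static void *
model_palloc(size_t size)
{
    assert(CurrentMemoryContext != NULL);   /* precondition of this model */
    CurrentMemoryContext->allocated += size;
    return malloc(size);
}

typedef enum
{
    LOGGED_CONVERTED,           /* went through the allocating path */
    LOGGED_RAW                  /* fell back to the unconverted message */
} LogPath;

/* Model of the guarded logging routine; stderr replaces the event log. */
static LogPath
model_write_eventlog(const char *line)
{
    /* The guard: only allocate when a memory context exists. */
    if (CurrentMemoryContext != NULL)
    {
        size_t      len = strlen(line) + 1;
        char       *copy = model_palloc(len);

        if (copy != NULL)
        {
            memcpy(copy, line, len);    /* stands in for encoding conversion */
            fprintf(stderr, "[converted] %s\n", copy);
            free(copy);
            return LOGGED_CONVERTED;
        }
    }

    /* Too early in startup, or allocation failed: report the raw text. */
    fprintf(stderr, "[raw] %s\n", line);
    return LOGGED_RAW;
}
```

Calling model_write_eventlog() before CurrentMemoryContext is assigned takes the raw path instead of touching the allocator, which corresponds to the early-startup failure described above.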

NOTE
==============================

The reason no crash dump was output is that a) the crash occurred before the Windows exception handler was installed (the pgwin32_install_crashdump_handler() call), and b) the effect of the following call in the postmaster is inherited by the child process.

/* In case of general protection fault, don't show GUI popup box */
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);

But I'm not sure in what order we should do pgwin32_install_crashdump_handler(), startup_hacks() and the steps therein, and MemoryContextInit(). I think that's another patch.

Regards
Takayuki Tsunakawa

Attachment Content-Type Size
write_eventlog_crash.patch application/octet-stream 887 bytes
