Windows crash / abort handling

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Subject: Windows crash / abort handling
Date: 2021-10-05 19:30:33
Message-ID: 20211005193033.tg4pqswgvu3hcolm@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

As threatened in [1]... For CI, originally in the AIO project but now more
generally, I wanted to get windows backtraces as part of CI. I also was
confused why visual studio's "just in time debugging" (i.e. a window popping
up offering to debug a process when it crashes) didn't work with postgres.

My first attempt was to try to use the existing crashdump stuff in
pgwin32_install_crashdump_handler(). That's not really quite what I want,
because it only handles postmaster rather than any binary, but I thought it'd
be a good start. But outside of toy situations it didn't work for me.

A bunch of debugging later I figured out that the reason neither the
SetUnhandledExceptionFilter() nor JIT debugging works is that the
SEM_NOGPFAULTERRORBOX in the
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
we do in startup_hacks() prevents the paths dealing with crashes from being
reached.
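
A rough sketch of what that implies, not the actual startup_hacks() code:
keep SEM_FAILCRITICALERRORS but clear SEM_NOGPFAULTERRORBOX, so that the
unhandled-exception filter and the JIT debugger can be reached. The helper
names are hypothetical, and the stand-in constants exist only so the sketch
compiles on non-Windows systems.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins so the sketch compiles elsewhere; values match the Windows headers. */
#define SEM_FAILCRITICALERRORS 0x0001
#define SEM_NOGPFAULTERRORBOX  0x0002
#endif

/*
 * Hypothetical helper: derive an error mode that still suppresses
 * critical-error dialogs but no longer sets SEM_NOGPFAULTERRORBOX.
 */
static unsigned int
crash_friendly_error_mode(unsigned int current_mode)
{
    unsigned int mode = current_mode | SEM_FAILCRITICALERRORS;

    mode &= ~(unsigned int) SEM_NOGPFAULTERRORBOX;
    assert((mode & SEM_NOGPFAULTERRORBOX) == 0);
    return mode;
}

/* Hypothetical helper: apply that mode to the current process. */
static unsigned int
apply_crash_friendly_error_mode(void)
{
#ifdef _WIN32
    unsigned int mode = crash_friendly_error_mode(GetErrorMode());

    SetErrorMode(mode);
    return mode;
#else
    /* Nothing to apply off Windows; just report the mode we would use. */
    return crash_friendly_error_mode(0);
#endif
}
```

Whether dropping the flag is acceptable outside of CI depends on how the
resulting error popups are handled, which the rest of this mail looks at.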

The SEM_NOGPFAULTERRORBOX hails from:

commit 27bff7502f04ee01237ed3f5a997748ae43d3a81
Author: Bruce Momjian <bruce(at)momjian(dot)us>
Date: 2006-06-12 16:17:20 +0000

Prevent Win32 from displaying a popup box on backend crash. Instead let
the postmaster deal with it.

Magnus Hagander

I actually see error popups despite SEM_NOGPFAULTERRORBOX, at least for paths
reaching abort() (and thus our assertions).

The reason for abort() error boxes not being suppressed appears to be that in
debug mode a separate facility is responsible for that: [2], [3]

"The default behavior is to print the message. _CALL_REPORTFAULT, if set,
specifies that a Watson crash dump is generated and reported when abort is
called. By default, crash dump reporting is enabled in non-DEBUG builds."

We apparently need _set_abort_behavior(_CALL_REPORTFAULT) to have abort()
behave the same between debug and release builds. [4]
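
A minimal sketch of such a call, going by the documentation in [3]: the
function takes a flags argument with the new values and a mask argument that
selects which flag bits are changed. The helper names are hypothetical, and
merged_abort_flags() only models that documented behavior so the effect can
be checked without MSVC.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for non-MSVC toolchains; values as in MSVC's <stdlib.h>. */
#ifndef _WRITE_ABORT_MSG
#define _WRITE_ABORT_MSG  0x1
#endif
#ifndef _CALL_REPORTFAULT
#define _CALL_REPORTFAULT 0x2
#endif

/*
 * Model of the documented semantics: only the bits selected by mask are
 * replaced by the corresponding bits of flags; all others are preserved.
 */
static unsigned int
merged_abort_flags(unsigned int old_flags, unsigned int flags, unsigned int mask)
{
    unsigned int merged = (old_flags & ~mask) | (flags & mask);

    assert((merged & mask) == (flags & mask));
    return merged;
}

/*
 * Hypothetical helper: turn on fault reporting for abort() without touching
 * _WRITE_ABORT_MSG. Returns the previous flags (0 where this is a no-op).
 */
static unsigned int
enable_abort_fault_reporting(void)
{
#ifdef _MSC_VER
    return _set_abort_behavior(_CALL_REPORTFAULT, _CALL_REPORTFAULT);
#else
    return 0;
#endif
}
```

Passing the same value as both flags and mask sets just that one bit, which
is the usual way to enable a single behavior.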

To prevent the error popups we appear to at least need to call
_CrtSetReportMode(). The docs say:

If you do not call _CrtSetReportMode to define the output destination of
messages, then the following defaults are in effect:

Assertion failures and errors are directed to a debug message window.

We can configure it so that this output goes to stderr, by calling
_CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG);
_CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
(and the same for _CRT_ERROR and perhaps _CRT_WARN)
which removes the default _CRTDBG_MODE_WNDW.

It's possible that we'd need to do more than this, but this was sufficient to
get crash reports for segfaults and abort() in both assert and release builds,
without seeing an error popup.
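
Put together, the redirection could look roughly like the sketch below. The
helper names are hypothetical, the warning report type is spelled _CRT_WARN
in <crtdbg.h>, and the stand-in constants exist only so the sketch compiles
without MSVC (where the helpers do nothing).

```c
#include <assert.h>
#include <stddef.h>

#ifdef _MSC_VER
#include <crtdbg.h>
#else
/* Stand-ins for non-MSVC toolchains; values as in <crtdbg.h>. */
#define _CRT_WARN   0
#define _CRT_ERROR  1
#define _CRT_ASSERT 2
#endif

/*
 * Hypothetical helper: send one CRT report type to stderr and the debugger
 * output instead of the default message window. Returns 0 on success.
 */
static int
redirect_crt_report(int report_type)
{
    assert(report_type >= _CRT_WARN && report_type <= _CRT_ASSERT);
#ifdef _MSC_VER
    /* In non-debug CRT builds these calls compile to no-ops. */
    if (_CrtSetReportMode(report_type,
                          _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG) == -1)
        return -1;
    _CrtSetReportFile(report_type, _CRTDBG_FILE_STDERR);
#else
    (void) report_type;
#endif
    return 0;
}

/* Hypothetical helper: apply the redirection to all three report types. */
static int
redirect_all_crt_reports(void)
{
    static const int report_types[] = {_CRT_WARN, _CRT_ERROR, _CRT_ASSERT};
    size_t      i;

    for (i = 0; i < sizeof(report_types) / sizeof(report_types[0]); i++)
    {
        if (redirect_crt_report(report_types[i]) != 0)
            return -1;
    }
    return 0;
}
```

This would have to run early in startup of every binary, before anything can
hit an assertion or abort().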

To actually get the crash reports I ended up doing the following on the OS
level [5]:

Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug' -Name 'Debugger' -Value '\"C:\Windows Kits\10\Debuggers\x64\cdb.exe\" -p %ld -e %ld -g -kqm -c \".lines -e; .symfix+ ;.logappend c:\cirrus\crashlog.txt ; !peb; ~*kP ; .logclose ; q \"' ; `
New-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug' -Name 'Auto' -Value 1 -PropertyType DWord ; `
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug' -Name Debugger; `

This requires 'cdb' to be present, which is included in the Windows 10 SDK (or
other OS versions, it doesn't appear to have changed much). Whenever there's
an unhandled crash, cdb.exe is invoked with the parameters above, which
appends the crash report to crashlog.txt.

Alternatively we can generate "minidumps" [6], but that doesn't appear to be more
helpful for CI purposes at least - all we'd do is to create a backtrace using
the same tool. But it might be helpful for local development, to e.g. analyze
crashes in more detail.

The above ends up dumping all crashes into a single file, but that can
probably be improved. But cdb is so gnarly that I wanted to stop looking once
I got this far...

Andrew, I wonder if something like this could make sense for windows BF animals?

Greetings,

Andres Freund

[1] https://postgr.es/m/20211001222752.wrz7erzh4cajvgp6%40alap3.anarazel.de
[2] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/crtsetreportmode?view=msvc-160
[3] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/set-abort-behavior?view=msvc-160
[4] If anybody can explain to me what the two different parameters to
_set_abort_behavior() do, I'd be all ears
[5] https://docs.microsoft.com/en-us/windows/win32/debug/configuring-automatic-debugging
[6] https://docs.microsoft.com/en-us/windows/win32/wer/wer-settings
