Re: Add errdetail() with PID and UID about source of termination signal

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add errdetail() with PID and UID about source of termination signal
Date: 2026-04-15 06:11:46
Message-ID: 6F78A2BD-4C0A-4620-AD98-18DF9D9FB4CD@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Apr 15, 2026, at 04:38, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 2026-04-14 Tu 6:40 AM, Jakub Wartak wrote:
>> Hi Andres, Andrew, Chao
>>
>> thanks for putting so much effort into enhancing the the previous implementation
>> of this code. Earlier I was not aware of potential problems involved.
>>
>> On Fri, Apr 10, 2026 at 9:41 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>
>>>
>>>> On Apr 9, 2026, at 18:59, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>>>
>>>>
>>>> On 2026-04-08 We 1:01 PM, Andres Freund wrote:
>>>>> Hi,
>>>>>
>>>>> Attached is a very rough first draft for how I think this needs to look like.
>>>>>
>>>>> Basically, SIGNAL_INFO always will pass both the signal number and extended
>>>>> information along to the signal handler. The extended information is a
>>>>> postgres specific struct. If the platform can't provide the extended
>>>>> information, the values are instead set to some default value indicating that
>>>>> the information is not known.
>>>>>
>>>>> With that die() (and also StatementCancelHandler, ...) can just set whatever
>>>>> globals it wants, without pqsignal.c needing to know about it.
>>>>>
>>>>> It also allows us to extend the amount of information in the future. E.g. I'd
>>>>> like to log the reason for a segfault (could e.g. be an OOM kill or an umapped
>>>>> region) to stderr.
>>>>>
>>>>> The annoying thing about it is needing to change nearly all the existing
>>>>> references to SIG_IGN/SIG_DFL, to avoid warnings due to mismatched types.
>>>> I agree that's annoying. The only way around it I found was via some casting to/from void* that I suspect you would find a cure worse than the disease.
>>>> I reworked your patch slightly. This version fixes the translatability issue you raised earlier, makes the TAP test from the original commit more robust, and tries to resolve your XXX issue by moving the assignment of ProcDieSenderPid/Uid inside the "if (!proc_exit_inprogress)" block.
>>>>
>>> I reviewed this version. Besides the compile warning and uid 0 issues, I got a few more comments, so I try to put them all together as below.
>>>
>> TL;DR; win/mingw is really unhappy with the state of rework patch right now.
>> I've tried to enhance and fix it, and now it's green for default CI run and
>> also for mingw too. Attached 0002-fixup-win32 patch does not incorporate Chao's
>> all findings so far.
>>
>> [..]
>>> 2 - uid 0 problem
>>> ```
>>> +typedef struct pg_signal_info
>>> +{
>>> + pid_t pid; /* pid of sending process or 0 if unknown */
>>> + uid_t uid; /* uid of sending process or 0 if unknown */
>>> +} pg_signal_info;
>>> ```
>>>
>>> I think we can mention that “uid” is only meaningful when pid is set.
>> [..]
>>> 5
>>> ```
>>> + pg_info.uid = 0;
>>> ```
>>>
>>> If you take comment 2, then when unavailable, we can just don’t assign anything to pg_info.uid. Or "(uid_t)-1", maybe.
>>>
>>
>> I. I've started from this and I vaguley remembered that I had terrible
>> experience
>> when trying to chose the proper types for those field types, but I couldn't
>> remind myself why, so I've gave a try of the current rework patch and got this
>> on Windows Server 2022, vs2019, cirrus-ci said:
>>
>> [08:40:22.848] c:\cirrus\src\include\c.h(1451): error C2061: syntax
>> error: identifier 'pid_t'
>> [08:40:22.848] c:\cirrus\src\include\c.h(1452): error C2061: syntax
>> error: identifier 'uid'
>> [08:40:22.848] c:\cirrus\src\include\c.h(1452): error C2059: syntax error: ';'
>> [08:40:22.848] c:\cirrus\src\include\c.h(1453): error C2059: syntax error: '}'
>>
>> for c.h:
>> 1449 typedef struct pg_signal_info
>> 1450 {
>> 1451 pid_t pid; /* pid of
>> sending process or 0 if unknown */
>> 1452 uid_t uid; /* uid of
>> sending process or 0 if unknown */
>> 1453 } pg_signal_info;
>>
>> so maybe we should just move that typedef with SIGNAL_ARGS to after
>> "include of port.h" (line ~1471) in that c.h, because only then we'll have
>> access to:
>> src/include/port/win32_port.h:typedef int pid_t;
>> src/include/port/win32_port.h:typedef int uid_t;
>> but the problem is that port.h itself requires SIGNAL_ARGS to be defined
>> and that seems to be like chicken and egg problem. I thought that just
>> using native "ints" could be the way to go, but the problem is that now
>> that uid_t can be bigger than pid_t as Linux kernel headers show this:
>>
>> x86_64-linux-gnu/bits/types.h:#define __S32_TYPE int
>> x86_64-linux-gnu/bits/types.h:#define __U32_TYPE unsigned int
>> x86_64-linux-gnu/bits/types.h:__STD_TYPE __UID_T_TYPE __uid_t; /*
>> Type of user identifications. */
>> x86_64-linux-gnu/bits/types.h:__STD_TYPE __PID_T_TYPE __pid_t; /*
>> Type of process identifications. */
>> x86_64-linux-gnu/bits/typesizes.h:#define __UID_T_TYPE __U32_TYPE
>> x86_64-linux-gnu/bits/typesizes.h:#define __PID_T_TYPE __S32_TYPE
>>
>> so maybe do that typedef struct pg_signal_info with 2x uint32_t and that's
>> good enough? (it covers the ranges necessary and makes it platform compatible)
>>
>> II. While we are tthis so with above uid_t of up to being u32, possibly
>> `volatile int ProcDieSenderUid/Pid` should be also bigger like uint32_t?
>> My fixup-patch does not incude it, because I don't know really.
>>
>> III. As Chao said I've used:
>>
>> +#define PG_SIG_DFL (pqsigfunc) (pg_funcptr_t) SIG_DFL
>> +#define PG_SIG_IGN (pqsigfunc) (pg_funcptr_t) SIG_IGN
>>
>> IV. Then just got lots of warnings for use_wrapper/wrapp_handler and so on
>>
>> [09:27:33.602] ../src/port/pqsignal.c(206): error C2065:
>> 'use_wrapper': undeclared identifier
>> [09:27:33.602] ../src/port/pqsignal.c(206): warning C4113: 'pqsigfunc'
>> differs in parameter lists from 'void (__cdecl *)(int)'
>>
>> that's for:
>> 204 #else
>> 205 /* Forward to Windows native signal system. */
>> 206 if (signal(signo, use_wrapper ? wrapper_handler :
>> func) == SIG_ERR)
>> 207 Assert(false); /* probably
>> indicates coding error */
>>
>> In the end, I've ended up using wrapper_handler for the windows path there
>> as signal() requires function with single param.
>>
>> IV. Got some further issues, and VC complained that siginfo_t is used for
>> USE_SIGACTION - isn't it impossible on win32? Anyway, that makes some sense,
>> USE_SIGINFO is not defined and we use 'siginfo_t', so something like fixes it:
>>
>> @@ -90,7 +90,7 @@ static volatile pqsigfunc pqsignal_handlers[PG_NSIG];
>> *
>> * This wrapper also handles restoring the value of errno.
>> */
>> -#ifdef USE_SIGACTION
>> +#if defined(USE_SIGACTION) && defined(USE_SIGINFO)
>> static void
>> wrapper_handler(int postgres_signal_arg, siginfo_t *info, void *context)
>> #else
>>
>> V. Later I've stumbled on series of other problems related to
>> src/backend/port/win32/signal.c (it also uses SIG_DFL but not PG_SIG_DFL and
>> stil somewhat references pgsigfunc, so those changes seemed to impact it).
>>
>> Also there was:
>> [10:37:02.824] ../src/backend/port/win32/signal.c(154): error C2198:
>> 'sig': too few arguments for call
>>
>> so I've fixed with adding nodata struct there and:
>> @@ -151,7 +154,7 @@ pgwin32_dispatch_queued_signals(void)
>> block_mask |= sigmask(i);
>>
>> sigprocmask(SIG_BLOCK,
>> &block_mask, &save_mask);
>> - sig(i);
>> + sig(i, &nodata);
>> sigprocmask(SIG_SETMASK,
>> &save_mask, NULL);
>>
>> VI. Possibly we could rename USE_SIGACTION define because at least to me it
>> is confusing to me to reason and communicate about in terms of win32 context
>> on win32 do we do have it or not? (it can be both ways):
>> * win32 C API doesnt have it
>> * PG does have sigaction win32 wrapper with override macro
>> #define sigaction.. pqsigaction..
>> * however src/include/libpq/pqsignal.h says "sa_sigaction not yet implemented"
>> for it (so with USE_SIGACTION are we talking about sa_sigaction field memeber
>> that it's not used or about sigaction function?)
>>
>> VII. FWIW, I've also removed superflous "else"
>> #ifdef USE_SIGINFO
>> if (!(is_ign || is_dfl))
>> {
>> act.sa_sigaction = wrapper_handler;
>> act.sa_flags |= SA_SIGINFO;
>> }
>> - else
>> #else
>>
>>
>
>
> I'm not 100% sure that else shouldn't be there. Maybe have another look?
>
> Attached is a consolidation that includes your other fixes, plus:
>
> . uid comment -- "only meaningful when pid is not 0"
> . const pg_signal_info *pg_siginfo in SIGNAL_ARGS
> . 2 spaces after period in syncrep.c errdetail
>
>
> cheers
>
>
> andrew
>
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
> <v2-0001-Rework-signal-handler-infrastructure-to-pass-send.patch>

V2 LGTM.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2026-04-15 06:30:16 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Previous Message Amit Langote 2026-04-15 06:03:50 Re: Eliminating SPI / SQL from some RI triggers - take 3