Re: GNU/Hurd portability patches

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Banck <mbanck(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: GNU/Hurd portability patches
Date: 2025-10-10 02:59:12
Message-ID: CA+hUKGJE_moUF74c97GkoP6RaknRMoeFOednXe2FyXnS_bOTFQ@mail.gmail.com
Lists: pgsql-hackers

[Using this as a general GNU/Hurd problem thread]

An interesting fruitcrow failure:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-30%2007%3A28%3A50

TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 91, PID: 25731
postgres(ExceptionalCondition+0x5a) [0x1006b1d0a]
postgres(+0x711cf2) [0x100711cf2]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x102bdffee]
/lib/x86_64-gnu/libc.so.0.3(+0x39fdd) [0x102bdffdd]
2025-09-30 08:38:59.451 BST [24668:6] LOG: client backend (PID 25731) was terminated by signal 6: Aborted

Our definition of PG_NSIG is:

#ifdef PG_SIGNAL_COUNT /* Windows */
#define PG_NSIG (PG_SIGNAL_COUNT)
#elif defined(NSIG)
#define PG_NSIG (NSIG)
#else
#define PG_NSIG (64) /* XXX: wild guess */
#endif

Is NSIG defined there? Where on the internet can we see the SIGXXX signal
numbers and the glibc source that is actually used on these systems?
This has to be a handler installed by pqsignal() being invoked for some
signal, so I guess it's probably not the synchronous SIGABRT from
abort() expected in ExceptionalCondition() (assuming that abort() is
implemented as raise(SIGABRT) in the traditional way, which might not
be true). So then I guess it must be an asynchronous signal, but which
one?

Searching for that error in our archives brought up another platform
that saw the same assertion failure [1]. There it smelled a bit like an
uninitialised value somehow ending up in there, maybe related to
valgrind, but I have no idea whether or how that relates to this
failure.

The main thing I learned while failing to find the values of those
symbols for myself was that the Hurd implements asynchronous signals in
an unorthodox way, akin to Windows' SIGINT mechanism:

"The UNIX signalling mechanism is implemented for the GNU Hurd by
means of a separate signal thread that is part of every user-space
process. This makes handling of signals a separate thread of control.
GNU Mach itself has no idea what a signal is and kill is not a system
call (as it typically is in a UNIX system): it's implemented in
glibc." - glibc docs[2]

I haven't investigated the details or implications, but huh, I wonder
what that could break in our code... We're working on booting
asynchronous signals out of the code for various reasons, so this might
already be, or at least might soon be, a non-issue, but still.

I've so far resisted the urge to spin up a Debian GNU/Hurd box to
figure any of that out for myself, but maybe someone has a clue...

[1] https://www.postgresql.org/message-id/flat/Z8z6EaT89FL7UUBU%40nathan#ed792121e7d146c44c2941f50a1d3142
[2] https://www.gnu.org/software/hurd/glibc/signal.html
