Re: GNU/Hurd portability patches

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Michael Banck <mbanck(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: GNU/Hurd portability patches
Date: 2025-11-10 20:03:32
Message-ID: CA+hUKGLvrBt9bkjiHb8VOTGOsKfL6W0ik1+h0M1E5TSy1=QmJg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 11, 2025 at 8:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> With this modification:
> @@ -137,7 +140,7 @@ pqsignal(int signo, pqsigfunc func)
>
> #if !(defined(WIN32) && defined(FRONTEND))
> act.sa_handler = func;
> - sigemptyset(&act.sa_mask);
> + sigfillset(&act.sa_mask);
> act.sa_flags = SA_RESTART;
>
> I got 100 iterations passed (12 of them hanged) without that Assert
> triggered.

Interesting. Perhaps a minimal program that installs a handler doing
assert(signo < 32) for both SIGUSR1 and SIGUSR2 might fail too, if
another program loops calling kill(the_other_one, rand() % 2 == 0 ?
SIGUSR1 : SIGUSR2)? That could support a bug report.
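
A rough sketch of such a reproducer, under a few assumptions: it uses fork() so that one file plays both roles rather than two separate programs, and the names check_signo and run_signal_storm are made up for illustration. The handler is installed the same way pqsignal() does it before the quoted change (empty sa_mask, SA_RESTART). Calling assert() from a handler is not async-signal-safe, which seems tolerable for a throwaway test.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handler: abort the process if the kernel hands us a bogus signal number. */
static void
check_signo(int signo)
{
    assert(signo < 32);
}

/*
 * Install check_signo for SIGUSR1 and SIGUSR2, then fork a child that
 * sends a random mix of both signals to the parent.  Returns 0 once the
 * child has finished, or -1 if setup fails.  A bad signal number trips
 * the assertion instead of returning.
 */
static int
run_signal_storm(int iterations)
{
    struct sigaction act;
    pid_t       child;
    int         status;

    memset(&act, 0, sizeof(act));
    act.sa_handler = check_signo;
    sigemptyset(&act.sa_mask);      /* handlers may interrupt each other */
    act.sa_flags = SA_RESTART;
    if (sigaction(SIGUSR1, &act, NULL) != 0 ||
        sigaction(SIGUSR2, &act, NULL) != 0)
        return -1;

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
    {
        /* Child: bombard the parent, then exit without running atexit hooks. */
        pid_t       parent = getppid();

        for (int i = 0; i < iterations; i++)
            kill(parent, rand() % 2 == 0 ? SIGUSR1 : SIGUSR2);
        _exit(0);
    }

    /* Parent: wait for the sender, retrying if a signal interrupts us. */
    while (waitpid(child, &status, 0) < 0)
    {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

Calling run_signal_storm() with a large iteration count from main() would be the whole test; on a correctly behaving system it should simply return 0.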

> [lots of weird errors in a wide range of code]

I can't make much sense of these failures, but are you saying that
these only happen without that sigfillset(&act.sa_mask) change, that
is, when the signal implementation is misbehaving? If so, I wonder if
the same bug in their signal handling might just be corrupting the
user stack sometimes even when the signal number assertion doesn't
trip.
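
For reference, a small sketch of what the sigfillset() variant changes: with a full sa_mask, every other blockable signal is held back while the handler runs, so handlers can no longer nest. The function names here are hypothetical; only sigaction(), sigfillset(), sigprocmask(), sigismember() and raise() are real POSIX calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handler_ran = 0;
static volatile sig_atomic_t others_blocked = 0;

/* Handler: record whether SIGUSR2 is blocked while we are running. */
static void
probe_handler(int signo)
{
    sigset_t    cur;

    (void) signo;
    /* With a NULL new set, sigprocmask() only reports the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) == 0 &&
        sigismember(&cur, SIGUSR2) == 1)
        others_blocked = 1;
    handler_ran = 1;
}

/* Install func for signo with all signals blocked during the handler. */
static int
install_blocking_all(int signo, void (*func) (int))
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_handler = func;
    sigfillset(&act.sa_mask);       /* the modification under discussion */
    act.sa_flags = SA_RESTART;
    return sigaction(signo, &act, NULL);
}

/* Returns 1 if the handler ran and saw SIGUSR2 blocked, else 0. */
static int
handler_masks_other_signals(void)
{
    int         rc;

    handler_ran = 0;
    others_blocked = 0;
    rc = install_blocking_all(SIGUSR1, probe_handler);
    assert(rc == 0);
    (void) rc;
    raise(SIGUSR1);                 /* handler runs before raise() returns */
    return handler_ran && others_blocked;
}
```

If the failures disappear only with this mask in place, that would be consistent with the problem lying in nested signal delivery, though it does not prove it.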

> On the assumption that this isn't a general bug, but just a timing issue
> (planning 'SELECT 1' isn't complicated), I see two possibilities:
>
> 1. Ignore the plan times, and replace SELECT 1 with SELECT
> pg_sleep(1e-6), similar to e849bd551. I guess this would reduce test
> coverage so likely not be great?
>
> 2. Make the query a bit more complicated so that the plan time is likely
> to be non-negligible. I actually had to go quite a way to make it pretty
> failsafe, the attached made it fail less than 5 times out of 50000
> iterations, not sure whether that is acceptable or still considered
> flaky?

Wait, we have tests that fail if the clock doesn't advance? Isn't
that just bogus?
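
To get a feel for how coarse a platform's clock is, something like the following could be run there. It is only a sketch: smallest_clock_step_ns is a made-up name, and it measures the smallest positive difference between back-to-back CLOCK_REALTIME readings, which is an observed step rather than the advertised resolution.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Read CLOCK_REALTIME repeatedly and return the smallest positive step
 * seen between consecutive readings, in nanoseconds.  Returns 0 if the
 * clock never advanced during the run, or -1 if clock_gettime() fails.
 */
static int64_t
smallest_clock_step_ns(int samples)
{
    struct timespec prev;
    struct timespec cur;
    int64_t     best = 0;

    assert(samples > 0);
    if (clock_gettime(CLOCK_REALTIME, &prev) != 0)
        return -1;

    for (int i = 0; i < samples; i++)
    {
        int64_t     delta;

        if (clock_gettime(CLOCK_REALTIME, &cur) != 0)
            return -1;
        delta = (int64_t) (cur.tv_sec - prev.tv_sec) * 1000000000 +
            (int64_t) (cur.tv_nsec - prev.tv_nsec);
        /* Ignore equal or backwards readings; keep the smallest advance. */
        if (delta > 0 && (best == 0 || delta < best))
            best = delta;
        prev = cur;
    }
    return best;
}
```

A result far above a microsecond would suggest that a test comparing two nearby timestamps can see identical values on that system.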

> What concerns me is that there is also subscription.sql and maybe could
> be other test(s) that expect at least 1000ns (far from infinite) timer
> resolution. Probably it would make sense to define which timer resolution
> we consider acceptable for tests and then to check if Hurd can provide it.

Ah, I see, so that one is checking if the last reset time advanced to
check that something happened. That also has the theoretical problem
that CLOCK_REALTIME can go backwards sometimes, due to ntpd
adjustments or whatever. In the absence of a "reset_counter" column,
perhaps we could consider a kludge like x->reset_time =
Max(x->reset_time + 1ns, now), just to make sure the value always goes
up on reset, without having any noticeable effect on normal systems...
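
Spelled out as a sketch, with a plain int64_t standing in for the stored timestamp (reset_ts and next_reset_time are hypothetical names, and the "+ 1" means one tick of whatever unit the column can actually represent, which may be coarser than 1ns):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the stored reset timestamp, in clock ticks. */
typedef int64_t reset_ts;

/*
 * Return the value to store on reset: the current time if it is later
 * than the previous reset time, otherwise the previous value plus one
 * tick, so the stored value strictly increases on every reset.
 */
static reset_ts
next_reset_time(reset_ts prev, reset_ts now)
{
    /* Avoid signed overflow in the (absurd) case prev is already at max. */
    reset_ts    floor = (prev < INT64_MAX) ? prev + 1 : prev;

    return (now > floor) ? now : floor;
}

/* Usage example covering the three interesting cases. */
static void
next_reset_time_examples(void)
{
    assert(next_reset_time(100, 200) == 200);   /* clock advanced normally */
    assert(next_reset_time(100, 100) == 101);   /* clock did not advance */
    assert(next_reset_time(100, 50) == 101);    /* clock went backwards */
}
```

On a system with a fine-grained, forward-moving clock the first branch is always taken, so the stored value is just the current time as before.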
