Re: GNU/Hurd portability patches

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Banck <mbanck(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: GNU/Hurd portability patches
Date: 2025-11-10 21:00:01
Message-ID: 2f4f4487-d4e1-461c-b34b-a22ed686eea2@gmail.com
Lists: pgsql-hackers

10.11.2025 22:03, Thomas Munro wrote:
> On Tue, Nov 11, 2025 at 8:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>> With this modification:
>> @@ -137,7 +140,7 @@ pqsignal(int signo, pqsigfunc func)
>>
>>  #if !(defined(WIN32) && defined(FRONTEND))
>>      act.sa_handler = func;
>> -    sigemptyset(&act.sa_mask);
>> +    sigfillset(&act.sa_mask);
>>      act.sa_flags = SA_RESTART;
>>
>> 100 iterations passed (12 of them hung) without that Assert being
>> triggered.
> Interesting. Perhaps a minimal program that installs a handler doing
> assert(signo < 32) for both SIGUSR1 and SIGUSR2 might fail too, if
> another program loops calling kill(the_other_one, rand() % 2 == 0 ?
> SIGUSR1 : SIGUSR2)? That could support a bug report.

Yeah, thank you for the idea! I will try it in the coming days.
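
A minimal sketch of the kind of reproducer suggested in the quoted text, under some assumptions: run_signal_stress() and check_signo() are hypothetical names, the sender is a forked child rather than a separate program, and the handler records an unexpected signal number instead of calling assert(), so the result can be reported. The block_all flag switches between the sigemptyset() and sigfillset() masks from the pqsignal() diff above. It has not been run on Hurd.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler if it is ever called with an unexpected signal number. */
static volatile sig_atomic_t saw_bad_signo = 0;

static void
check_signo(int signo)
{
    /* Record instead of assert(), so the caller can report the outcome. */
    if (signo != SIGUSR1 && signo != SIGUSR2)
        saw_bad_signo = 1;
}

/*
 * Hypothetical helper: install one handler for SIGUSR1 and SIGUSR2, then let
 * a forked child send nsignals randomly chosen signals to this process.
 * Returns 0 if every delivery carried SIGUSR1 or SIGUSR2, 1 if the handler
 * saw any other signal number, and -1 on setup failure.
 */
int
run_signal_stress(int nsignals, int block_all)
{
    struct sigaction act;
    pid_t       parent = getpid();
    pid_t       child;

    saw_bad_signo = 0;
    memset(&act, 0, sizeof(act));
    act.sa_handler = check_signo;
    if (block_all)
        sigfillset(&act.sa_mask);   /* the modified behaviour */
    else
        sigemptyset(&act.sa_mask);  /* the unmodified behaviour */
    act.sa_flags = SA_RESTART;
    if (sigaction(SIGUSR1, &act, NULL) != 0 ||
        sigaction(SIGUSR2, &act, NULL) != 0)
        return -1;

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
    {
        /* Sender: bombard the parent with a random mix of both signals. */
        for (int i = 0; i < nsignals; i++)
            kill(parent, rand() % 2 == 0 ? SIGUSR1 : SIGUSR2);
        _exit(0);
    }

    /* Wait for the sender to finish, retrying if a signal interrupts us. */
    while (waitpid(child, NULL, 0) < 0)
    {
        if (errno != EINTR)
            return -1;
    }
    return saw_bad_signo;
}
```

Calling run_signal_stress(100000, 0) from main() and checking for a nonzero result would exercise the sigemptyset() case; passing 1 as the second argument exercises the sigfillset() case.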

>> [lots of weird errors in a wide range of code]
> I can't make much sense of these failures, but are you saying that
> these only happen without that sigfillset(&act.sa_mask) change, that
> is, when the signal implementation is misbehaving? If so, I wonder if
> the same bug in their signal handling might just be corrupting the
> user stack sometimes even when the signal number assertion doesn't
> trip.

No, I think those failures are unrelated. I hit them only because I ran
`make check` many times, and some of them definitely occurred with the
unmodified code. Now that I have a script that handles OS hangs and
restores the VM's disk automatically, I can run tests for hours and look
for one failure or another, if that would be helpful.

>> On the assumption that this isn't a general bug, but just a timing issue
>> (planning 'SELECT 1' isn't complicated), I see two possibilities:
>>
>> 1. Ignore the plan times, and replace SELECT 1 with SELECT
>> pg_sleep(1e-6), similar to e849bd551. I guess this would reduce test
>> coverage, so it is likely not great?
>>
>> 2. Make the query a bit more complicated so that the plan time is likely
>> to be non-negligible. I actually had to go quite a way to make it pretty
>> failsafe; the attached made it fail less than 5 times out of 50000
>> iterations. Not sure whether that is acceptable or still considered
>> flaky?
> Wait, we have tests that fail if the clock doesn't advance? Isn't
> that just bogus?

Yeah, we do; this was discussed (and one test was hardened) upthread.
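
To see how far a given clock advances between two adjacent reads on a platform, something like the following could be used. This is only a sketch: smallest_clock_step_ns() and read_clock_ns() are hypothetical names, not PostgreSQL functions, and the result is an observed step, not a guaranteed resolution.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read the given clock as nanoseconds, or return -1 if the read fails. */
static int64_t
read_clock_ns(clockid_t clk)
{
    struct timespec ts;

    if (clock_gettime(clk, &ts) != 0)
        return -1;
    return (int64_t) ts.tv_sec * INT64_C(1000000000) + ts.tv_nsec;
}

/*
 * Hypothetical helper: read the clock samples times back to back and return
 * the smallest positive difference seen between consecutive readings.
 * Returns 0 if the clock never advanced, and -1 if a read failed.
 */
int64_t
smallest_clock_step_ns(clockid_t clk, int samples)
{
    int64_t     prev = read_clock_ns(clk);
    int64_t     best = 0;

    if (prev < 0)
        return -1;
    for (int i = 0; i < samples; i++)
    {
        int64_t     cur = read_clock_ns(clk);

        if (cur < 0)
            return -1;
        /* Only count readings where the clock actually moved forward. */
        if (cur > prev && (best == 0 || cur - prev < best))
            best = cur - prev;
        prev = cur;
    }
    return best;
}
```

A result well above 1000 for CLOCK_MONOTONIC would suggest that tests expecting two nearby readings to differ can see identical values on that platform.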

>> What concerns me is that there is also subscription.sql, and there may
>> be other test(s) that expect at least 1000ns (far from infinite) timer
>> resolution. It would probably make sense to define which timer resolution
>> we consider acceptable for tests and then to check if Hurd can provide it.
> Ah, I see, so that one is checking if the last reset time advanced to
> check that something happened. That also has the theoretical problem
> that CLOCK_REALTIME can go backwards sometimes, due to ntpd
> adjustments or whatever. In the absence of a "reset_counter" column,
> perhaps we could consider a kludge like x->reset_time =
> Max(x->reset_time + 1ns, now), just to make sure the value always goes
> up on reset, without having any noticeable effect on normal systems...

AFAICS, those test cases use pg_clock_gettime_ns() with CLOCK_MONOTONIC
(if defined, and it is indeed defined on Hurd), so it should not matter
in this particular case.
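
For reference, the clamp proposed in the quoted text, reset_time = Max(reset_time + 1ns, now), can be sketched as below. next_reset_time_ns() is a hypothetical name, and the use of int64_t nanosecond values is an assumption made for illustration, following the "1ns" in the quote.

```c
#include <stdint.h>

/*
 * Hypothetical helper: choose the next reset timestamp so that it is always
 * strictly greater than the previous one, even if the clock stood still or
 * went backwards between the two resets.
 */
int64_t
next_reset_time_ns(int64_t prev_reset_ns, int64_t now_ns)
{
    int64_t     floor_ns = prev_reset_ns + 1;   /* previous value + 1ns */

    return now_ns > floor_ns ? now_ns : floor_ns;
}
```

When the clock has advanced, the function simply returns the current time, so the clamp only takes effect when two resets would otherwise get the same or a decreasing timestamp.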

Best regards,
Alexander
