Re: GNU/Hurd portability patches

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Michael Banck <mbanck(at)gmx(dot)net>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: GNU/Hurd portability patches
Date: 2025-10-12 13:00:00
Message-ID: b87a0112-6235-4d87-886b-d1c79c0e0543@gmail.com
Lists: pgsql-hackers

Hi Michael,

12.10.2025 11:31, Michael Banck wrote:
>
> Any way to easily reproduce this? It happened only once on fruitcrow so
> far.

I'd say it happens pretty often in the runs where `make check` doesn't hang
outright (so it takes me an hour or two to reproduce).

Though now that you've mentioned MAX_CONNECTIONS => '3', I also tried:
EXTRA_REGRESS_OPTS="--max-connections=3" make -s check
and it passed 6 iterations for me. Iteration 7 failed with:
not ok 213   + partition_aggregate                      1027 ms

--- /home/demo/postgresql/src/test/regress/expected/partition_aggregate.out 2025-10-11 10:04:36.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/partition_aggregate.out 2025-10-12 13:02:05.000000000 +0100
@@ -1476,14 +1476,14 @@
 (15 rows)

 SELECT x, sum(y), avg(y), sum(x+y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
- x  | sum  |        avg         |  sum  | count
-----+------+--------------------+-------+-------
-  0 | 5000 | 5.0000000000000000 |  5000 |  1000
-  1 | 6000 | 6.0000000000000000 |  7000 |  1000
- 10 | 5000 | 5.0000000000000000 | 15000 |  1000
- 11 | 6000 | 6.0000000000000000 | 17000 |  1000
- 20 | 5000 | 5.0000000000000000 | 25000 |  1000
- 21 | 6000 | 6.0000000000000000 | 27000 |  1000
+ x  | sum  |            avg             |  sum  | count
+----+------+----------------------------+-------+-------
+  0 | 5000 |         5.0000000000000000 |  5000 |  1000
+  1 | 6000 |         6.0000000000000000 |  7000 |  1000
+ 10 | 5000 | 0.000000052757140846001326 | 15000 |  1000
+ 11 | 6000 |         6.0000000000000000 | 17000 |  1000
+ 20 | 5000 |         5.0000000000000000 | 25000 |  1000
+ 21 | 6000 |         6.0000000000000000 | 27000 |  1000
 (6 rows)

Then another 6 iterations passed, but the seventh one hung. After that, 10
iterations passed.
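
(In case anyone wants to repeat this: a plain shell loop like the following is
all it takes, just re-running the command shown above until it fails --

i=0
while EXTRA_REGRESS_OPTS="--max-connections=3" make -s check; do
    i=$((i + 1)); echo "iteration $i passed"
done

nothing buildfarm-specific.)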

With EXTRA_REGRESS_OPTS="--max-connections=10" make -s check, I got:
2025-10-12 13:52:58.559 BST client backend[15475] pg_regress/constraints STATEMENT:  ALTER TABLE notnull_tbl2 ALTER a
DROP NOT NULL;
!!!wrapper_handler[15479]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[15476]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[15476]| postgres_signal_arg: 28481392, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 15476
postgres(ExceptionalCondition+0x5a) [0x1006af78a]
postgres(+0x70f59a) [0x10070f59a]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x102b89fee]
/lib/x86_64-gnu/libc.so.0.3(+0x39fdd) [0x102b89fdd]

on iteration 5.

So it seems the signal issue is reproduced more readily with higher
concurrency.

28481392 (0x1b29770) is pretty close to 28476608 (0x1b284c0), which I
showed before, so the numbers are apparently not random.
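
(The hex values can be cross-checked with e.g.
printf '0x%x\n' 28481392 28476608
which prints 0x1b29770 and 0x1b284c0.)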

> I had to reboot fruitcrow last night because it had crashed, but that
> was the first time in literally weeks. I tend to reboot it once a week,
> but otherwise it ran pretty stable.

Today I also tried to test my machine with stress-ng:
stress-ng -v --class os --sequential 20 --timeout 120s

It hung/crashed on the access, brk, close, and enosys tests and never reached
the end... Some tests might pass after a restart, some fail consistently...
For example:
Fatal glibc error: ../sysdeps/mach/hurd/mig-reply.c:73 (__mig_dealloc_reply_port): assertion failed: port == arg
stress-ng: info:  [9395] stressor terminated with unexpected signal 6 'SIGABRT'
backtrace:
  stress-ng-enosys [run](+0xace81) [0x1000ace81]
  stress-ng-enosys [run](+0x927b6c) [0x100927b6c]
  /lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x1029c8fee]
  /lib/x86_64-gnu/libc.so.0.3(+0x21aec) [0x1029b0aec]
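
(If it helps, the failing stressor can presumably be re-run in isolation with
something like
stress-ng -v --enosys 1 --timeout 60s
assuming the usual --<stressor> N convention; I haven't narrowed it down
beyond the --sequential run above.)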

> It took me a while to get there though before I applied for it to be a
> buildfarm animal, here is what I did:
>
> 1) (buildfarm client specific): removed "HEAD => ['debug_parallel_query =
> regress']," and set "MAX_CONNECTIONS => '3'," in build-farm.conf, to
> reduce concurrency.

Thank you for the info! I didn't specify debug_parallel_query for
`make check`, but the max-connections limit really makes the difference.

> 2. Gave it 4G of memory to the VM via KVM. Also set -M q35, but I guess
> you are already doing that as it does not boot properly otherwise IME.

Mine has 4GB too.

> 3. Removed swap (this is already the case for the x86-64 2025 Debian
> image, but it was not the case for the earlier 2023 i386 image when I
> started this project). Paging to disk has been problematic and prone to
> issues (critical parts getting paged out accidentally), but this has been
> fixed over the summer so in principle running a current gnumach/hurd
> package combination from unstable should be fine again.

Yes, I have no swap enabled.

> 4. Removed tmpfs translators (so that the default-pager is not used
> anywhere, in conjunction with not setting swap, see above), by setting
> RAMLOCK=no and RAMTMP=no in /etc/default/tmpfs, as well as commenting
> out 'mount_run mount_noupdate'/'mount_tmp mount_noupdate' in
> /etc/init.d/mountall.sh and 'mount_run "$MNTMODE"' in
> /etc/init.d/mountkernfs.sh (maybe there is a more minimal change, but
> that is what I have right now).

I have RAMLOCK=no and RAMTMP=no in my /etc/default/tmpfs and can't see any
tmpfs mounts.
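
(Roughly, the check I have in mind looks like this, assuming the standard Hurd
tools and the usual node list:

grep -E '^RAM(LOCK|TMP)=' /etc/default/tmpfs
showtrans /tmp /run /run/lock

with the expectation that no /hurd/tmpfs translator is reported for any of
those nodes.)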

Thank you for your help!

Best regards,
Alexander
