Re: Strange failure on mamba

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Strange failure on mamba
Date: 2022-11-17 22:08:09
Message-ID: 2051761.1668722889@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> I wonder why the walreceiver didn't start in
> 008_min_recovery_point_node_3.log here:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-11-16%2023%3A13%3A38

mamba has been showing intermittent failures in various replication
tests since day one. My guess is that it's slow enough to be
particularly subject to the signal-handler race conditions that we
know exist in walreceivers and elsewhere. (Now, it wasn't any faster
in its previous incarnation as a macOS critter. But maybe modern
NetBSD has different scheduler behavior than ancient macOS and that
contributes somehow. Or maybe there's some other NetBSD weirdness
in here.)

I've tried to reproduce manually, without much success :-(

Like many of its other failures, there's a suggestive postmaster
log entry at the very end:

2022-11-16 19:45:53.851 EST [2036:4] LOG: received immediate shutdown request
2022-11-16 19:45:58.873 EST [2036:5] LOG: issuing SIGKILL to recalcitrant children
2022-11-16 19:45:58.881 EST [2036:6] LOG: database system is shut down

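So some postmaster child is stuck at a point where it's not responding
to SIGQUIT: the postmaster waited about five seconds after the shutdown
request before resorting to SIGKILL. While it's reasonable to guess
that the stuck child is a walreceiver, there's no hard evidence of it
here. I've been wondering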
if it'd be worth patching the postmaster so that it's a bit more verbose
about which children it had to SIGKILL. I've also wondered about
changing the SIGKILL to SIGABRT in hopes of reaping a core file that
could be investigated.
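
The two ideas above (naming the children that needed the final signal,
and sending SIGABRT instead of SIGKILL) can be modeled outside
PostgreSQL. What follows is a minimal Python sketch, not the
postmaster's actual code: the function stop_children, its parameters,
and the child labels are hypothetical names chosen for illustration.

```python
import signal
import subprocess
import time


def stop_children(children, grace_seconds=5.0, final_signal=signal.SIGKILL):
    """Ask every child to quit, then escalate and name the stragglers.

    `children` maps a descriptive label (hypothetical, e.g. "walreceiver")
    to a subprocess.Popen object. Returns the labels of the children that
    had to be sent `final_signal` because they ignored SIGQUIT.
    """
    # Step 1: the polite request, analogous to an immediate shutdown.
    for proc in children.values():
        if proc.poll() is None:
            proc.send_signal(signal.SIGQUIT)

    # Step 2: give the children a grace period to exit on their own.
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if all(proc.poll() is not None for proc in children.values()):
            break
        time.sleep(0.05)

    # Step 3: escalate, but say exactly which child was still alive.
    sig_name = signal.Signals(final_signal).name
    stuck = []
    for label, proc in children.items():
        if proc.poll() is None:
            print(f"LOG: issuing {sig_name} to recalcitrant child "
                  f"{label} (PID {proc.pid})")
            proc.send_signal(final_signal)
            stuck.append(label)

    # Reap everything so that exit statuses are available to the caller.
    for proc in children.values():
        proc.wait()
    return stuck
```

Passing final_signal=signal.SIGABRT makes a stuck child die by abort
rather than by an uncatchable kill, which leaves a core file behind if
the system's core-dump limits allow one.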

regards, tom lane
