Re: Strange failure on mamba

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Strange failure on mamba
Date: 2022-11-17 22:47:50
Message-ID: 2055732.1668725270@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Fri, Nov 18, 2022 at 11:08 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> mamba has been showing intermittent failures in various replication
>> tests since day one.

> I wonder if it's a runtime variant of the other problem. We do
> load_file("libpqwalreceiver", false) before unblocking signals but
> maybe don't resolve the symbols until calling them, or something like
> that...

Yeah, that or some other NetBSD bug could be the explanation, too.
Without a stack trace it's hard to have any confidence about it,
but I've been unable to reproduce the problem outside the buildfarm.
(Which is a familiar refrain. I wonder what it is about the buildfarm
environment that makes it act different from the exact same code running
on the exact same machine.)

So I'd like to have some way to make the postmaster send SIGABRT instead
of SIGKILL in the buildfarm environment. The lowest-tech way would be
to drive that off some #define or other. We could scale it up to a GUC
perhaps. Adjacent to that, I also wonder whether SIGABRT wouldn't be
more useful than SIGSTOP for the existing SendStop half-a-feature ---
the idea that people should collect cores manually seems mighty
last-century.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-11-17 22:57:12 Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
Previous Message Thomas Munro 2022-11-17 22:47:48 Re: Strange failure on mamba