Re: interrupted tap tests leave postgres instances around

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: interrupted tap tests leave postgres instances around
Date: 2022-10-01 20:38:21
Message-ID: 20221001203821.dpme5lsystrnhe5t@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-09-30 11:17:00 +0200, Alvaro Herrera wrote:
> But on testing, some nodes linger after being sent a shutdown signal.
> I'm not clear why this is -- I think it's due to the fact that we send
> the signal just as the node is starting up, which means the signal
> doesn't reach the process.

I suspect it's when a test gets interrupt while pg_ctl is starting the
backend. The start() routine only does _update_pid() after pg_ctl finished,
and terminate()->stop() returns before doing anything if pid isn't defined.

Perhaps the END{} routine should call $node->_update_pid(-1); if $exit_code !=
0 and _pid is undefined?

That does seem to reduce the incidence of "leftover" postgres
instances. 001_start_stop.pl leaves some behind, but that makes sense, because
it's bypassing the whole node management. But I still occasionally see some
remaining processes if I crank up test concurrency.

Ah! At least part of the problem is that sub stop() does BAIL_OUT, and of
course it can fail as part of the shutdown.

But there's still some that survive, where your perl.trace doesn't contain the
node getting shut down...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2022-10-01 20:58:28 Re: Question: test "aggregates" failed in 32-bit machine
Previous Message Noah Misch 2022-10-01 19:59:55 Re: pg_regress: lookup shellprog in $PATH