Re: Race condition in server-crash testing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Race condition in server-crash testing
Date: 2022-04-04 05:07:21
Message-ID: 20220404050721.sktsuearncwjo6hr@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-04-04 00:50:27 -0400, Tom Lane wrote:
> My pet dinosaur gaur just failed [1] in
> src/test/recovery/t/022_crash_temp_files.pl, which does this:
>
> -----
> my $ret = PostgreSQL::Test::Utils::system_log('pg_ctl', 'kill', 'KILL', $pid);
> is($ret, 0, 'killed process with KILL');
>
> # Close psql session
> $killme->finish;
> $killme2->finish;
>
> # Wait till server restarts
> $node->poll_query_until('postgres', undef, '');
> -----
>
> It's hard to be totally sure, but I think what happened is that
> gaur hit the in-hindsight-obvious race condition in this code:
> we managed to execute a successful iteration of poll_query_until
> before the postmaster had noticed its dead child and commenced
> the restart. The test lines after these are not prepared to see
> failure-to-connect.
>
> It's not obvious to me how to remove this race condition.
> Thoughts?

Maybe we could use pump_until() on the psql session that isn't being killed,
with a regex that never matches? As far as I can see, that would only return
once postmaster has killed that session's backend.
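
A rough, untested sketch of that idea, written as a fragment of the test
quoted above rather than a standalone script. The names $killme_stderr2 and
$psql_timeout are assumptions about the surrounding test and do not appear in
the quoted excerpt. It also assumes pump_until() takes (handle, timer,
stream ref, regex) and returns false once the process is no longer pumpable
or the timer expires. Whether psql actually exits promptly when its backend
is terminated has not been verified here.

```perl
# Untested sketch; fragment of a TAP test, not a standalone script.
# Assumed from the surrounding test (not shown in this mail):
#   $killme2        - IPC::Run handle of the psql session that is not killed
#   $killme_stderr2 - scalar collecting that session's stderr (assumed name)
#   $psql_timeout   - IPC::Run timer bounding the wait (assumed name)

my $ret = PostgreSQL::Test::Utils::system_log('pg_ctl', 'kill', 'KILL', $pid);
is($ret, 0, 'killed process with KILL');

# (?!) is an empty negative lookahead, so this regex can never match.
# pump_until() therefore cannot report a match; it should come back only
# when the surviving psql stops being pumpable or the timer expires.
PostgreSQL::Test::Utils::pump_until($killme2, $psql_timeout,
	\$killme_stderr2, qr/(?!)/);

# Only now close the psql sessions and wait for the server to restart.
$killme->finish;
$killme2->finish;
$node->poll_query_until('postgres', undef, '');
```

Note that with a never-matching regex the return value cannot distinguish
"psql went away" from "timer expired", so a real version would probably want
to check which of the two happened.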

Greetings,

Andres Freund
