Re: Race condition in crash-recovery tests

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, mikael(dot)kjellstrom(at)gmail(dot)com
Subject: Re: Race condition in crash-recovery tests
Date: 2019-01-27 02:45:11
Message-ID: 21941.1548557111@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-01-26 20:53:48 -0500, Tom Lane wrote:
>> I have no idea why we're seeing this in only one buildfarm member
>> and only for the past week or so, as it doesn't appear that any
>> related code has changed for months. (Perhaps something changed
>> about curculio's host?)

> I have no idea why it's just curculio, but I think I know why it only
> started recently: curculio doesn't appear to have had TAP tests enabled
> before
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2019-01-17%2021%3A30%3A02

Oh, right ... I knew that, actually, but forgot ...

So then we only have to assume that the race condition is encouraged
by something about the kernel scheduler's rules on that machine, which
isn't so much of a leap, especially since it's our only OpenBSD
critter. The test case only exists in v11 and HEAD branches, and
curculio's only run this test a few times in v11, so the lack of
back-branch failures isn't so odd.

>> just change the test script to accept either message as a successful
>> result. I think that 4247db625 made such races more likely, but I
>> don't believe it was impossible before.

> Sounds right to me - do you want to do the honors or shall I?

I'll do it in a bit.

regards, tom lane
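
[Editor's sketch, not part of the original message.] The fix agreed on above is to have the test treat either of two possible client-side messages as a successful result, since which one appears depends on how the race plays out. The actual test script and the exact message texts are not quoted in this email, so the strings and the function name below are assumptions chosen only to illustrate the idea of matching on alternatives:

```python
import re

# Hypothetical message texts: the email says "either message" but does not
# quote them, so these two strings are placeholders for illustration only.
ACCEPTED_MESSAGES = (
    "terminating connection because of crash of another server process",
    "server closed the connection unexpectedly",
)

# Build one pattern that matches any of the accepted messages literally.
_ACCEPTED_RE = re.compile("|".join(re.escape(m) for m in ACCEPTED_MESSAGES))


def session_died_as_expected(client_output: str) -> bool:
    """Return True if the client output contains any accepted message.

    Which message a client sees depends on timing, so the check passes
    on whichever one shows up rather than insisting on a single text.
    """
    return _ACCEPTED_RE.search(client_output) is not None
```

A test written this way passes regardless of which side of the race wins, while still failing if the session ends with some unrelated error.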
