Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date: 2022-01-09 01:17:16
Message-ID: 1609152.1641691036@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> So for some reason, on these machines detection of walsender-initiated
> connection close is unreliable ... or maybe, the walsender didn't close
> the connection, but is somehow still hanging around? Don't have much idea
> where to dig beyond that, but maybe someone else will. I wonder in
> particular if this could be related to our recent discussions about
> whether to use shutdown(2) on Windows --- could we need to do the
> equivalent of 6051857fc/ed52c3707 on walsender connections?

... wait a minute. After some more study of the buildfarm logs,
it was brought home to me that these failures started happening
just after 6051857fc went in:

https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=90&branch=&member=&stage=module-commit_tsCheck&filter=Submit

The oldest matching failure is jacana's on 2021-12-03.
(The above sweep finds an unrelated-looking failure on 2021-11-11,
but no others before 6051857fc went in on 2021-12-02. Also, it
looks likely that ed52c3707 on 2021-12-07 made the failure more
probable, because jacana's is the only matching failure before 12-07.)

So I'm now thinking it's highly likely that those commits are
causing it somehow, but how?

regards, tom lane
