Re: Windows vs recovery tests

From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Windows vs recovery tests
Date: 2022-01-13 02:25:26
Message-ID: 20220113022526.b63vclpbqlrm7aj2@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-01-12 15:58:26 -0800, Andres Freund wrote:
> On 2022-01-12 14:34:00 -0500, Andrew Dunstan wrote:
> > For some considerable time the recovery tests have been at best flaky on
> > Windows, and at worst disastrous (i.e. they can hang rather than just
> > fail). It's a problem I worked around on my buildfarm animals by
> > disabling the tests, hoping to find time to get back to analysing the
> > problem. But now we are seeing failures on the cfbot too (e.g.
> > https://cirrus-ci.com/task/5860692694663168 and
> > https://cirrus-ci.com/task/5316745152954368 ) so I think we need to
> > spend some effort on finding out what's going on here.
>
> I'm somewhat certain that this is caused by assertions or aborts hanging with
> a GUI popup, e.g. due to a check in the CRT.

Oh, that was only about https://cirrus-ci.com/task/5860692694663168 not
https://cirrus-ci.com/task/5316745152954368

Looking through the recent recovery failures that occurred only on Windows, I see
three different "classes" of recovery test failures:

1) Tests sometimes never finish, resulting in CI timing out
2) Tests sometimes finish, but t/001_stream_rep.pl fails
3) Tests fail with patch-specific issues (e.g. 36/2096, 36/3461, 36/3459)

From the cases I looked at, the failures in 1) always have a successful
t/001_stream_rep.pl. This makes me think that we're likely looking at two
separate types of problems?

One might think that
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3464
conflicts with the above grouping. But all but the currently last failure were
due to a compiler warning in an earlier version of the patch.

There's one interesting patch that also times out just on Windows, albeit in
another test group:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/2096

This IMO looks likely to be a bug in psql introduced by that patch.

Greetings,

Andres Freund
