Re: buildfarm windows checks / tap tests on windows

From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: buildfarm windows checks / tap tests on windows
Date: 2021-03-03 05:20:11
Message-ID: 20210303052011.ybplxw6q4tafwogk@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-03-02 12:57:57 -0800, Andres Freund wrote:
> t/003_recovery_targets.pl ............ 7/9
> # Failed test 'multiple conflicting settings'
> # at t/003_recovery_targets.pl line 151.
>
> # Failed test 'recovery end before target reached is a fatal error'
> # at t/003_recovery_targets.pl line 177.
> t/003_recovery_targets.pl ............ 9/9 # Looks like you failed 2 tests of 9.
> t/003_recovery_targets.pl ............ Dubious, test returned 2 (wstat 512, 0x200)
> Failed 2/9 subtests

This appears to be caused by stderr in Windows Docker containers
somehow not working quite right. cirrus-ci uses Docker on Windows.

If you look e.g. at https://cirrus-ci.com/task/6111560255930368, and
specifically at the relevant log file:
https://api.cirrus-ci.com/v1/artifact/task/6111560255930368/log/src/test/recovery/tmp_check/log/003_recovery_targets_primary.log
you can see that it's, uh, less full than we normally expect:
1 file(s) copied.
1 file(s) copied.
1 file(s) copied.
1 file(s) copied.

As that test uses the log file to determine the state of servers:
> my $logfile = slurp_file($node_standby->logfile());
> ok($logfile =~ qr/multiple recovery targets specified/,
> 'multiple conflicting settings');

that doesn't work.
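
To make the failure mode concrete, here is a minimal sketch of the same check in Python rather than the Perl the test actually uses. The "healthy" log line is an illustrative assumption about what the server would normally write; the other content is the four lines from the CI artifact above.

```python
import re

# Log content as it appears in the CI artifact quoted above:
# only the archive copy output, no server messages at all.
ci_log = "1 file(s) copied.\n" * 4

# Assumed, illustrative healthy log: the same copy output plus the
# server's error message that the test is looking for.
healthy_log = ci_log + "FATAL:  multiple recovery targets specified\n"

# Same pattern the TAP test matches against the slurped log file.
PATTERN = re.compile(r"multiple recovery targets specified")


def log_mentions_conflict(log_text: str) -> bool:
    """Return True if the log text contains the expected error message."""
    return PATTERN.search(log_text) is not None


if __name__ == "__main__":
    print("CI log matches:     ", log_mentions_conflict(ci_log))
    print("healthy log matches:", log_mentions_conflict(healthy_log))
```

The check itself is fine; it fails only because the server's messages never reach the file.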

I was *very* confused by this for a while. But the cluebat finally hit
when I discovered that stderr works just fine for *other*
programs, including the programs that evidently log into
003_recovery_targets_primary.log. The problem is that
pgwin32_is_service() somehow decides that postgres is running as a
service, despite that not really being the case (I guess docker
containers are somehow started below a service internally, and that
causes the problem).
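
A simplified model of the effect, in Python, not PostgreSQL's actual code: apart from the idea of a service check standing in for pgwin32_is_service(), every name here is hypothetical. It only illustrates why a false positive from that check leaves the stderr-backed log file empty.

```python
from typing import Callable, List


def route_log_line(
    line: str,
    is_service: Callable[[], bool],
    stderr_sink: List[str],
    eventlog_sink: List[str],
) -> None:
    """Send a log line to the event log if the process believes it is a
    service, otherwise to stderr. Both sinks are plain lists here."""
    if is_service():
        eventlog_sink.append(line)
    else:
        stderr_sink.append(line)


if __name__ == "__main__":
    stderr_lines: List[str] = []
    eventlog_lines: List[str] = []

    # Hypothetical false positive: the check reports "service" inside
    # the container even though nothing was registered as one.
    route_log_line(
        "FATAL:  multiple recovery targets specified",
        is_service=lambda: True,
        stderr_sink=stderr_lines,
        eventlog_sink=eventlog_lines,
    )

    # The test reads the file that captures stderr, which stays empty.
    print("stderr:", stderr_lines)
    print("event log:", eventlog_lines)
```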

I hate everything right now. So much.

I think it's quite nasty that postgres just silently starts to log to
the event log. Why on earth wasn't the solution instead to hardcode that
as a server parameter in pg_ctl register?

Not sure what a good fix is for this.

The second problem I saw was 001_initdb failing, which appears to have
been caused by some weird permission issue that I don't fully
understand. The directory with PG in it was created by user andres, an
administrator. But somehow the inherited permissions cause the chmod()
that initdb does ("fixing permissions on existing directory %s ...") to
fail.

c:\src\postgres>icacls c:\src\postgres
c:\src\postgres BUILTIN\Administrators:(F)
                BUILTIN\Administrators:(I)(OI)(CI)(F)
                NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
                CREATOR OWNER:(I)(OI)(CI)(IO)(F)
                BUILTIN\Users:(I)(OI)(CI)(RX)
                BUILTIN\Users:(I)(CI)(AD)
                BUILTIN\Users:(I)(CI)(WD)

c:\src\postgres>whoami
andres-build-te\andres

c:\src\postgres>net user andres
User name andres
...
Local Group Memberships *Administrators *Users

Greetings,

Andres Freund
