Re: pg_upgrade test failure

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: pg_upgrade test failure
Date: 2023-02-01 01:44:53
Message-ID: CA+hUKGJMynw7BNpsaF3c7hPBEBpzP8T_8WbO-93s=XzcoEsoxw@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Wed, Feb 1, 2023 at 10:08 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Wed, Feb 1, 2023 at 10:04 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Maybe we should just handle it by sleeping and retrying, if on windows? Sad to even propose...
>
> Yeah, that's what that code I posted would do automatically, though
> it's a bit hidden. The second attempt to unlink() would see delete
> already pending, and activate its secret internal sleep/retry loop.

OK, I pushed that. Third time lucky?
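
For readers who have not looked at the wrapper, the sleep/retry idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the code that was pushed and not PostgreSQL's actual unlink() wrapper: the helper name unlink_with_retry, its parameters, and the choice to retry only on EACCES are all made up for the example.

```c
#include <errno.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL's wrapper): try to unlink 'path',
 * and if the attempt fails with EACCES, sleep for 'delay_ms' milliseconds
 * and try again, up to 'max_tries' attempts in total.
 *
 * Returns 0 on success, or -1 with errno set by the last unlink() call.
 */
static int
unlink_with_retry(const char *path, int max_tries, long delay_ms)
{
    struct timespec delay;

    delay.tv_sec = delay_ms / 1000;
    delay.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (int attempt = 1;; attempt++)
    {
        if (unlink(path) == 0)
            return 0;           /* file is gone */

        /* Only EACCES is treated as "maybe transient" in this sketch. */
        if (errno != EACCES || attempt >= max_tries)
            return -1;

        nanosleep(&delay, NULL);
    }
}
```

A caller might use something like unlink_with_retry(path, 100, 100) to give a slow-to-exit process about ten seconds to let go of the file; those numbers are illustrative only.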

I tracked down the discussion of that existing comment about pg_ctl,
which comes from the 9.2 days:

https://www.postgresql.org/message-id/flat/5044DE59.5020500%40dunslane.net

I guess maybe back then fopen() was Windows' own fopen() that wouldn't
allow two handles to a file at the same time? These days we redirect
it to a wrapper with the magic "shared" flags, so the kluge installed
by commit f8c81c5dde2 may not even be needed anymore. It does
demonstrate that there are long-standing timing races around log
files, process exit and wait-for-shutdown logic, though.

Someone who develops for Windows could probably chase this right down,
and make sure that we do certain things in the right order, and/or
find better kernel facilities; at a wild guess, something like
OpenProcess() before you initiate shutdown, so you can then wait on
its handle, for example. The docs for ExitProcess() make it clear
that handles are synchronously closed, so I think it's probably just
that our tests for when processes have exited are too fuzzy.
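
To make the OpenProcess() guess concrete, here is a minimal, untested sketch of that ordering on Windows. The helper names open_process_for_wait and wait_for_process_exit are hypothetical, nothing like this exists in the tree as far as this message says, and whether it would actually close the race is exactly the open question above.

```c
#ifdef _WIN32
#include <windows.h>

/*
 * Hypothetical helper: obtain a waitable handle for 'pid'.  The idea is
 * to call this BEFORE asking the process to shut down, so the handle
 * still refers to the right process once it has exited.
 * Returns NULL on failure.
 */
static HANDLE
open_process_for_wait(DWORD pid)
{
    /* SYNCHRONIZE is the access right needed to wait on the handle. */
    return OpenProcess(SYNCHRONIZE, FALSE, pid);
}

/*
 * Hypothetical helper: wait up to 'timeout_ms' for the process to exit.
 * Returns 0 if it exited, 1 on timeout, -1 if the wait itself failed.
 * The caller remains responsible for CloseHandle().
 */
static int
wait_for_process_exit(HANDLE proc, DWORD timeout_ms)
{
    DWORD       rc = WaitForSingleObject(proc, timeout_ms);

    if (rc == WAIT_OBJECT_0)
        return 0;               /* process object signaled: it has exited */
    if (rc == WAIT_TIMEOUT)
        return 1;               /* still running */
    return -1;                  /* WAIT_FAILED or unexpected result */
}
#endif                          /* _WIN32 */
```

The intended order would be: open the handle, request shutdown, call wait_for_process_exit(), then CloseHandle(), and only after that touch the log file.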
