Permission failures with WAL files in 13~ on Windows

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Permission failures with WAL files in 13~ on Windows
Date: 2021-03-16 07:20:37
Lists: pgsql-hackers

Hi all,

Over the last couple of weeks there has been a collection of reports
complaining that the renaming of WAL segments is broken.

These failures have happened on a variety of Windows versions (2019
and 2012 R2 are the ones mentioned), at the point where segments are
recycled.

The number of those failures is alarming, and the information gathered
points at 13.1 and 13.2 as the versions where those failures are
happening, so I'd like to believe that there is a regression in 13.
FWIW, I have also been doing some tests on my side, and while I was not
able to trigger the reported failure, I have been able to trigger the
same error with an archive_command doing a simple cp that failed
continuously on EACCES.

Fujii-san has mentioned this on Twitter, but one area that changed
during the v13 cycle is aaa3aed, where the code recycling segments was
switched from a pgrename() (with a retry loop) to a
CreateHardLinkA()+pgunlink() (with a retry loop for the second). One
theory that I have in mind here is the case where we create the hard
link, but fail to complete the pgunlink() on the xlogtemp.N file,
though after some testing it did not seem to have any impact.
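
To make the difference between the two approaches concrete, here is a
rough Python sketch. It is not PostgreSQL code: install_by_rename(),
install_by_link() and unlink_with_retry() are hypothetical names, the
retry counts and delays are arbitrary, and os.replace()/os.link() only
stand in for pgrename() and CreateHardLinkA(). It just illustrates
where the retry loop sits in each variant.

```python
import os
import time


def unlink_with_retry(path, attempts=10, delay=0.1):
    """Remove path, retrying when the OS reports a permission error.

    Returns True once the file is gone, False if every attempt failed.
    """
    for _ in range(attempts):
        try:
            os.unlink(path)
            return True
        except PermissionError:
            # Another process may still hold the file open; wait and retry.
            time.sleep(delay)
    return False


def install_by_rename(tmp_path, final_path, attempts=10, delay=0.1):
    """Old-style approach: a single rename, wrapped in a retry loop."""
    for _ in range(attempts):
        try:
            os.replace(tmp_path, final_path)
            return True
        except PermissionError:
            time.sleep(delay)
    return False


def install_by_link(tmp_path, final_path):
    """New-style approach: hard link first, then unlink the temp name.

    Only the unlink step is retried. If it keeps failing, final_path
    already exists and tmp_path is left behind as a second name for
    the same file.
    """
    os.link(tmp_path, final_path)  # no retry here in this sketch
    return unlink_with_retry(tmp_path)
```

With the link-based variant, a False return means the segment is
visible under its final name while the xlogtemp.N name still exists,
which is the state the theory above is concerned with.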

I am running more tests with several scenarios (aggressive segment
recycling or segment rotation) to get more reproducible scenarios,
but I was wondering if anybody had ideas around that.

So, thoughts?

