Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

From: Guy Burgess <guy(at)burgess(dot)co(dot)nz>
To: pgsql-general(at)postgresql(dot)org
Cc: Thorsten Schöning <tschoening(at)am-soft(dot)de>
Subject: Re: PostgreSQL occasionally unable to rename WAL files (NTFS)
Date: 2021-02-15 10:52:07
Message-ID: f444a84e-2d29-55f9-51a6-a5dcea3bc253@burgess.co.nz
Lists: pgsql-general

On 12/02/2021 4:33 am, Thorsten Schöning wrote:
> The behaviour you describe happens exactly when two processes e.g.
> concurrently hold HANDLEs on the same file and one of those deletes
> the file then. Windows keeps file names until all open HANDLEs are
> closed and depending on how those HANDLEs have been opened by the
> first app, concurrent deletion is perfectly fine for Windows.
>
> Though, a such deleted file can't be opened easily anymore and looks
> like it has lost permissions only. But that's not the case, it's
> deleted already. It might be that this happens for Postgres to itself
> somehow when some other app has an open HANDLE. I don't think that
> some other app is deleting that file by purpose instead, reading it
> for some reason seems more likely to me.

Process Monitor data suggests that Thorsten's explanation above
correctly diagnoses what is happening. It shows postgres.exe performing
"CreateFile" operations on the affected WAL files with the result
status "DELETE PENDING", which according to
https://stackoverflow.com/a/29892104 means:

"Windows allows a process to delete a file, even though it is still
opened by another process (e.g. Windows indexing service or
Antivirus). It gets internally marked as "delete pending". The file
does not actually get removed from the file system, it is still
there after the File.Delete call. Anybody that tries to open the
file after that gets an access denied error. The file doesn't
actually get removed until the last handle to the file object gets
closed"

which is the same behaviour Thorsten describes above (great info, thanks
Thorsten).
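
To illustrate what the quoted behaviour means for a caller, here is a
minimal sketch of how an application might ride out the delete-pending
window: retry the open while access is denied, and treat "file not
found" as the pending delete having completed. The helper name
open_when_settled and its parameters are hypothetical, and this is not
claimed to be what PostgreSQL does internally.

```python
import time


def open_when_settled(path, attempts=5, delay=0.1, opener=open):
    """Try to open `path`, retrying while access is denied.

    Returns an open file object, or None if the file no longer exists
    (for example because a pending delete has completed). Re-raises
    PermissionError if access is still denied on the last attempt.
    `opener` is injectable so the logic can be exercised without Windows.
    """
    for attempt in range(attempts):
        try:
            return opener(path, "rb")
        except FileNotFoundError:
            # The name is gone: the last handle was closed and the
            # delete went through.
            return None
        except PermissionError:
            # Possibly delete-pending (or a genuine permissions problem,
            # which this sketch cannot tell apart).
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```

The sketch cannot distinguish a delete-pending file from a real
permissions failure, which is why it gives up after a bounded number of
attempts rather than looping forever.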

The mystery now is that the only process logged as touching the affected
WAL files is postgres.exe (of which there are many separate processes).
Could it be that one of the postgres.exe instances is holding the
affected WAL files open after another postgres.exe instance has
flagged them as deleted (or, to put it the other way, that one
postgres.exe instance is flagging a file as deleted while another still
has an open handle to it)? If some other process, such as the indexer
(disabled) or AV (excluded from pgdata), is obtaining a handle on the
WAL files, it isn't being logged by ProcMon.
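
One way to check which postgres.exe instance does what to a given WAL
file is to export the ProcMon trace to CSV and group the events for that
path by PID. The following is a rough sketch only: it assumes the export
uses ProcMon's default column names ("Process Name", "PID", "Operation",
"Path", "Result"), the function name events_by_pid is hypothetical, and
the sample rows (including the paths and operation names) are made up
for illustration, not taken from the actual trace.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; not data from the real trace.
SAMPLE = """\
"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:00.1","postgres.exe","101","CreateFile","C:\\pgdata\\pg_wal\\EXAMPLE_SEGMENT","SUCCESS",""
"10:00:00.2","postgres.exe","202","SetDispositionInformationFile","C:\\pgdata\\pg_wal\\EXAMPLE_SEGMENT","SUCCESS","Delete: True"
"10:00:00.3","postgres.exe","303","CreateFile","C:\\pgdata\\pg_wal\\EXAMPLE_SEGMENT","DELETE PENDING",""
"10:00:00.4","postgres.exe","303","CreateFile","C:\\pgdata\\pg_wal\\OTHER_SEGMENT","SUCCESS",""
"""


def events_by_pid(csv_text, path_suffix):
    """Return {pid: [(operation, result), ...]} for matching paths.

    A row matches when its Path column ends with `path_suffix`
    (compared case-insensitively, as NTFS paths usually are).
    """
    events = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Path"].lower().endswith(path_suffix.lower()):
            events[row["PID"]].append((row["Operation"], row["Result"]))
    return dict(events)


if __name__ == "__main__":
    for pid, ops in events_by_pid(SAMPLE, "EXAMPLE_SEGMENT").items():
        print(pid, ops)
```

With a real export, the text would be read from the CSV file instead of
SAMPLE; opening it with encoding="utf-8-sig" may be needed if the file
starts with a byte order mark. Seeing one PID set the delete disposition
while another PID still has the file open would support the theory
above.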

Kind regards,

Guy
