PostgreSQL occasionally unable to rename WAL files (NTFS)

From: Guy Burgess <guy(at)burgess(dot)co(dot)nz>
To: pgsql-general(at)postgresql(dot)org
Subject: PostgreSQL occasionally unable to rename WAL files (NTFS)
Date: 2021-02-11 00:21:12
Message-ID: 095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz
Lists: pgsql-general

Hello,

Running PostgreSQL 13.1 on Windows Server 2019, I am occasionally
getting the following log entries:

    2021-02-11 12:34:10.149 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
    2021-02-11 12:40:31.377 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
    2021-02-11 12:46:06.294 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
    2021-02-11 12:46:16.502 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000DA": Permission denied
    2021-02-11 12:50:20.917 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
    2021-02-11 12:50:31.098 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000DA": Permission denied
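
To see at a glance which segments keep failing, the entries above can be tallied per file. This is only a rough sketch that assumes the log format shown here; the function name count_rename_failures and the sample text are illustrative and not part of PostgreSQL.

```python
import re
from collections import Counter

# Matches "could not rename file" entries. \s+ tolerates a line wrap
# between "file" and the quoted path, as in mail-wrapped excerpts.
RENAME_FAILURE = re.compile(
    r'could not rename file\s+"(?P<path>[^"]+)":\s+Permission denied'
)


def count_rename_failures(log_text):
    """Return a Counter mapping WAL file path -> number of failed renames."""
    return Counter(m.group("path") for m in RENAME_FAILURE.finditer(log_text))


# Three of the entries quoted above, used as sample input.
sample = '''
2021-02-11 12:34:10.149 NZDT [6072] LOG:  could not rename file
"pg_wal/0000000100000099000000D3": Permission denied
2021-02-11 12:46:06.294 NZDT [6072] LOG:  could not rename file
"pg_wal/0000000100000099000000D3": Permission denied
2021-02-11 12:46:16.502 NZDT [6072] LOG:  could not rename file
"pg_wal/0000000100000099000000DA": Permission denied
'''

# Print the most frequently failing segments first.
for path, failures in count_rename_failures(sample).most_common():
    print(path, failures)
```

Running it over the full server log rather than the sample would show whether the same few segments account for all of the failures.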

What appears to be happening is that the affected WAL files (usually
only 2 or 3 at a time) are somehow "losing" their NTFS permissions, so
the PG process can't rename them, even though it created them. Even
running icacls as admin gives "Access is denied" on those files. A
further oddity is that the affected files do end up disappearing after
a while.
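
One way to list which files are in this state at a given moment is to stat every entry in pg_wal and report the ones the operating system refuses. This is a minimal sketch, not a diagnosis of the cause: the function name find_inaccessible_entries is made up for illustration, and it deliberately only stats each entry rather than opening or modifying it.

```python
import os


def find_inaccessible_entries(wal_dir):
    """Return (name, error message) pairs for entries that cannot be stat'ed."""
    problems = []
    for name in sorted(os.listdir(wal_dir)):
        path = os.path.join(wal_dir, name)
        try:
            # Metadata lookup only; file contents are never opened or read.
            os.stat(path)
        except OSError as exc:
            problems.append((name, exc.strerror))
    return problems
```

It would be called with the server's pg_wal path, for example find_inaccessible_entries(r"D:\pgdata\pg_wal"), where that path is a placeholder for the real data directory.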

The NTFS permissions on the pg_wal directory are correct, and most WAL
files are unaffected. Chkdsk reports no problems, and the database is
working fine otherwise. I have tried disabling antivirus software in
case that was interfering, but it made no difference.

I found another recent report of similar behaviour here:
https://stackoverflow.com/questions/65405479/postgresql-13-log-could-not-rename-file-pg-wal-0000000100000001000000c6

WAL config as follows:

wal_level = replica
fsync = on
synchronous_commit = on
wal_sync_method = fsync
full_page_writes = on
wal_compression = off
wal_log_hints = off
wal_init_zero = on
wal_recycle = on
wal_buffers = -1
wal_writer_delay = 200ms
wal_writer_flush_after = 1MB
wal_skip_threshold = 2MB
commit_delay = 0
commit_siblings = 5
checkpoint_timeout = 5min
max_wal_size = 2GB
min_wal_size = 256MB
checkpoint_completion_target = 0.7
checkpoint_flush_after = 0
checkpoint_warning = 30s
archive_mode = off

I'm thinking of disabling wal_recycle as a first step to see if that
makes any difference, but thought I'd seek some comments first.

Not sure how much of a problem this is - the database is running fine
otherwise - but any thoughts would be appreciated.

Thanks & regards,

Guy
