pgsql: Don't error out if recycling or removing an old WAL segment fails

From: heikki(at)postgresql(dot)org (Heikki Linnakangas)
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Don't error out if recycling or removing an old WAL segment fails
Date: 2009-09-13 18:32:17
Message-ID: 20090913183217.9E30D753FB7@cvs.postgresql.org
Lists: pgsql-committers

Log Message:
-----------
Don't error out if recycling or removing an old WAL segment fails at the end
of checkpoint. The checkpoint has already been written to WAL at that point,
so all data is safe, and we'll retry removing the WAL segment at the next
checkpoint. But if such a failure persists and we error out every time, we
won't be able to remove any other old WAL segments either, and will
eventually run out of disk space. It's better to treat the failure as
non-fatal, move on to clean up any other WAL segments, and continue with the
rest of the end-of-checkpoint cleanup.
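
As a rough illustration of the "log and move on" idea described above, here is
a minimal portable sketch. It is not the actual xlog.c code: the function name
remove_old_segments() and its signature are made up for this example, and it
uses plain remove() and fprintf() where the backend would use its own file and
logging routines.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from xlog.c): try to remove every file named in
 * paths[]. A failure is reported and counted, but it does not stop the loop,
 * so one stuck file cannot prevent the others from being cleaned up.
 * Returns the number of files that could not be removed.
 */
static int
remove_old_segments(const char *const *paths, int npaths)
{
    int nfailed = 0;

    for (int i = 0; i < npaths; i++)
    {
        if (remove(paths[i]) != 0)
        {
            /* Non-fatal: log the problem and carry on with the next file. */
            fprintf(stderr, "could not remove old segment \"%s\": %s\n",
                    paths[i], strerror(errno));
            nfailed++;
            continue;
        }
    }
    return nfailed;
}
```

A caller would finish the rest of its cleanup regardless of the return value,
and simply try the leftover files again on the next pass.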

We don't normally expect any such failures, but on Windows they can happen
with some anti-virus or backup software that locks files without the
FILE_SHARE_DELETE flag.

Also, the loop in pgrename() to retry when the file is locked was broken. If a
file is locked on Windows, you get ERROR_SHARE_VIOLATION, not
ERROR_ACCESS_DENIED, at least on modern versions. Fix that, although I left
the check for ERROR_ACCESS_DENIED in there as well (presumably it was correct
in some environment), and added ERROR_LOCK_VIOLATION to be consistent with
similar checks in pgwin32_open(). Reduce the timeout on the loop from 30s to
10s, on the grounds that since it's been broken, we've effectively had a
timeout of 0s and no-one has complained, so a smaller timeout is actually
closer to the old behavior. A longer timeout would mean that if recycling a
WAL file fails because it's locked for some reason, InstallXLogFileSegment()
will hold ControlFileLock for longer, potentially blocking other backends, so
a long timeout isn't totally harmless.
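
The retry logic described above can be sketched in portable C roughly as
follows. This is an illustration under assumptions, not the dirmod.c code: all
sk_* names are hypothetical, the rename attempt is abstracted behind a
callback so the loop can be exercised without Windows, and the 100 ms sleep
step is assumed (the message only states the 10s total). The three constants
carry the numeric values of the Win32 codes ERROR_ACCESS_DENIED,
ERROR_SHARE_VIOLATION and ERROR_LOCK_VIOLATION.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins with the numeric values of the Win32 error codes. */
enum
{
    SK_ACCESS_DENIED = 5,       /* ERROR_ACCESS_DENIED */
    SK_SHARE_VIOLATION = 32,    /* ERROR_SHARE_VIOLATION */
    SK_LOCK_VIOLATION = 33      /* ERROR_LOCK_VIOLATION */
};

#define SK_MAX_RETRIES 100      /* assumed: 100 retries x 100 ms = 10 s */

typedef int (*sk_attempt_fn) (void *arg);   /* 0 on success, else error code */
typedef void (*sk_sleep_fn) (unsigned ms);  /* may be NULL to skip sleeping */

/* Simulated file that stays locked for a given number of attempts. */
struct sk_fake_file
{
    int locked_attempts;        /* attempts that will still fail */
    int err;                    /* error code reported while locked */
    int calls;                  /* how many attempts were made */
};

static int
sk_fake_attempt(void *arg)
{
    struct sk_fake_file *f = arg;

    f->calls++;
    if (f->locked_attempts > 0)
    {
        f->locked_attempts--;
        return f->err;
    }
    return 0;
}

/* Only "file is locked by someone else" errors are worth retrying. */
static bool
sk_is_lock_error(int err)
{
    return err == SK_ACCESS_DENIED ||
           err == SK_SHARE_VIOLATION ||
           err == SK_LOCK_VIOLATION;
}

/* Returns 0 on success, -1 on failure; *last_err gets the last error code. */
static int
sk_rename_with_retry(sk_attempt_fn attempt, void *arg,
                     sk_sleep_fn nap, int *last_err)
{
    int loops = 0;
    int err;

    while ((err = attempt(arg)) != 0)
    {
        *last_err = err;
        if (!sk_is_lock_error(err))
            return -1;          /* not a locking problem: fail at once */
        if (++loops > SK_MAX_RETRIES)
            return -1;          /* still locked after the timeout */
        if (nap != NULL)
            nap(100);           /* wait before the next attempt */
    }
    *last_err = 0;
    return 0;
}
```

With a check that tests only for SK_ACCESS_DENIED, a file reporting
SK_SHARE_VIOLATION would fail on the first attempt, which is the "effective
timeout of 0s" mentioned above.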

While we're at it, set errno correctly in pgrename().
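
To show why errno matters here: callers of a rename wrapper typically report
strerror(errno) when it returns -1, so the wrapper has to translate the
platform error code before returning. The sketch below is an assumption-laden
illustration, not the dirmod.c change itself. The sk_* names are hypothetical
and the small mapping table is chosen for this example only; the numeric
values are those of the Win32 codes named in the comments.

```c
#include <errno.h>

/*
 * Hypothetical mapping from a few Win32 error codes to errno values.
 * The choice of errno for each code is an assumption for illustration.
 */
static int
sk_map_win32_error(unsigned long code)
{
    switch (code)
    {
        case 2UL:               /* ERROR_FILE_NOT_FOUND */
        case 3UL:               /* ERROR_PATH_NOT_FOUND */
            return ENOENT;
        case 5UL:               /* ERROR_ACCESS_DENIED */
        case 32UL:              /* ERROR_SHARE_VIOLATION */
        case 33UL:              /* ERROR_LOCK_VIOLATION */
            return EACCES;
        default:
            return EINVAL;      /* unknown code: report a generic error */
    }
}

/* Set errno from the platform error code, then signal failure with -1. */
static int
sk_fail_with_errno(unsigned long code)
{
    errno = sk_map_win32_error(code);
    return -1;
}
```

A failing path in the wrapper would then end with something like
"return sk_fail_with_errno(code);" so that the caller's error message
describes the real cause.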

Backpatch to 8.2, which is the oldest version supported on Windows. The xlog.c
changes would make sense on other platforms and thus on older versions as
well, but since there are no such locking issues on other platforms, it's not
worth it.

Tags:
----
REL8_4_STABLE

Modified Files:
--------------
pgsql/src/backend/access/transam:
xlog.c (r1.345.2.4 -> r1.345.2.5)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.345.2.4&r2=1.345.2.5)
pgsql/src/port:
dirmod.c (r1.58 -> r1.58.2.1)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/port/dirmod.c?r1=1.58&r2=1.58.2.1)
