Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Luke Koops <luke(dot)koops(at)entrust(dot)com>
Cc: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.
Date: 2009-09-09 18:58:47
Message-ID: 4AA7FAE7.5040707@enterprisedb.com
Lists: pgsql-bugs

Luke Koops wrote:
> For those of you who are still looking at this, I tried to reproduce the issue by holding one of the WAL files open with another program (just opened it with the cygwin build of less.exe for windows). That didn't do the trick. It prevented unlink or rename from working at all. I wrote a program (open.exe) that opens the file using pgwin32_open() and passed in the same parameters that postgres uses when opening a WAL file. That allowed the file to be renamed. And, when deleted, the open file went into the pending deletion state.

Yeah, it's the FILE_SHARE_DELETE flag that allows the deletion.

> I used open.exe to hold onto a WAL file that was going to be recycled. The recycling worked, but what is going to happen down the road when the handle is released, leaving a gap in the WAL file sequence? Or, if it is not released, when a backend tries to open the WAL file and does not have access to it?

When the file is recycled, I believe we're fine. The file is not
deleted, only renamed, so it won't be deleted when open.exe closes it.
No gap in WAL sequence is created.

> When open.exe was holding onto a WAL file that was going to be deleted, the deletion worked. The file went into the deletion pending state. The archive status for the WAL file went through the .ready ==> .done ==> {no status file} ==> .ready sequence. At that point Postgres repeatedly tries to archive the WAL file.

> I reported earlier that I believe postgres leaked the file handle to the WAL file. I don't believe that is the case. We have a process that only checks data in the database for integrity. It is only reading. I think it opened the WAL file initially, perhaps the backend had some maintenance work to do when that session started and had to write something to the WAL and then never moved on to a new one.
>
> Now that I can reproduce the pending deletion case, I'm working on code to detect it reliably and, hopefully, efficiently.

I got hold of a Windows virtual machine as well, and could reproduce the
issue. It was a bit tricky to coerce the file to be deleted instead of
recycled, but setting "max_advance = 0" in RemoveOldXlogFiles() finally
did the trick.

I googled around, and saw some discussion that suggests that when a file
is in "pending deletion" state, it's implemented by setting a
"delete-on-close" flag on the existing file handle. The upshot of that
is that if you pull the power plug, the file won't be deleted after all.

One option is to rename the file before deleting it. For all practical
purposes, that's the same as if the file no longer exists. Seems like
the simplest solution to me.
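
As a rough illustration of the rename-before-delete idea, here is a
minimal sketch in portable C. It is not the actual PostgreSQL code or a
proposed patch: the helper name rename_then_remove(), the ".deleted"
suffix, and the fixed buffer size are all assumptions made up for the
example.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of PostgreSQL): move "path" out of the
 * way by renaming it to "path.deleted", then remove the renamed file.
 * The ".deleted" suffix is an arbitrary choice for this sketch.
 *
 * Returns 0 on success, -1 on failure.
 */
static int
rename_then_remove(const char *path)
{
    char newpath[1024];
    int  len;

    /* Build the temporary name, refusing paths that would be truncated. */
    len = snprintf(newpath, sizeof(newpath), "%s.deleted", path);
    if (len < 0 || (size_t) len >= sizeof(newpath))
        return -1;

    /* Step 1: rename, so the original name is freed immediately. */
    if (rename(path, newpath) != 0)
        return -1;

    /*
     * Step 2: remove the renamed file. If another process still holds
     * it open, any lingering "pending deletion" entry would sit under
     * the new name rather than the original one.
     */
    if (remove(newpath) != 0)
        return -1;

    return 0;
}
```

The point of doing it in two steps is that anything that scans the
directory for the original WAL file name would no longer see it once the
rename has succeeded, whatever happens to the renamed file afterwards.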

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
