Re: Use durable_unlink for .ready and .done files for WAL segment removal

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use durable_unlink for .ready and .done files for WAL segment removal
Date: 2018-12-04 20:40:40
Message-ID: 4B2903CE-B6A4-4635-9921-AB74E317A358@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/4/18, 1:36 AM, "Michael Paquier" <michael(at)paquier(dot)xyz> wrote:
> Okay, here is an updated patch for this stuff, which does the following:
> - Check for a WAL segment if it has a ".ready" status file, an orphaned
> status file is removed only on ENOENT.
> - If durable_unlink fails, retry 3 times. If there are too many
> failures, the archiver gives up on the orphan status file removal. If
> the removal works correctly, the archiver moves on to the next file.

Thanks for the updated patch! The code looks good to me, the patch
applies cleanly and builds without warnings, and it seems to work well
in my manual tests. I just have a few wording suggestions.

+ /*
+ * In the event of a system crash, archive status files may be
+ * left behind as their removals are not durable, cleaning up
+ * orphan entries here is the cheapest method. So check that
+ * the segment trying to be archived still exists. If it does
+ * not, its orphan status file is removed.
+ */

I would phrase this comment this way:

Since archive_status files are not durably removed, a system
crash could leave behind .ready files for WAL segments that
have already been recycled or removed. In this case, simply
remove the orphan status file and move on.

+ ereport(WARNING,
+ (errmsg("removed orphan archive status file %s",
+ xlogready)));

I think we should put quotes around the file name like we do elsewhere
in pgarch_ArchiverCopyLoop().

+ ereport(WARNING,
+ (errmsg("failed removal of \"%s\" too many times, will try again later",
+ xlogready)));

I'd suggest mirroring the log statement for failed archiving commands
and saying something like, "removing orphan archive status file \"%s\"
failed too many times, will try again later." IMO that makes it
clearer what is failing and why we are removing it in the first place.

Nathan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-12-04 20:45:39 Re: reducing isolation tests runtime
Previous Message Joshua D. Drake 2018-12-04 20:39:02 Re: make install getting slower