Re: Use durable_unlink for .ready and .done files for WAL segment removal

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use durable_unlink for .ready and .done files for WAL segment removal
Date: 2018-11-27 20:43:06
Message-ID: 35A15B11-2A0F-4DDE-81CD-CD00C6538512@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/21/18, 10:16 PM, "Michael Paquier" <michael(at)paquier(dot)xyz> wrote:
>> At Fri, 02 Nov 2018 14:47:08 +0000, Nathan Bossart
>> <bossartn(at)amazon(dot)com> wrote in
>> <154117002849(dot)5569(dot)14588306221618961668(dot)pgcf(at)coridan(dot)postgresql(dot)org>:
>>> Granted, any added delay from this patch is unlikely to be noticeable
>>> unless your archiver is way behind and archive_status has a huge
>>> number of files. However, I have seen cases where startup is stuck on
>>> other tasks like SyncDataDirectory() and RemovePgTempFiles() for a
>>> very long time, so perhaps it is worth considering.
>
> What's the scale of the pg_wal partition and the amount of time things
> were stuck? I would imagine that the sync phase hurts the most, and a
> fast startup time for crash recovery is always important.

I don't have exact figures to share, but yes, a huge number of calls
to sync_file_range() and fsync() can use up a lot of time. Presumably
Postgres processes files individually instead of using sync() because
sync() may return before writing is done. Also, sync() would affect
non-Postgres files. However, it looks like Linux actually does wait
for writing to complete before returning from sync() [0].

For RemovePgTempFiles(), the documentation above the function
indicates that skipping temp file cleanup during startup would
actually be okay because collisions with existing temp file names
should be handled by OpenTemporaryFile(). I assume this cleanup is
done during startup because there isn't a great alternative besides
offloading the work to a new background worker or something.

On 11/27/18, 6:35 AM, "Michael Paquier" <michael(at)paquier(dot)xyz> wrote:
> Attached is a patch showing shaped based on the idea of upthread.
> Thoughts?

I took a look at this patch.

+ /*
+ * In the event of a system crash, archive status files may be
+ * left behind as their removals are not durable, cleaning up
+ * orphan entries here is the cheapest method. So check that
+ * the segment trying to be archived still exists.
+ */
+ snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+ if (stat(pathname, &stat_buf) != 0)
+ {

Don't we also need to check that errno is ENOENT here?

+ StatusFilePath(xlogready, xlog, ".ready");
+ if (durable_unlink(xlogready, WARNING) == 0)
+ ereport(WARNING,
+ (errmsg("removed orphan archive status file %s",
+ xlogready)));
+ return;

IIUC any time that the file does not exist, we will attempt to unlink
it. Regardless of whether unlinking fails or succeeds, we then
proceed to give up archiving for now, but it's not clear why. Perhaps
we should retry unlinking a number of times (like we do for
pgarch_archiveXlog()) when durable_unlink() fails and simply "break"
to move on to the next .ready file if durable_unlink() succeeds.

Nathan

[0] http://man7.org/linux/man-pages/man2/sync.2.html

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-11-27 20:45:57 Re: Use durable_unlink for .ready and .done files for WAL segment removal
Previous Message Sanyo Moura 2018-11-27 20:30:04 Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0