Re: .ready and .done files considered harmful

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "dipesh(dot)pandit(at)gmail(dot)com" <dipesh(dot)pandit(at)gmail(dot)com>, "jeevan(dot)ladhe(at)enterprisedb(dot)com" <jeevan(dot)ladhe(at)enterprisedb(dot)com>, "sfrost(at)snowman(dot)net" <sfrost(at)snowman(dot)net>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "hannuk(at)google(dot)com" <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: .ready and .done files considered harmful
Date: 2021-09-07 18:13:39
Message-ID: B77F8791-947C-4FF5-8D82-BA251E1B9F9D@amazon.com
Lists: pgsql-hackers

On 9/7/21, 10:54 AM, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:
> I guess what I don't understand about the multiple-files-per-directory
> scan implementation is what happens when something happens that would
> require the keep-trying-the-next-file approach to perform a forced
> scan. It seems to me that you still need to force an immediate full
> scan, because if the idea is that you want to, say, prioritize
> archiving of new timeline files over any others, a cached list of
> files that you should archive next doesn't accomplish that, just like
> keeping on trying the next file in sequence doesn't accomplish that.

Right. The latest patch for that approach [0] does just that. In
fact, I think timeline files are the only files for which we need to
force an immediate directory scan in the multiple-files-per-scan
approach. For the keep-trying-the-next-file approach, we have to
force a directory scan for anything but a regular WAL file that is
ahead of our archiver state.
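To make that rule concrete, here's a rough sketch of the decision in C. Everything here is illustrative, not code from the patch: the function names (NeedsImmediateScan, LooksLikeWALFile) are made up, and the ordering check is simplified to a strcmp on the 24-character WAL file name, which sorts correctly only within a timeline.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: does this look like a regular 24-hex-digit WAL
 * segment name?  Timeline history files ("%08X.history"), backup
 * history files, etc. all fail this test.
 */
static bool
LooksLikeWALFile(const char *fname)
{
	if (strlen(fname) != 24)
		return false;
	for (int i = 0; i < 24; i++)
	{
		if (!isxdigit((unsigned char) fname[i]))
			return false;
	}
	return true;
}

/*
 * Sketch of the keep-trying-the-next-file rule: force a full directory
 * scan for anything that is not a regular WAL file strictly ahead of
 * the archiver's last-archived position.
 */
static bool
NeedsImmediateScan(const char *fname, const char *last_archived_wal)
{
	/* Timeline history files (and any other special file) force a scan. */
	if (!LooksLikeWALFile(fname))
		return true;

	/*
	 * A WAL file at or behind our archiver state means the cached
	 * position is stale; files strictly ahead can simply be picked up
	 * by trying the next file in sequence.
	 */
	return strcmp(fname, last_archived_wal) <= 0;
}
```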

> So I'm wondering if in the end the two approaches converge somewhat,
> so that with either patch you get (1) some kind of optimization to
> scan the directory less often, plus (2) some kind of notification
> mechanism to tell you when you need to avoid applying that
> optimization. If you wanted to, (1) could even include both batching
> and then, when the batch is exhausted, trying files in sequence. I'm
> not saying that's the way to go, but you could. In the end, it seems
> less important that we do any particular thing here and more important
> that we do something - but if prioritizing timeline history files is
> important, then we have to preserve that behavior.

Yeah, I would agree that the approaches basically converge into some
form of "do fewer directory scans."
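For illustration, the converged shape (a batch filled by one scan, consumed in order, invalidated by a forced-scan notification) might look like the sketch below. All names here (NextFileToArchive, ForceDirectoryScan, ScanStatusDir) are invented for this example, and ScanStatusDir is a stub standing in for a real sorted pass over archive_status/.

```c
#include <stdbool.h>
#include <string.h>

#define BATCH_SIZE 64
#define NAME_LEN   64

static char batch[BATCH_SIZE][NAME_LEN];
static int  batch_len = 0;
static int  batch_pos = 0;
static bool force_scan = true;	/* start with a full scan */

/*
 * Stub for the directory scan: a real implementation would readdir()
 * archive_status/ and return the oldest .ready files in order.  This
 * fake version hands out two segments once, then reports "empty".
 */
static int
ScanStatusDir(char files[][NAME_LEN], int max)
{
	static const char *fake[] = {
		"000000010000000000000001",
		"000000010000000000000002",
	};
	static bool scanned = false;
	int			n = (int) (sizeof(fake) / sizeof(fake[0]));

	if (scanned)
		return 0;
	scanned = true;
	if (n > max)
		n = max;
	for (int i = 0; i < n; i++)
		strncpy(files[i], fake[i], NAME_LEN - 1);
	return n;
}

/* Notification hook: e.g. a new timeline history file appeared. */
void
ForceDirectoryScan(void)
{
	force_scan = true;
}

/*
 * Hand out the next file to archive, rescanning the directory only
 * when the cached batch is exhausted or a scan was forced.
 */
bool
NextFileToArchive(char *out, size_t outlen)
{
	if (force_scan || batch_pos >= batch_len)
	{
		batch_len = ScanStatusDir(batch, BATCH_SIZE);
		batch_pos = 0;
		force_scan = false;
		if (batch_len == 0)
			return false;
	}
	strncpy(out, batch[batch_pos++], outlen - 1);
	out[outlen - 1] = '\0';
	return true;
}
```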

Nathan

[0] https://www.postgresql.org/message-id/attachment/125980/0001-Improve-performance-of-pgarch_readyXlog-with-many-st.patch
