Re: .ready and .done files considered harmful

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "Andres Freund" <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: .ready and .done files considered harmful
Date: 2021-08-24 17:26:20
Message-ID: 2F2E38BE-B337-456A-A5E2-6E387BD5C1CC@amazon.com
Lists: pgsql-hackers

On 8/24/21, 5:31 AM, "Dipesh Pandit" <dipesh(dot)pandit(at)gmail(dot)com> wrote:
>> > I've been looking at the v9 patch with fresh eyes, and I still think
>> > we should be able to force the directory scan as needed in
>> > XLogArchiveNotify(). Unless the file to archive is a regular WAL file
>> > that is > our stored location in archiver memory, we should force a
>> > directory scan. I think it needs to be > instead of >= because we
>> > don't know if the archiver has just completed a directory scan and
>> > found a later segment to use to update the archiver state (but hasn't
>> > yet updated the state in shared memory).
>>
>> I'm afraid that it can be seen as a violation of modularity. I feel
>> that wal-emitter side should not be aware of that detail of
>> archiving. Instead, I would prefer to keep directory scan as far as it
>> found a smaller segment id than the next-expected segment id ever
>> archived by the fast-path (if possible). This would be
>> less-performant in the case out-of-order segments are frequent but I
>> think the overall objective of the original patch will be kept.
>
> Archiver selects the file with lowest segment number as part of directory
> scan and the next segment number gets reset based on this file. It starts
> a new sequence from here and checks the availability of the next file. If
> there are holes then it will continue to fall back to directory scan. This will
> continue until it finds the next sequence in order. I think this is already
> handled unless I am missing something.

I'm thinking of the following scenario:
1. Status file 2.ready is created.
2. Archiver finds 2.ready and uses it to update its state.
3. Status file 1.ready is created.

At this point, the archiver will look for 3.ready next. If it finds
3.ready, it'll look for 4.ready. Let's say it keeps finding status
files up until 1000000.ready. In this case, the archiver won't go
back and archive segment 1 until we've archived ~1M files. I'll admit
this is a contrived example, but I think it demonstrates how an
out-of-order status file can be left unarchived for a long time with
this approach.
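
To make the scenario above concrete, here is a minimal Python sketch of
an archiver that only looks for the next consecutive segment. It is a
toy model, not code from the patch: the function name, the `ready` set,
and `next_expected` are all made up for illustration, and it assumes the
archiver falls back to a directory scan only when the next expected
status file is missing.

```python
def fast_path_archive_order(ready, start):
    """Simulate an archiver that has just archived segment `start`.

    `ready` is a set of segment numbers that have .ready files
    (hypothetical model, not PostgreSQL code). The archiver looks for the
    next consecutive segment first and only scans the directory (picking
    the lowest remaining segment) when that segment is not ready.
    Returns the order in which segments get archived.
    """
    ready = set(ready)
    order = []
    next_expected = start + 1
    while ready:
        if next_expected in ready:
            seg = next_expected   # fast path: no directory scan
        else:
            seg = min(ready)      # fallback: directory scan
        ready.remove(seg)
        order.append(seg)
        next_expected = seg + 1
    return order


# Segment 2 was archived first; 1.ready shows up late; 3..10 are ready.
order = fast_path_archive_order({1, *range(3, 11)}, start=2)
print(order)
```

In this model segment 1 is archived last, only after the run of
consecutive segments ends and a scan is finally triggered.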

I think Horiguchi-san made a good point that the .ready file creators
should ideally not need to understand archiving details. However, I
think this approach requires them to be inextricably linked. In the
happy case, the archiver will follow the simple path of processing
each consecutive WAL file without incurring a directory scan. Any
time there is something other than a regular WAL file to archive, we
need to take special action to make sure it is picked up.
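
As a rough illustration of the "special action" rule I proposed earlier
(force a directory scan unless the file is a regular WAL file that is >
the archiver's stored location), here is a small Python sketch. The
function name and `archiver_position` are hypothetical, the file name
check is a simplification, and timelines are ignored entirely.

```python
import re

# Assumption: a regular WAL segment file name is exactly 24 uppercase hex
# digits (8 for the timeline, 16 for the segment position). Anything else,
# e.g. a timeline history file, is treated as a special file.
WAL_NAME = re.compile(r"[0-9A-F]{24}")


def needs_directory_scan(filename, archiver_position):
    """Decide whether creating <filename>.ready should force a scan.

    archiver_position is a hypothetical stand-in for the location the
    archiver keeps in memory; timelines are ignored for simplicity.
    """
    if not WAL_NAME.fullmatch(filename):
        return True                      # not a regular WAL file
    position = int(filename[8:], 16)     # drop the 8-digit timeline prefix
    # Strictly greater (> rather than >=), as argued earlier in the thread.
    return not (position > archiver_position)


print(needs_directory_scan("00000002.history", 5))
print(needs_directory_scan("000000010000000000000006", 5))
print(needs_directory_scan("000000010000000000000005", 5))
```

Only the second call skips the scan: the first file is not a regular WAL
file, and the third is not strictly beyond the stored position.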

This sort of problem doesn't really show up in the always-use-
directory-scan approaches. If you imagine the .ready file creators as
throwing status files over a fence at random times and in no
particular order, directory scans are ideal because you are
essentially starting with a clean slate each time. The logic to
prioritize timeline history files is nice to have, but even if it
weren't there, the archiver would still pick them up eventually. In
other words, there's no situation (except perhaps infinite timeline
history file generation) that puts us in danger of skipping files
indefinitely.
Even if we started creating a completely new type of status file, the
directory scan approaches would probably work without any changes.
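
For comparison, here is a sketch of the always-scan model in the same
toy style. Every name in it is hypothetical, and the selection rule
(history files first, then the lowest file name) is a simplified
stand-in for what the archiver does. The point is only that each scan
starts from whatever is in the directory, so late or out-of-order
arrivals are picked up on the next pass.

```python
def pick_next_by_scan(ready_files):
    """Pick the next file from a full scan of the ready set.

    Simplified rule: timeline history files first, then lowest name.
    """
    if not ready_files:
        return None
    return min(ready_files,
               key=lambda name: (not name.endswith(".history"), name))


def archive_all(initial, arrivals):
    """Archive one file per step, rescanning every time.

    `arrivals` maps a step number to file names whose .ready files
    appear just before that step's scan (hypothetical model).
    """
    ready = set(initial)
    order = []
    step = 0
    while ready or any(s >= step for s in arrivals):
        ready.update(arrivals.get(step, ()))
        if ready:
            name = pick_next_by_scan(ready)
            ready.remove(name)
            order.append(name)
        step += 1
    return order


seg = "0000000100000000000000{:02X}".format  # short helper for names
scan_order = archive_all(
    {seg(3), seg(4)},
    {1: [seg(1)], 2: ["00000002.history"]},
)
print(scan_order)
```

Here segment 1 and the history file arrive late, and each is archived by
the very next scan instead of waiting for a run of consecutive segments
to end.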

Nathan
