Re: .ready and .done files considered harmful

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: .ready and .done files considered harmful
Date: 2021-08-24 00:03:37
Message-ID: 78CCF003-F9A8-4640-95A3-52FA364CAF18@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8/23/21, 10:49 AM, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Aug 23, 2021 at 11:50 AM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>> To handle a "cheating" archive command, I'd probably need to add a
>> stat() for every time pgarch_readyXLog() returned something from
>> arch_files. I suspect something similar might be needed in Dipesh's
>> patch to handle backup history files and partial WAL files.
>
> I think he's effectively got that already, although it's probably
> inside of pgarch_readyXLog(). The idea there is that instead of having
> a cache of files to be returned (as in your case) he just checks
> whether the next file in sequence happens to be present and if so
> returns that file name. To see whether it's present, he uses stat().

IIUC partial WAL files are handled because the next file in the
sequence with the given TimeLineID won't be there, so we will fall
back to a directory scan and pick it up. Timeline history files are
handled by forcing a directory scan, which should work because they
always have the highest priority. Backup history files, however, do
not seem to be handled. I think one approach to fixing that is to
also treat backup history files similarly to timeline history files.
If one is created, we force a directory scan, and the directory scan
logic will consider backup history files as higher priority than
everything but timeline history files.

I've been looking at the v9 patch with fresh eyes, and I still think
we should be able to force the directory scan as needed in
XLogArchiveNotify(). Unless the file to archive is a regular WAL file
that is > our stored location in archiver memory, we should force a
directory scan. I think it needs to be > instead of >= because we
don't know if the archiver has just completed a directory scan and
found a later segment to use to update the archiver state (but hasn't
yet updated the state in shared memory).

Also, I think we need to make sure to set PgArch->dirScan back to true
at the end of pgarch_readyXlog() unless we've found a new regular WAL
file that we can use to reset the archiver's stored location. This
ensures that we'll keep doing directory scans as long as there are
timeline/backup history files to process.

Nathan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-08-24 00:29:09 Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Previous Message Alvaro Herrera 2021-08-23 22:52:17 prevent immature WAL streaming