Re: .ready and .done files considered harmful

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "Andres Freund" <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: .ready and .done files considered harmful
Date: 2021-09-16 23:26:55
Message-ID: C472F58F-4E9B-49C2-82FF-2E373ADEFE34@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9/15/21, 4:28 AM, "Dipesh Pandit" <dipesh(dot)pandit(at)gmail(dot)com> wrote:
> I have incorporated these changes and updated a new patch. PFA patch.

Thanks for the new patch.

I've attached my take on the latest version. My main focus this time
was simplifying the patch for this approach as much as possible.
Specifically, I've done the following:

1. I've removed several calls to PgArchForceDirScan() in favor of
calling it at the top of pgarch_ArchiverCopyLoop(). I believe
there is some disagreement about this change, but I don't think
we gain enough to justify the complexity. The main reason we
exit pgarch_ArchiverCopyLoop() should ordinarily be that we've
run out of files to archive, so incurring a directory scan the
next time it is called doesn't seem like it would normally be too
bad. I'm sure there are exceptions (e.g., lots of .done files,
archive failures), but the patch is still not making things any
worse than they presently are for these cases.
2. I removed all the logic that attempted to catch out-of-order
.ready files. Instead, XLogArchiveNotify() only forces a
directory scan for files other than regular WAL files, and we
depend on our periodic directory scans to pick up anything that's
been left behind.
3. I moved the logic that forces directory scans every once in a
while. We were doing that in the checkpoint/restartpoint logic,
which, upon further thought, might not be the best idea. The
checkpoint interval can vary widely, and IIRC we won't bother
creating checkpoints at all if database activity stops. Instead,
I've added logic in pgarch_readyXlog() that forces a directory
scan if one hasn't happened in a few minutes.
4. Finally, I've tried to ensure comments are clear and that the
logic is generally easy to reason about.

What do you think?

Nathan

Attachment Content-Type Size
v6-0001-Do-fewer-directory-scans-of-archive_status.patch application/octet-stream 10.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sehrope Sarkuni 2021-09-17 00:59:54 Re: Add jsonlog log_destination for JSON server logs
Previous Message Ranier Vilela 2021-09-16 22:58:25 Re: Possible fault with resolve column name (plpgsql)