Re: .ready and .done files considered harmful

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: .ready and .done files considered harmful
Date: 2021-08-24 19:08:49
Message-ID: CA+TgmobswMqycLSJ7GVj3+oaGWqr_685TiHULqDCeH9RGLKOJA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 24, 2021 at 1:26 PM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> I think Horiguchi-san made a good point that the .ready file creators
> should ideally not need to understand archiving details. However, I
> think this approach requires them to be inextricably linked. In the
> happy case, the archiver will follow the simple path of processing
> each consecutive WAL file without incurring a directory scan. Any
> time there is something other than a regular WAL file to archive, we
> need to take special action to make sure it is picked up.

I think they should be inextricably linked, really. If we know
something - like that there's a file ready to be archived - then it
seems like we should not throw that information away and force
somebody else to rediscover it through an expensive process. The whole
problem here comes from the fact that we're using the filesystem as an
IPC mechanism, and it's sometimes a very inefficient one.

I can't quite decide whether the problems we're worrying about here
are real issues or just kind of hypothetical. I mean, today, it seems
to be possible that we fail to mark some file ready for archiving,
emit a log message, and then a huge amount of time could go by before
we try again to mark it ready for archiving. Are the problems we're
talking about here objectively worse than that, or just different? Is
it a problem in practice, or just in theory?

I really want to avoid getting backed into a corner where we decide
that the status quo is the best we can do, because I'm pretty sure
that has to be the wrong conclusion. If we think that the
get-a-bunch-of-files-per-readdir approach is better than the
keep-trying-the-next-file approach, I mean that's OK with me; I just
want to do something about this. I am not sure whether or not that's
the right course of action.
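
To make the two approaches named above concrete, here is a rough
sketch in Python. It is not PostgreSQL code. The function names, the
batch size parameter, and the treatment of a segment name as a plain
hex counter are all assumptions made for illustration; real WAL file
naming and archive ordering (timeline history files, segment
wraparound) are more involved.

```python
import os

READY_SUFFIX = ".ready"


def scan_batch(status_dir, batch_size):
    """Get-a-bunch-of-files-per-readdir: one directory scan yields up to
    batch_size names, so the scan cost is shared across several files.
    Ordering here is plain lexicographic sort (an assumption)."""
    names = [
        n[: -len(READY_SUFFIX)]
        for n in os.listdir(status_dir)
        if n.endswith(READY_SUFFIX)
    ]
    names.sort()
    return names[:batch_size]


def next_segment_name(name):
    """Hypothetical successor rule: treat the whole name as a hex counter.
    Real segment numbering does not work exactly like this."""
    return format(int(name, 16) + 1, "0{}X".format(len(name)))


def try_next_file(status_dir, last_archived):
    """Keep-trying-the-next-file: probe directly for the expected
    successor's .ready file. Returning None means the caller has to
    fall back to a directory scan."""
    candidate = next_segment_name(last_archived)
    if os.path.exists(os.path.join(status_dir, candidate + READY_SUFFIX)):
        return candidate
    return None
```

In this sketch the first function pays for a full scan but returns
several files per scan, while the second avoids the scan entirely as
long as the next expected file exists, and needs the scan only when
something other than the expected successor is waiting.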

--
Robert Haas
EDB: http://www.enterprisedb.com
