Re: .ready and .done files considered harmful

From: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: .ready and .done files considered harmful
Date: 2021-08-25 11:11:03
Message-ID: CAN1g5_EhWvwmE=_b2sYYZOQF7QGO13D2A4Wd1gb9H2zJsO-rWg@mail.gmail.com
Lists: pgsql-hackers

> If a .ready file is created out of order, the directory scan logic
> will pick it up about as soon as possible based on its priority. If
> the archiver is keeping up relatively well, there's a good chance such
> a file will have the highest archival priority and will be picked up
> the next time the archiver looks for a file to archive. With the
> patch proposed in this thread, an out-of-order .ready file has no such
> guarantee. As long as the archiver never has to fall back to a
> directory scan, it won't be archived. The proposed patch handles the
> case where RemoveOldXlogFiles() creates missing .ready files by
> forcing a directory scan, but I'm not sure this is enough. I think we
> have to check the archiver state each time we create a .ready file to
> see whether we're creating one out-of-order.

We can handle the scenario where a .ready file is created out of order
in XLogArchiveNotify() itself. This way we avoid having to make explicit
calls to enable a directory scan from every code path that may end up
creating an out-of-order .ready file.

The archiver can store the segment number corresponding to the last (most
recent) .ready file it found. When a .ready file is created in
XLogArchiveNotify(), the log segment number of the new .ready file can be
compared with the segment number of the last .ready file found by the
archiver to detect whether this file is being created out of order. If it
is, a directory scan can be forced.
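A minimal C sketch of that check, assuming C11 atomics stand in for the archiver state that PostgreSQL would keep in shared memory. All names here (last_ready_segno_found, force_dir_scan, archiver_record_found(), notify_ready_created(), archiver_take_scan_request()) are hypothetical and not taken from the v11 patch. It treats a segment at or below the last one found as out of order, on the assumption that the archiver only looks forward from that point.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state; in PostgreSQL this would live in shared
 * memory.  Names are illustrative, not taken from the v11 patch. */
static _Atomic uint64_t last_ready_segno_found = 0;
static atomic_bool force_dir_scan = false;

/* Archiver side: remember the segment of the most recent .ready file. */
static void archiver_record_found(uint64_t segno)
{
    atomic_store(&last_ready_segno_found, segno);
}

/* Called at the point where XLogArchiveNotify() creates a .ready file. */
static void notify_ready_created(uint64_t segno)
{
    /* Assumption: the archiver only looks forward from the last segment
     * it found, so a segment at or below it would be skipped.  Ask the
     * archiver to fall back to a full directory scan. */
    if (segno <= atomic_load(&last_ready_segno_found))
        atomic_store(&force_dir_scan, true);
}

/* Archiver side: consume a pending scan request, if any. */
static bool archiver_take_scan_request(void)
{
    return atomic_exchange(&force_dir_scan, false);
}
```

Using atomic_exchange() lets the archiver read and clear the request in one step, so a request cannot be lost between the check and the reset. The sketch does not try to close the window in which the archiver updates its position concurrently with a notifier reading it; a real implementation would have to reason about that ordering.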

I have incorporated these changes in patch v11.

> While this may be an extremely rare problem in practice, archiving
> something after the next checkpoint completes seems better than never
> archiving it at all. IMO this isn't an area where there is much space
> to take risks.

An alternative approach would be to force a directory scan at each
checkpoint, to break the indefinite wait for a .ready file that is being
missed because it was created out of order. This would ensure that such a
file gets archived within one checkpoint cycle.
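A rough single-process model of the checkpoint-triggered fallback, assuming a plain boolean where the real archiver would need shared state. The names on_checkpoint_complete(), oldest_ready() and choose_next() are hypothetical, and ordering by "lowest segment number first" is a simplification of how a directory scan prioritizes files.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not PostgreSQL code: each pending .ready file is
 * represented by its WAL segment number (lower = older = higher priority). */
static bool scan_requested = false;

/* Hypothetical hook invoked when a checkpoint completes. */
static void on_checkpoint_complete(void)
{
    scan_requested = true;
}

/* Full directory scan: return the oldest pending segment, or -1 if none. */
static int64_t oldest_ready(const int64_t *ready, size_t n)
{
    int64_t best = -1;

    for (size_t i = 0; i < n; i++)
        if (best < 0 || ready[i] < best)
            best = ready[i];
    return best;
}

/* Pick the next segment to archive.  Normally take the segment right
 * after the last archived one; if a scan was requested, or that segment
 * is not ready, fall back to a full scan and clear the request. */
static int64_t choose_next(const int64_t *ready, size_t n,
                           int64_t last_archived)
{
    if (!scan_requested)
    {
        for (size_t i = 0; i < n; i++)
            if (ready[i] == last_archived + 1)
                return ready[i];
    }
    scan_requested = false;
    return oldest_ready(ready, n);
}
```

With pending segments {10, 3} and segment 9 last archived, the fast path returns 10 and would keep skipping 3 for as long as later segments keep appearing in sequence; after on_checkpoint_complete(), the next call scans and returns 3.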

Thoughts?

Please find attached patch v11.

Thanks,
Dipesh

Attachment: v11-0001-mitigate-directory-scan-for-WAL-archiver.patch (text/x-patch, 13.2 KB)
