Re: .ready and .done files considered harmful

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bossartn(at)amazon(dot)com
Cc: robertmhaas(at)gmail(dot)com, dipesh(dot)pandit(at)gmail(dot)com, jeevan(dot)ladhe(at)enterprisedb(dot)com, sfrost(at)snowman(dot)net, andres(at)anarazel(dot)de, hannuk(at)google(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: .ready and .done files considered harmful
Date: 2021-08-06 04:39:58
Message-ID: 20210806.133958.962850705445445360.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 6 Aug 2021 02:34:24 +0000, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote in
> On 8/5/21, 6:26 PM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > It always works the current way in the first iteration of
> > pgarch_ArchiveCopyLoop(), because in the last iteration of
> > pgarch_ArchiveCopyLoop() pgarch_readyXlog() erases the last
> > anticipated segment. The shortcut works only when
> > pgarch_ArchiveCopyLoop() archives more than one successive segment at
> > once. If the anticipated next segment is found to be missing a .ready
> > file while archiving multiple files, pgarch_readyXlog() falls back to
> > the regular way.
> >
> > So I don't see how the danger you are perhaps considering could happen.
>
> I think my concern is that there's no guarantee that we will ever do
> another directory scan. A server that's generating a lot of WAL could
> theoretically keep us in the next-anticipated-log code path
> indefinitely.

That is theoretically possible. Suppose .ready files can be created
out of order (for the reason you give below, for example). Once the
fast path has bailed out and the fallback path finds that the
second-oldest file has a .ready file, the subsequent fast path keeps
running and leaves the oldest file behind.
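
To make that scenario concrete, here is a toy model, not the patch
itself. Segments are plain integers, the "directory" is a set of
segment numbers that have a .ready file, and next_segment() and
run_archiver() are names made up for this sketch. It ignores history
files and models a single pass through the archive loop.

```python
def next_segment(ready, last_archived):
    """Pick the next segment to archive, or None if nothing is ready.

    ready: set of segment numbers that currently have a .ready file.
    last_archived: the segment archived by the previous call, or None.
    """
    if last_archived is not None:
        anticipated = last_archived + 1
        if anticipated in ready:
            return anticipated      # fast path: no directory scan
    # Fallback path: scan everything and take the oldest ready segment.
    return min(ready) if ready else None


def run_archiver(ready, late_arrivals, steps):
    """Archive up to `steps` segments.

    late_arrivals maps a step number to a segment whose .ready file
    only shows up at that step (created out of order).
    """
    ready = set(ready)
    archived = []
    last = None
    for step in range(steps):
        if step in late_arrivals:
            ready.add(late_arrivals[step])
        seg = next_segment(ready, last)
        if seg is None:
            break
        ready.discard(seg)
        archived.append(seg)
        last = seg
    return archived, ready


# Segment 1 missed its .ready file at first; it appears at step 1,
# after the fallback scan has already picked segment 2.
archived, remaining = run_archiver({2, 3, 4, 5, 6}, {1: 1}, steps=4)
print(archived)            # [2, 3, 4, 5]
print(sorted(remaining))   # [1, 6]
```

As long as the anticipated segment keeps showing up, the model never
scans again, so segment 1 stays behind.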

> > In the first place, .ready files are added while holding WALWriteLock
> > in XLogWrite, and while removing old segments after a checkpoint
> > (which happens during recovery). Assuming that no one manually removes
> > .ready files on an active server, the former is the sole place doing
> > that. So I don't see a chance that .ready files are created out of
> > order.
>
> Perhaps a more convincing example is when XLogArchiveNotify() fails.
> AFAICT this can fail without ERROR-ing, in which case the server can
> continue writing WAL and creating .ready files for later segments. At
> some point, the checkpointer process will call RemoveOldXlogFiles()
> and try to create the missing .ready file.

Mmm. Assuming that can happen, a history file loses its chance to be
archived forever once that failure hits it. Separately from this
patch, maybe we need some measure to re-notify history files that
once missed their chance.

Assuming that all such forgotten files are eventually re-marked as
.ready somewhere, the archiver can find them again if the fallback
path is triggered explicitly. Currently the trigger fires implicitly,
when the archiver notices that the shared timeline has moved, but
firing it by, for example, a signal as mentioned in a nearby message
would make that behavior easy to implement.
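
A rough sketch of what I mean by an explicit trigger, again as a toy
model rather than actual archiver code. ToyArchiver, request_scan()
and the force_scan flag are hypothetical names; in the real archiver
the flag would presumably be set from a signal handler or through
shared memory.

```python
class ToyArchiver:
    """Toy archiver with a fast path and an explicitly triggered scan."""

    def __init__(self, ready):
        self.ready = set(ready)   # segments that have a .ready file
        self.last = None          # last segment archived
        self.force_scan = False   # hypothetical "please rescan" flag

    def request_scan(self):
        # Stand-in for the signal: force the next call onto the fallback path.
        self.force_scan = True

    def archive_one(self):
        anticipated = None if self.last is None else self.last + 1
        if not self.force_scan and anticipated in self.ready:
            seg = anticipated             # fast path
        else:
            self.force_scan = False       # the scan consumes the request
            if not self.ready:
                return None
            seg = min(self.ready)         # fallback: oldest ready segment
        self.ready.discard(seg)
        self.last = seg
        return seg


a = ToyArchiver({2, 3, 4})
print(a.archive_one())   # 2 (first call always scans)
a.ready.add(1)           # forgotten segment gets re-marked as .ready
print(a.archive_one())   # 3 (fast path, segment 1 is skipped)
a.request_scan()
print(a.archive_one())   # 1 (found by the forced scan)
```

Whoever re-marks a forgotten file would also have to raise the
trigger; the sketch only shows that one forced scan is enough to pick
the file up.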

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
