Re: .ready and .done files considered harmful

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: dipesh(dot)pandit(at)gmail(dot)com
Cc: bossartn(at)amazon(dot)com, robertmhaas(at)gmail(dot)com, jeevan(dot)ladhe(at)enterprisedb(dot)com, sfrost(at)snowman(dot)net, andres(at)anarazel(dot)de, hannuk(at)google(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: .ready and .done files considered harmful
Date: 2021-09-07 08:42:08
Message-ID: 20210907.174208.938167028563322823.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 3 Sep 2021 18:31:46 +0530, Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com> wrote in
> Hi,
>
> Thanks for the feedback.
>
> > Which approach do you think we should use? I think we have decent
> > patches for both approaches at this point, so perhaps we should see if
> > we can get some additional feedback from the community on which one we
> > should pursue further.
>
> In my opinion both approaches have benefits over the current
> implementation. I think the keep-trying-the-next-file approach handles
> all the rare and specific scenarios that require us to force a
> directory scan to archive the desired files. In addition, with the
> recent change to force a directory scan at checkpoint, we can avoid an
> infinite wait for a file that is still missed despite handling the
> special scenarios. It is also more efficient in extreme scenarios, as
> discussed in this thread. However, the multiple-files-per-readdir
> approach is cleaner, with the resilience of the current implementation.
>
> I agree that we should decide on which approach to pursue further based on
> additional feedback from the community.

I was thinking that the multiple-files approach would work efficiently,
but the patch still runs a directory scan every 64 files; as Robert
mentioned, that is still O(N^2). I'm not sure of the reason for the
limit, but if it is to bound memory consumption or the cost of sorting,
we can resolve that issue by taking the trying-the-next approach,
ignoring the case of having many gaps (discussed below). If it is to
trigger voluntary checking for out-of-order files, almost the same can
be achieved by running a directory scan every 64 files in the
trying-the-next approach (and we would suffer O(N^2) again). On the
other hand, if archiving is delayed by several segments, the
multiple-files method might reduce the cost of scanning the status
directory, but that won't matter since the directory contains only a
few files. (It might be better that we not take the trying-the-next
path if a directory scan finds only a few files.) The multiple-files
approach reduces the number of directory scans if there are many gaps
in the WAL file sequence. Although theoretically the last
max_backend(+alpha?) segments could be written out of order, I suppose
that in reality gaps appear only among the several latest files. I'm
not sure, though..

In short, the trying-the-next approach seems to me to be the way to
go, for the reason that it is simpler but can cover the possible
failures with almost the same measures as the multiple-files approach.

> > The problem I see with this is that pgarch_archiveXlog() might end up
> > failing. If it does, we won't retry archiving the file until we do a
> > directory scan. I think we could try to avoid forcing a directory
> > scan outside of these failure cases and archiver startup, but I'm not
> > sure it's really worth it. When pgarch_readyXlog() returns false, it
> > most likely means that there are no .ready files present, so I'm not
> > sure we are gaining a whole lot by avoiding a directory scan in that
> > case. I guess it might help a bit if there are a ton of .done files,
> > though.
>
> Yes, I think it will be useful when we have a bunch of .done files and
> the frequency of .ready files is such that the archiver goes to wait
> state before the next WAL file is ready for archival.
>
> > I agree, but it should probably be something like DEBUG3 instead of
> > LOG.
>
> I will update it in the next patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
