Re: .ready and .done files considered harmful

From: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: .ready and .done files considered harmful
Date: 2021-09-14 14:20:43
Message-ID: CAN1g5_GCTCtDpVqXdvE0wn9_ZJqRdgMCApUp_pACfwvFzQ6k0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the feedback.

> The latest post on this thread contained a link to this one, and it
> made me want to rewind to this point in the discussion. Suppose we
> have the following alternative scenario:
>
> Let's say step 1 looks for WAL file 10, but 10.ready doesn't exist
> yet. The following directory scan ends up finding 12.ready. Just
> before we update the PgArch state, XLogArchiveNotify() is called and
> creates 11.ready. However, pg_readyXlog() has already decided to
> return WAL segment 12 and update the state to look for 13 next.
>
> Now, if I'm not mistaken, using <= doesn't help at all.
>
> In my opinion, the problem here is that the natural way to ask "is
> this file being archived out of order?" is to ask yourself "is the
> file that I'm marking as ready for archiving now the one that
> immediately follows the last one I marked as ready for archiving?" and
> then invert the result. That is, if I last marked 10 as ready, and now
> I'm marking 11 as ready, then it's in order, but if I'm now marking
> anything else whatsoever, then it's out of order. But that's not what
> this does. Instead of comparing what it's doing now to what it did
> last, it compares what it did now to what the archiver did last.

I agree that when we are creating a .ready file we should compare
the current .ready file with the last .ready file to check if this file is
created out of order. We can store the state of the last .ready file
in shared memory and compare it with the current .ready file. I
believe that archiver specific shared memory area can be used
to store the state of the last .ready file unless I am missing
something and this needs to be stored in a separate shared
memory area.

With this change, we have the flexibility to move the current archiver
state out of shared memory and keep it local to archiver. I have
incorporated these changes and updated a new patch.

> > And it's really not obvious that that's correct. I think that the
> > above argument actually demonstrates a flaw in the logic, but even if
> > not, or even if it's too small a flaw to be a problem in practice, it
> > seems a lot harder to reason about.
>
> I certainly agree that it's harder to reason about. If we were to go
> the keep-trying-the-next-file route, we could probably minimize a lot
> of the handling for these rare cases by banking on the "fallback"
> directory scans. Provided we believe these situations are extremely
> rare, some extra delay for an archive every once in a while might be
> acceptable.

+1. We are forcing a directory scan at the checkpoint and it will make sure
that any missing file gets archived within the checkpoint boundaries.

Please find the attached patch.

Thanks,
Dipesh

Attachment Content-Type Size
v4-0001-keep-trying-the-next-file-approach.patch text/x-patch 14.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-09-14 14:30:45 Re: refactoring basebackup.c
Previous Message Tom Lane 2021-09-14 14:11:25 Re: Physical replication from x86_64 to ARM64