Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: robertmhaas(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, rjuju123(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date: 2022-03-15 06:09:26
Message-ID: 20220315.150926.894035582561069628.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Mon, 14 Mar 2022 17:37:40 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in
> On Mon, Mar 7, 2022 at 3:39 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > So the new framework has been dropped in this version.
> > The second test is removed as it is irrelevant to this bug.
> >
> > In this version the patch is a single file that contains the test.
>
> The status of this patch in the CommitFest was set to "Waiting for
> Author." Since a new patch has been submitted since that status was
> set, I have changed it to "Needs Review." Since this is now in its
> 15th CommitFest, we really should get it fixed; that's kind of
> ridiculous. (I am as much to blame as anyone.) It does seem to be a
> legitimate bug.
>
> A few questions about the patch:

Thanks for looking this!

> 1. Why is it OK to just skip the operation without making it up later?

Does "it" mean removal of directories? It is not okay, but in the
first place it is out-of-scope of this patch to fix that. The patch
leaves the existing code alone. This patch just has recovery ignore
invalid accesses into eventually removed objects.

Maybe, I don't understand you question..

> 2. Why not instead change the code so that the operation can succeed,
> by creating the prerequisite parent directories? Do we not have enough
> information for that? I'm not saying that we definitely should do it
> that way rather than this way, but I think we do take that approach in
> some cases.

It is proposed first by Paul Guo [1] then changed so that it ignores
failed directory creations in the very early stage in this thread.
After that, it gets conscious of recovery consistency by managing
invalid-access list.

[1] https://www.postgresql.org/message-id/flat/20210327142316.GA32517%40alvherre.pgsql#a557bd47207a446ce206879676e0140a

I think there was no strong reason for the current shape but I
personally rather like the remembering-invalid-access way because it
doesn't dirty the data directory and it is consistent with how we
treat missing heap pages.

I tried a slightly tweaked version (attached) of the first version and
confirmed that it works for the current test script. It doesn't check
recovery consistency but otherwise that way also seems fine.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2022-03-15 06:13:17 Re: Skipping logical replication transactions on subscriber side
Previous Message Yura Sokolov 2022-03-15 05:07:39 Re: BufferAlloc: don't take two simultaneous locks