Re: Race condition in recovery?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: dilipbalaut(at)gmail(dot)com
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, hlinnaka(at)iki(dot)fi
Subject: Re: Race condition in recovery?
Date: 2021-05-27 00:49:14
Message-ID: 20210527.094914.566269705259375842.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 26 May 2021 22:08:32 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> On Wed, 26 May 2021 at 10:06 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> > On Wed, May 26, 2021 at 12:26 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com>
> > wrote:
> > > I will check if there is any timing dependency in the test case.
> >
> > There is. I explained it in the second part of my email, which you may
> > have failed to notice.
>
>
> Sorry, my bad. I got your point now. I will change the test.

I didn't notice that, but it can actually happen.

By the way, I'm having a hard time understanding what has been
happening on this thread.

Very early in this thread I posted a test script that exactly
reproduces Dilip's case by starting from two standbys, based on his
explanation. But *we* didn't understand what the original commit
ee994272ca intended, and I understood that we wanted to know that.

So in mails [1] and [2] I tried to describe what's going on around
the two issues. Although I haven't had a response to [2], may I take
it that we have clarified the intention of ee994272ca? And may I take
it that we decided not to add a test for that commit?

Then it seems to me that Robert rediscovered how to reproduce Dilip's
case using a base backup instead of two standbys. It uses a broken
archive_command with pg_basebackup -Xnone, and I showed that the same
resulting state can be obtained with pg_basebackup -Xstream/fetch by
clearing the pg_wal directory of the resulting backup, including an
explanation of why.
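
To illustrate the second approach, here is a minimal sketch of "clearing
pg_wal" on a backup directory. It is not the proposed test (that would
be Perl TAP code), and the function name clear_pg_wal, the choice to
remove only regular files directly under pg_wal, and the fake file
names in the demo are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def clear_pg_wal(backup_dir):
    """Unlink every regular file directly under <backup_dir>/pg_wal.

    Returns the names of the removed files in sorted order.
    Subdirectories (e.g. archive_status) are left untouched; that is an
    assumption of this sketch, not something stated in the thread.
    """
    wal_dir = Path(backup_dir) / "pg_wal"
    removed = []
    for entry in sorted(wal_dir.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory that only mimics a backup layout.
    with tempfile.TemporaryDirectory() as backup:
        wal = Path(backup) / "pg_wal"
        (wal / "archive_status").mkdir(parents=True)
        (wal / "000000010000000000000001").write_bytes(b"fake segment")
        print(clear_pg_wal(backup))
```

The point of the sketch is only that the second approach needs no
special server configuration: it is a plain file removal on the backup
before the standby is started.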

*I* think that it is better to avoid the broken archive_command, since
it seems to me that just unlinking some files is simpler than having
the broken archive_command. However, since Robert ignored it, I guess
that Robert thinks that the broken archive_command is better than
that.

Is my understanding above about the current status of this thread
right?

FWIW, regarding the name of the test script, putting aside what it
actually does, I proposed to place it as a part of
004_timeline_switch.pl because this issue is related to timeline
switching.

[1] 20210521(dot)112105(dot)27166595366072396(dot)horikyota(dot)ntt(at)gmail(dot)com
[2] https://www.postgresql.org/message-id/20210524.113402.1922481024406047229.horikyota.ntt@gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
