Re: Race condition in recovery?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Race condition in recovery?
Date: 2021-05-27 07:17:30
Message-ID: CAFiTN-s86DUXtOvcV2ECrNKyjcMRp3PYYTP5U7o+XXQsJuHr-g@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 27, 2021 at 12:09 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 27 May 2021 11:44:47 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> > Maybe we can somehow achieve that without a broken archive command,
> > but I am not sure it is enough to just delete WAL from pg_wal. I
> > mean, my original case was:
> > 1. The standby got the new history file from the archive but had not
> > yet got the WAL file that contains the checkpoint after the TL switch.
> > 2. So standby2 tried to stream from the new primary using the old TL
> > and set the wrong TL in expectedTLEs.
> >
> > But if you are not doing anything to stop archiving WAL files or to
> > guarantee that WAL has come to archive and you deleted those then I am
> > not sure how we are reproducing the original problem.
>
> Thanks for the reply!
>
> We're writing at the very beginning of the switching segment at
> promotion time, so it is guaranteed that the first segment of the
> newer timeline won't be archived until the remaining (almost 16MB) of
> the segment is consumed or someone explicitly causes a segment switch
> (including archive_timeout).

I agree.

> > BTW, I have also tested your script and found the log below, which shows
> > that standby2 is successfully able to select timeline 2, so it is
> > not reproducing the issue. Am I missing something?
>
> standby_2? My last one, 026_timeline_issue_2.pl, doesn't use that name
> and uses "standby_1" and "cascade". In the earlier ones, standby_4 and
> 5 (or 3 and 4 in the later versions) are used in the additional tests.
>
> So I think it should be something different?

Yeah, I had tested with your patch that had a different test case.
With "026_timeline_issue_2.pl", I am able to reproduce the issue.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
