Re: Race condition in recovery?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: dilipbalaut(at)gmail(dot)com
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Race condition in recovery?
Date: 2021-05-11 08:11:57
Message-ID: 20210511.171157.600145309913652528.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Mon, 10 May 2021 14:27:21 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> On Mon, May 10, 2021 at 2:05 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> > I thought that the reason using receiveTLI instead of
> > recoveryTargetTLI here is that there's a case where receiveTLI is the
> > future of recoveryTargetTLI but I haven't successfully had such a
> > situation. If I set recoveryTargetTLI to a TLI that standby doesn't
> > know but primary knows, validateRecoveryParameters immediately
> > complains about that before reaching there. Anyway the attached
> > assumes receiveTLI may be the future of recoveryTargetTLI.
>
> If you see the note in this commit, it says "without the timeline
> history file". So is it trying to say that although receiveTLI is an
> ancestor of recoveryTargetTLI, it cannot detect that because of the
> absence of the TL.history file?

Yeah, that is how I read it too, and it works as described. What I
don't understand is why the patch uses receiveTLI, not
recoveryTargetTLI, to load the timeline history in
WaitForWALToBecomeAvailable. The only possible reason is that there
could be a case where receiveTLI is in the future of
recoveryTargetTLI. However, AFAICS that case cannot happen. At
replication start, the requested TLI is either that of the last
checkpoint, which is the same as recoveryTargetTLI, or one somewhere
in the existing expectedTLEs, which must be in the past of
recoveryTargetTLI. That seems to have been true already when
replication was first made able to follow a timeline switch
(abfd192b1b).

So I was tempted to just load the history for recoveryTargetTLI and
then confirm that receiveTLI is in that history. In fact, that change
doesn't break any of the recovery TAP tests. It is way simpler than
the last patch. However, I'm not confident that it is right.. ;(
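
To make the idea concrete, here is a minimal standalone sketch of
that check. It is not the patch and not PostgreSQL code: the
TimelineHistory struct and both function names are hypothetical
stand-ins, and the history is modeled as a plain array instead of
the list parsed from a TL.history file. It only assumes, as argued
above, that receiveTLI is never in the future of recoveryTargetTLI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors PostgreSQL's 32-bit timeline ID type; redefined here so the
 * sketch compiles on its own. */
typedef uint32_t TimeLineID;

/* Hypothetical, simplified stand-in for a parsed timeline history:
 * the target timeline followed by all of its ancestors. */
typedef struct TimelineHistory
{
    const TimeLineID *tlis;     /* e.g. {3, 2, 1} for target timeline 3 */
    size_t            ntlis;    /* number of entries in tlis */
} TimelineHistory;

/* Return true if tli is the target timeline itself or one of its
 * ancestors, i.e. if it appears anywhere in the history. */
bool
tli_in_history(TimeLineID tli, const TimelineHistory *history)
{
    assert(history != NULL);

    for (size_t i = 0; i < history->ntlis; i++)
    {
        if (history->tlis[i] == tli)
            return true;
    }
    return false;
}

/* The simplified check: targetHistory is the history loaded for
 * recoveryTargetTLI (not for receiveTLI). The streamed timeline is
 * accepted only if it is found in that history. */
bool
received_tli_is_acceptable(TimeLineID receiveTLI,
                           const TimelineHistory *targetHistory)
{
    return tli_in_history(receiveTLI, targetHistory);
}
```

With a target history of {3, 2, 1}, a receiveTLI of 2 or 3 passes the
check, while 4 (a timeline the target's history does not contain) is
rejected.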

> ee994272ca50f70b53074f0febaec97e28f83c4e
> Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> 2013-01-03 14:11:58
> Committer: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> 2013-01-03 14:11:58
> .....
> Without the timeline history file, recovering that file
> will fail as the older timeline ID is not recognized to be an ancestor of
> the target timeline. If you try to recover from such a backup, using only
> streaming replication to fetch the WAL, this patch is required for that to
> work.
> =====
>
> >
> > I believe the 004_timeline_switch.pl detects your issue. And the
> > attached change fixes it.
>
> I think this fix looks better to me, but I will think more about it
> and give my feedback. Thanks for quickly coming up with the
> reproducible test case.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
