Re: Race condition in recovery?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race condition in recovery?
Date: 2021-05-14 04:59:07
Message-ID: CAFiTN-szz5O-8bCcdtef5OYg4kP-RKL78b+S8AVWoPvpBr2N2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 14, 2021 at 2:37 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> So why does this use recoveryTargetTLI instead of receiveTLI only
> conditionally? Why not do it all the time?
>
> The hard thing about this code is that the assumptions are not very
> clear. If we don't know why something is a certain way, then we might
> break things if we change it. Worse yet, if nobody else knows why it's
> like that either, then who knows what assumptions they might be
> making? It's hard to be sure that any change is safe.
>
> But that being said, we have a clear definition from the comments for
> what expectedTLEs is supposed to contain, and it's only going to end
> up with those contents if we initialize it from recoveryTargetTLI. So
> I am inclined to think that we ought to do that always, and if it
> breaks something, then that's a sign that some other part of the code
> also needs fixing, because apparently that hypothetical other part of
> the code doesn't work if expctedTLEs contains what the comments say
> that it should.
>
> Now maybe that's the wrong idea. But if so, then we're saying that the
> definition of expectedTLEs needs to be changed, and we should update
> the comments with the new definition, whatever it is. A lot of the
> confusion here results from the fact that the code and comments are
> inconsistent and we can't tell whether that's intentional or
> inadvertent. Let's not leave the next person who looks at this code
> wondering the same thing about whatever changes we make.

I am not sure that have you noticed the commit id which changed the
definition of expectedTLEs, Heikki has committed that change so adding
him in the list to know his opinion.

=====
ee994272ca50f70b53074f0febaec97e28f83c4e
Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> 2013-01-03 14:11:58
Committer: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> 2013-01-03 14:11:58

Delay reading timeline history file until it's fetched from master.
.....
Without the timeline history file, recovering that file
will fail as the older timeline ID is not recognized to be an ancestor of
the target timeline. If you try to recover from such a backup, using only
streaming replication to fetch the WAL, this patch is required for that to
work.
=====

Part of this commit message says that it will not identify the
recoveryTargetTLI as the ancestor of the checkpoint timeline (without
history file). I did not understand what it is trying to say. Does
it is trying to say that even though the recoveryTargetTLI is the
ancestor of the checkpoint timeline but we can not track that because
we don't have a history file? So to handle this problem change the
definition of expectedTLEs to directly point to the checkpoint
timeline?

Because before this commit, we were directly initializing expectedTLEs
with the history file of recoveryTargetTLI, we were not even waiting
for reading the checkpoint, but under this commit, it is changed.

I am referring to the below code which was deleted by this commit:

========
@@ -5279,49 +5299,6 @@ StartupXLOG(void)
*/
readRecoveryCommandFile();

- /* Now we can determine the list of expected TLIs */
- expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
-
- /*
- * If the location of the checkpoint record is not on the expected
- * timeline in the history of the requested timeline, we cannot proceed:
- * the backup is not part of the history of the requested timeline.
- */
- if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
- ControlFile->checkPointCopy.ThisTimeLineID)
- {
=========

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2021-05-14 05:12:31 Re: Race condition in recovery?
Previous Message Tom Lane 2021-05-14 04:26:23 Re: compute_query_id and pg_stat_statements