Re: Race condition in recovery?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race condition in recovery?
Date: 2021-05-17 04:39:50
Message-ID: CAFiTN-tT7YhuqFN-79OKrYOsu+DggU-gD7CHv6VGJnaiCaW2bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 17, 2021 at 8:50 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> Before the commit expectedTLEs is always initialized with just one
> entry for the TLI of the last checkpoint record.

Right

> (1) If XLogFileReadAnyTLI() found the segment but no history file
> found, that is, using the dummy TLE-list, expectedTLEs is initialized
> with the dummy one-entry list. So there's no behavioral change in this
> aspect.

Yeah, you are right.

> (2) If we didn't find the segment for the checkpoint record, it starts
> replication and fetches history files and WAL records then revisits
> XLogFileReadAnyTLI. Now we have both the history file and segments,
> it successfully reads the recood. The difference of expectedTLEs made
> by the patch is having just one entry or the all entries for the past.

Correct.

> Assuming that we keep expectedTLEs synced with recoveryTargetTLI,
> rescanLatestTimeLine updates the list properly so no need to worry
> about the future. So the issue would be in the past timelines. After
> reading the checkpoint record, if we need to rewind to the previous
> timeline for the REDO point, the dummy list is inconvenient.
>
> So there is a possibility that the patch fixed the case (2), where the
> standby doesn't have both the segment for the checkpoint record and
> the history file for the checkpoint, and the REDO point is on the last
> TLI. If it is correct, the patch still fails for the case (1), that
> is, the issue raised here. Anyway it would be useless (and rahter
> harmful) to initialize expectedTLEs based on receiveTLI there.
>
> So my resul there is:
>
> The commit fixed the case (2)

Yes, by maintaining the entire history instead of one entry if history
was missing.

> The fix caused the issue for the case (1).

Basically, before this commit expectedTLEs and recoveryTargetTLI were
in always in sync which this patch broke.

> The proposed fix fixes the case (1), caused by the commit.

Right, I agree with the fix. So fix should be just to change that one
line and initialize expectedTLEs from recoveryTargetTLI

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2021-05-17 04:45:09 Re: Typo in README.barrier
Previous Message Yugo NAGATA 2021-05-17 04:36:46 Re: Implementing Incremental View Maintenance