Re: Race condition in recovery?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race condition in recovery?
Date: 2021-05-04 12:11:06
Message-ID: CAFiTN-tO+OxiiNiM8oE=+10xhiMZkGrUZ-L1bn1SRChjzVnn7Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 2, 2021 at 3:14 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

> =====
> ee994272ca50f70b53074f0febaec97e28f83c4e
> Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> 2013-01-03 14:11:58
> Committer: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> 2013-01-03 14:11:58
>
> Delay reading timeline history file until it's fetched from master.
>
> Streaming replication can fetch any missing timeline history files from the
> master, but recovery would read the timeline history file for the target
> timeline before reading the checkpoint record, and before walreceiver has
> had a chance to fetch it from the master. Delay reading it, and the sanity
> checks involving timeline history, until after reading the checkpoint
> record.
>
> There is at least one scenario where this makes a difference: if you take
> a base backup from a standby server right after a timeline switch, the
> WAL segment containing the initial checkpoint record will begin with an
> older timeline ID. Without the timeline history file, recovering that file
> will fail as the older timeline ID is not recognized to be an ancestor of
> the target timeline. If you try to recover from such a backup, using only
> streaming replication to fetch the WAL, this patch is required for that to
> work.
> =====

The above commit avoids initializing expectedTLEs from
recoveryTargetTLI, as shown in the hunk below from that commit.

@@ -5279,49 +5299,6 @@ StartupXLOG(void)
*/
readRecoveryCommandFile();

- /* Now we can determine the list of expected TLIs */
- expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
-

I think the fix for the problem is that, after reading and validating
the checkpoint record, we free the current value of expectedTLEs
and reinitialize it based on recoveryTargetTLI, as shown in the
attached patch.
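
To make the idea concrete, here is a toy, standalone model of the
"free and reinitialize" step. It is not the attached patch and does not
use PostgreSQL's real types or functions: TLEList and every toy_* name
are hypothetical stand-ins, and the history reader simply assumes that
timelines 1..N are all ancestors of timeline N. It also assumes that
expectedTLEs may have been built from an older timeline by the time the
checkpoint record has been read.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TimeLineID;

/* Hypothetical stand-in for the list of timeline history entries. */
typedef struct TLEList
{
    TimeLineID *tlis;   /* newest timeline first */
    size_t      n;
} TLEList;

/*
 * Hypothetical stand-in for readTimeLineHistory(): pretends that every
 * timeline from 1 up to target is an ancestor of target.
 */
static TLEList *
toy_read_timeline_history(TimeLineID target)
{
    TLEList *list = malloc(sizeof(*list));

    assert(list != NULL && target >= 1);
    list->n = target;
    list->tlis = malloc(list->n * sizeof(TimeLineID));
    assert(list->tlis != NULL);
    for (size_t i = 0; i < list->n; i++)
        list->tlis[i] = target - (TimeLineID) i;
    return list;
}

static void
toy_free_tle_list(TLEList *list)
{
    if (list != NULL)
    {
        free(list->tlis);
        free(list);
    }
}

/* Returns 1 if tli appears in the history list, 0 otherwise. */
static int
toy_tli_in_history(const TLEList *list, TimeLineID tli)
{
    for (size_t i = 0; i < list->n; i++)
        if (list->tlis[i] == tli)
            return 1;
    return 0;
}

/*
 * The proposed step, in miniature: once the checkpoint record has been
 * read and validated, discard whatever list was built so far and rebuild
 * it from the recovery target timeline.
 */
static TLEList *
toy_reinit_expected_tles(TLEList *current, TimeLineID recoveryTargetTLI)
{
    toy_free_tle_list(current);
    return toy_read_timeline_history(recoveryTargetTLI);
}
```

In this model, a list first built for timeline 1 does not contain
timeline 2; after calling toy_reinit_expected_tles(list, 2) it contains
both 1 and 2, which is the state the sanity checks on the target
timeline would need.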

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-After-reading-checkpoint-record-fix-expectedTLEs-to-.patch text/x-patch 1.5 KB
