Re: Race condition in recovery?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, hlinnaka <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race condition in recovery?
Date: 2021-06-07 14:40:27
Message-ID: CA+Tgmoacaffw8entMvero8T2WT4v8uc4bcM7EJcV1gToRxvxvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 7, 2021 at 12:57 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> Unfortunately no. The backslashes in the binary path need to be
> escaped. (taken from PostgresNode.pm:1008)
>
> > (my $perlbin = $^X) =~ s{\\}{\\\\}g if ($TestLib::windows_os);
> > $node_primary->append_conf(
> > 'postgresql.conf', qq(
> > archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
> > ));
>
> This works for me.

Hmm, OK. Do you think we also need to use perl2host in this case?

> Ugh! Sorry. I meant "The explicit teardowns are useless". That's not
> harmful but it is done by PostgresNode.pm automatically(implicitly)
> and we don't do that in the existing scripts.

OK. I don't think it's a big deal, but we can remove them.

> As I said upthread the relationship between receiveTLI and
> recoveryTargetTLI is not confirmed yet at the point.
> findNewestTimeLine() simply searches for the history file with the
> largest timeline id so the returned there's a case where the timeline
> id that the function returns is not a future of the latest checkpoint
> TLI. I think that the fact that rescanLatestTimeLine() checks the
> relationship is telling us that we need to do the same in the path as
> well.
>
> In my previous proposal, it is done just after the line the patch
> touches but it can be in the if (fetching_ckpt) branch.

I went back and looked at your patch again, now that I understand the
issue better. I believe it's not necessary to do this here, because
StartupXLOG() already contains a check for the same thing:

/*
* If the location of the checkpoint record is not on the expected
* timeline in the history of the requested timeline, we cannot proceed:
* the backup is not part of the history of the requested timeline.
*/
Assert(expectedTLEs); /* was initialized by reading checkpoint
* record */
if (tliOfPointInHistory(checkPointLoc, expectedTLEs) !=
checkPoint.ThisTimeLineID)
...

This code is always run after ReadCheckpointRecord() returns. And I
think that your only concern here is about the case where the
checkpoint record is being fetched, because otherwise expectedTLEs
must already be set.

By the way, I also noticed that your version of the patch contains a
few words which are spelled incorrectly: hearafter, and incosistent.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-06-07 15:14:25 Re: back-port one-line gcc-10+ warning fix to REL_10_STABLE
Previous Message Tom Lane 2021-06-07 14:38:03 Re: Multiple hosts in connection string failed to failover in non-hot standby mode