Re: Race condition in recovery?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race condition in recovery?
Date: 2021-05-24 05:04:36
Message-ID: CAFiTN-uvv+VxZjBcGG7w3noMG-cSZdCqDUVyx=h+GNidHynGyQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 24, 2021 at 10:17 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Sun, 23 May 2021 21:37:58 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> > On Sun, May 23, 2021 at 2:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Sat, May 22, 2021 at 8:33 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > I have created a TAP test based on Robert's test.sh script. It
> > reproduces the issue. I am new to Perl, so this still needs some
> > cleanup/improvement, but at least it shows the idea.
>
> I'm not sure I'm following the discussion here; however, if we were
> trying to reproduce Dilip's case using a base backup, we would need
> such a broken archive command if using pg_basebackup with -Xnone,
> because the current version of pg_basebackup waits for all required
> WAL segments to be archived when connecting to a standby with -Xnone.

Right, that's why my patch dynamically generates an archive command
that skips everything other than the history file. See the snippet
below from my patch, where I generate a skip_cp script and then use
it as the archive command.

===
+# Prepare an alternative archive command to skip WAL files
+my $script = "#!/usr/bin/perl \n
+use File::Copy; \n
+my (\$source, \$target) = \@ARGV; \n
+if (\$source =~ /history/) \n
+{ \n
+ copy(\$source, \$target); \n
+}";
+
+open my $fh, '>', "skip_cp";
+print {$fh} $script;
===
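
To make the idea easier to try in isolation, here is a minimal
standalone sketch of the same approach. It is not the code from the
patch: the temporary directory, the "archive" subdirectory and the two
dummy file names are made up for illustration, and the helper is run
through the current perl interpreter rather than relying on a shebang
line and the executable bit.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch locations; a real test would use the node's
# archive directory instead.
my $dir     = tempdir(CLEANUP => 1);
my $archive = "$dir/archive";
mkdir $archive or die "mkdir $archive: $!";

# Body of the helper script. The single-quoted heredoc keeps $ and @
# literal, so no backslash escaping is needed.
my $script = <<'EOS';
use strict;
use warnings;
use File::Copy;
my ($source, $target) = @ARGV;
# Copy timeline history files only. Everything else is silently
# skipped, and the exit status still reports success to the caller.
if ($source =~ /history/)
{
    copy($source, $target) or die "copy failed: $!";
}
exit 0;
EOS

my $skip_cp = "$dir/skip_cp";
open my $fh, '>', $skip_cp or die "open $skip_cp: $!";
print {$fh} $script;
close $fh or die "close $skip_cp: $!";

# Exercise the helper on one history file and one WAL segment
# (both are dummy files with illustrative names).
my @names = ('00000002.history', '000000020000000000000003');
for my $name (@names)
{
    open my $out, '>', "$dir/$name" or die "open $dir/$name: $!";
    print {$out} "dummy\n";
    close $out or die "close $dir/$name: $!";
    system($^X, $skip_cp, "$dir/$name", "$archive/$name") == 0
      or die "skip_cp failed for $name";
}
```

After this runs, only the history file should be present in the
archive directory. To use such a helper as an archive command, the
source and target arguments would presumably come from the %p and %f
placeholders of archive_command; the exact command line is an
assumption here and is not taken from the patch.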

> I haven't bothered reconfirming the version in which that fix took
> place, but just by using -X stream instead of "none" we successfully
> miss the first segment of the new timeline in the upstream archive,
> though we need to erase pg_wal in the backup. Either the broken
> archive command or erasing pg_wal of the cascade is required for the
> behavior to occur.
>
> The attached shows what it looks like.

I will test this and let you know. Thanks!

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
