Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Liudmila Mantrova <l(dot)mantrova(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, vladimirlesk(at)yandex-team(dot)ru, dsarafan(at)yandex-team(dot)ru
Subject: Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
Date: 2020-01-24 05:50:54
Message-ID: 20200124055054.GB1581@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 22, 2020 at 12:55:30AM +0300, Alexander Korotkov wrote:
> Thank you for your review!
> The revised patch is attached.

Thanks for the new version.

> On Sun, Jan 19, 2020 at 1:24 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> +static int
>> +RestoreArchivedWALFile(const char *path, const char *xlogfname,
>> + off_t expectedSize, const char *restoreCommand)
>> Not a fan of putting that to pg_rewind/parsexlog.c. It has nothing to
>> do with WAL parsing, and it seems to me that we could have an argument
>> for making this available as a frontend-only API in src/common/.
>
> I've put it into src/common/fe_archive.c.

This split makes sense. You have forgotten to update
@pgcommonfrontendfiles in Mkvcbuild.pm for the MSVC build. Still, I
can see that we have a lot of duplication between the code of the
frontend and the backend, which is annoying.. The execution part is
tricky to split because the backend has pre- and post- callbacks, the
interruption handling is not the same and there is the problem of
elog() calls, with elevel that can be changed depending on the backend
configuration. However, I can see one portion of the code which is
fully duplicated and could be refactored out: the construction of the
command to execute, by having in input the restore_command string and
the arguments that we expect to replace in the '%' markers which are
part of the command. If NULL is specified for one of those values,
then the construction routine returns NULL, synonym of failure. And
the result of the routine is the built command. I think that you
would need an extra argument to give space for the error message
generated in the event of an error when building the command.

>> Do we actually need --target-restore-command at all? It seems to me
>> that we have all we need with --restore-target-wal, and that's not
>> really instinctive to pass down a command via another command..
>
> I was going to argue that --restore-target-wal is useful. But I
> didn't find convincing example for that. Naturally, if one uses
> restore_command during pg_rewind then why the same restore_command
> can't be used in new standby. So, I've removed --restore-target-wal
> argument. If we will find convincing use case we can return it any
> moment.

Okay. Let's consider it if it makes sense. You can always go through
this restriction by first setting restore_command in the target
cluster, and then run pg_rewind. And I would think that someone is
going to use the same restore_command even after restarting the
rewound target cluster.

>> + # Add restore_command to postgresql.conf of target cluster.
>> + open(my $conf_fd, ">>", $master_conf_path) or die;
>> + print $conf_fd "\nrestore_command='$restore_command'";
>> + close $conf_fd;
>> We have append_conf() for that.
>
> Sure, thank you for pointing!

+++ b/src/include/common/fe_archive.h
+#ifndef ARCHIVE_H
+#define ARCHIVE_H
This should be FE_ARCHIVE_H.

- while ((c = getopt_long(argc, argv, "D:nNPR", long_options,
&option_index)) != -1)
+ while ((c = getopt_long(argc, argv, "D:nNPRC:c", long_options,
&option_index)) != -1)
This still includes 'C', but that should not be the case.

+ <application>pg_rewind</application> with the
<literal>-c</literal> or
+ <literal>-C</literal> option to automatically retrieve them from
the WAL
[...]
+ <term><option>-C <replaceable
class="parameter">restore_command</replaceable></option></term>
+ <term><option>--target-restore-command=<replaceable
class="parameter">restore_command</replaceable></option></term>
[...]
+ available, try re-running <application>pg_rewind</application> with
+ the <option>-c</option> or <option>-C</option> option to search
+ for the missing files in the WAL archive.
Three places in the docs need to be cleaned up.

Do we really need to test the new "archive" mode as well for
003_extrafiles and 002_databases? I would imagine that only 001_basic
is enough.

+# Moves WAL files to the temporary location and returns restore_command
+# to get them back.
+sub move_wal
+{
+ my ($tmp_dir, $master_pgdata) = @_;
+ my $wal_archive_path = "$tmp_dir/master_wal_archive";
+ my $wal_path = "$master_pgdata/pg_wal";
+ my $wal_dir;
+ my $restore_command;
I think that this routine is wrongly designed. First, in order to
copy the contents from one path to another, we have
RecursiveCopy::copy_path, and there is a long history about making
it safe for the TAP tests. Second, there is in
PostgresNode::enable_restoring already a code path doing the
decision-making on how building restore_command. We should keep this
code path unique.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-01-24 05:55:34 Re: BUG #16109: Postgres planning time is high across version (Expose buffer usage during planning in EXPLAIN)
Previous Message Amit Kapila 2020-01-24 05:21:49 Re: Error message inconsistency