Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Liudmila Mantrova <l(dot)mantrova(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, vladimirlesk(at)yandex-team(dot)ru, dsarafan(at)yandex-team(dot)ru
Subject: Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
Date: 2020-01-31 23:16:22
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-01-24 08:50, Michael Paquier wrote:
> On Wed, Jan 22, 2020 at 12:55:30AM +0300, Alexander Korotkov wrote:
>> On Sun, Jan 19, 2020 at 1:24 PM Michael Paquier <michael(at)paquier(dot)xyz>
>> wrote:
>>> +static int
>>> +RestoreArchivedWALFile(const char *path, const char *xlogfname,
>>> + off_t expectedSize, const char
>>> *restoreCommand)
>>> Not a fan of putting that to pg_rewind/parsexlog.c. It has nothing
>>> to
>>> do with WAL parsing, and it seems to me that we could have an
>>> argument
>>> for making this available as a frontend-only API in src/common/.
>> I've put it into src/common/fe_archive.c.
> This split makes sense. You have forgotten to update
> @pgcommonfrontendfiles in for the MSVC build. Still, I
> can see that we have a lot of duplication between the code of the
> frontend and the backend, which is annoying.. The execution part is
> tricky to split because the backend has pre- and post- callbacks, the
> interruption handling is not the same and there is the problem of
> elog() calls, with elevel that can be changed depending on the backend
> configuration. However, I can see one portion of the code which is
> fully duplicated and could be refactored out: the construction of the
> command to execute, by having in input the restore_command string and
> the arguments that we expect to replace in the '%' markers which are
> part of the command.

OK, I agree that duplication of this part directly is annoying. I did a
refactoring and moved this restore_command building code into
common/archive. Separate file was needed, since fe_archive is
frontend-only, so I cannot put universal code there. I do not know is
there any better place for it.

> If NULL is specified for one of those values,
> then the construction routine returns NULL, synonym of failure. And
> the result of the routine is the built command. I think that you
> would need an extra argument to give space for the error message
> generated in the event of an error when building the command.

I did it in a bit different way, but the overall logic is exactly as you
proposed, I hope.
Also I have checked win/linux build in the similar to cfbot CI setup and
now everything runs well.

> +++ b/src/include/common/fe_archive.h
> +#ifndef ARCHIVE_H
> +#define ARCHIVE_H
> This should be FE_ARCHIVE_H.


> - while ((c = getopt_long(argc, argv, "D:nNPR", long_options,
> &option_index)) != -1)
> + while ((c = getopt_long(argc, argv, "D:nNPRC:c", long_options,
> &option_index)) != -1)
> This still includes 'C', but that should not be the case.
> + <application>pg_rewind</application> with the
> <literal>-c</literal> or
> + <literal>-C</literal> option to automatically retrieve them from
> the WAL
> [...]
> + <term><option>-C <replaceable
> class="parameter">restore_command</replaceable></option></term>
> + <term><option>--target-restore-command=<replaceable
> class="parameter">restore_command</replaceable></option></term>
> [...]
> + available, try re-running <application>pg_rewind</application>
> with
> + the <option>-c</option> or <option>-C</option> option to search
> + for the missing files in the WAL archive.
> Three places in the docs need to be cleaned up.

I have fixed all the above, thanks.

> Do we really need to test the new "archive" mode as well for
> 003_extrafiles and 002_databases? I would imagine that only 001_basic
> is enough.

We do not check any unique code paths with it, so I do it only for
001_basic now. I have left it as a test mode, since it will be easy to
turn this on for any other test in the future if needed and it fits well module.

> +# Moves WAL files to the temporary location and returns
> restore_command
> +# to get them back.
> +sub move_wal
> +{
> + my ($tmp_dir, $master_pgdata) = @_;
> + my $wal_archive_path = "$tmp_dir/master_wal_archive";
> + my $wal_path = "$master_pgdata/pg_wal";
> + my $wal_dir;
> + my $restore_command;
> I think that this routine is wrongly designed. First, in order to
> copy the contents from one path to another, we have
> RecursiveCopy::copy_path, and there is a long history about making
> it safe for the TAP tests. Second, there is in
> PostgresNode::enable_restoring already a code path doing the
> decision-making on how building restore_command. We should keep this
> code path unique.

Yeah, thank you for pointing me to the RecursiveCopy::copypath and
especially PostgresNode::enable_restoring, I have modified test to use
them instead. The copypath does not work with existing destination
directories and does not preserve initial permissions, so I had to do
some dirty work to achieve what we need in the test and keep it simple
in the same time. However, I think that using these unified routines is
much better for a future support.

New version is attached. Do you have any other comments or objections?


Attachment Content-Type Size
v14-0001-pg_rewind-Add-options-to-restore-WAL-files.patch text/x-diff 26.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-02-01 00:42:46 Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'
Previous Message Cleysson Lima 2020-01-31 22:34:18 Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'