Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Liudmila Mantrova <l(dot)mantrova(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, vladimirlesk(at)yandex-team(dot)ru, dsarafan(at)yandex-team(dot)ru
Subject: Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
Date: 2020-03-05 06:24:33
Message-ID: 20200305062433.GT2593@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 04, 2020 at 08:14:20PM +0300, Alexey Kondratov wrote:
> On 04.03.2020 10:45, Michael Paquier wrote:
> - *        Functions for finding and validating executable files
> + *        Functions for finding and validating from executables files
>
> There is probably something missing here. Finding and validating what? And
> 'executables files' does not seem to be correct as well.

Oops. I was tweaking this part at some point of my review, but
discarded the change afterwards.

> +        # First, remove all the content in the archive directory,
> +        # as RecursiveCopy::copypath does not support copying to
> +        # existing directories.
>
> I think that 'remove all the content' is not completely correct in this
> case. We are simply removing archive directory. There is no content there
> yet, so 'First, remove archive directory...' should be fine.

Indeed, this can be improved. I still need to do an extra pass on
this patch set.

>> - I did not actually get why you don't check for a missing command
>> when using wait_result_is_any_signal. In this case I'd think that it
>> is better to exit immediately as follow-up calls would just fail.
>
> Believe me or not, but I put 'false' there intentionally. The idea was that
> if the reason is a signal, then maybe user tired of waiting and killed that
> restore_command process theirself or something like that, so it is better to
> exit immediately. If it was a missing command, then there is no hurry, so we
> can go further and complain that attempt of recovering WAL segment has
> failed.
>
> Actually, I guess that there is no big difference if we include missing
> command here or not. There is no complicated logic further compared to real
> recovery process in Postgres, where we cannot simply return false in that
> case.

On the contrary, it seems to me that the difference is very important.
Imagine for example a frontend tool which calls RestoreArchivedWALFile
in a loop, and that this one fails because the command called is
missing. This tool would keep looping for nothing. So checking for a
missing command and leaving immediately would be more helpful for the
user. Can you think about scenarios where it would make sense to be
able to loop in this case instead of failing?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-03-05 06:35:48 Re: WAL usage calculation patch
Previous Message Masahiko Sawada 2020-03-05 06:21:49 Re: Identifying user-created objects