Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

From: a(dot)kondratov(at)postgrespro(dot)ru
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, vladimirlesk(at)yandex-team(dot)ru, dsarafan(at)yandex-team(dot)ru
Subject: Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
Date: 2019-01-21 20:50:33
Message-ID: ecb60bb6eb9c7fc22e6002fa80e1dbe6@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andrey,

Thank you for the review! I have updated the patch according to your
comments and remarks. Please, find new version attached.

On 2019-01-07 12:12, Andrey Borodin wrote:
>
> Here are some my notes:
> 1. RestoreArchivedWAL() shares a lot of code with
> RestoreArchivedFile(). Is it possible\viable to refactor and extract
> common part?
>

I am not sure, but I guess that differences in errors reporting and
points 2-3 are enough to leave new RestoreArchivedWAL() apart. There
are not many common parts left.

>
> 2. IMV pg_rewind with %r restore_command should fail. %r is designed
> to clean archive from WALs, nothing should be deleted in case of
> fetching WALs for rewind. Last restartpoint has no meaning during
> rewind. Or does it? If so, let's comment about it.
>

Yes, during rewind we need the last common checkpoint, not restart
point.
Taking into account that Michael pointed me to this place too and I
cannot
propose a real-life example of restore_command with %r to use in
pg_rewind,
I decided to add an exception in such a case. So now pg_rewind will fail
with error.

>
> 3. RestoreArchivedFile() checks for signals, is it done by pg_rewind
> elsewhere?
>

There is a comment part inside RestoreArchivedFile():

* However, if the failure was due to any sort of signal, it's best to
* punt and abort recovery. (If we "return false" here, upper levels
will
* assume that recovery is complete and start up the database!)

In other words, there is a possibility to start up the database assuming
that recovery went well, while actually it was terminated by signal. It
happens since failure is expected during the recovery, so some kind of
ambiguity occurs, which we trying to solve checking for termination
signals.

In contrast, we are looking in the archive for each missing WAL file
during
pg_rewind and if we failed to restore it by any means rewind will fail
indifferent of was it due to the termination signal or file is actually
missing in the archive. Thus, there no ambiguity occurs and we do not
need
to check for signals here.

That is how I understand it. Probably someone can explain why I am
wrong.

>
> 4. No documentation is updated
>

I have updated docs in order to reflect the new functionality as well.

>
> 5. -R takes precedence over -r without notes. Shouldn't we complain?
> Or may be we should take one from config, iif nothing found use -R?
>

I do not think it is worth of additional complexity and we could expect,
that end-users know, what they want to use–either -r or -R–so I added
an error message due to the conflicting options.

Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
0001-pg_rewind-options-to-use-restore_command-v2.1.patch text/x-diff 66.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2019-01-21 20:56:24 Re: Protect syscache from bloating with negative cache entries
Previous Message Tom Lane 2019-01-21 20:45:04 Re: Thread-unsafe coding in ecpg