Re: pg_rewind docs correction

From: James Coleman <jtc331(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_rewind docs correction
Date: 2019-09-15 14:36:04
Message-ID: CAAaqYe-qa2NzuEt6-b53Z0SOK9+pXD1vGA4sppPMsxvNohpDSQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 15, 2019 at 10:25 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Sat, Sep 14, 2019 at 07:00:54PM -0400, James Coleman wrote:
> > Updated (plus some additional wordsmithing).
>
> + The rewind operation is not expected to result in a consistent data
> + directory state either internally to the node or with respect to the rest
> + of the cluster. Instead the resulting data directory will only be consistent
> + after WAL replay has completed to at least the LSN at which changed blocks
> + copied from the source were originally written on the source.
>
> That's not necessarily true. pg_rewind enforces in the control file
> of the target the minimum consistency LSN to be
> pg_current_wal_insert_lsn() when using a live source or the last
> checkpoint LSN for a stopped source, so while that sounds true from
> the point of view of all the blocks copied, the control file may still
> cause a complaint that the recovering target has not reached its
> consistent point even if all the blocks are already at a position
> not-so-far from what has been registered in the control file.

I could just say "after WAL replay has completed to a consistent state"?
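
To make the distinction above concrete, here is a minimal sketch of the
comparison involved: recovery is only considered consistent once replay has
reached the minimum consistency LSN recorded in the control file, regardless
of where the copied blocks themselves sit. This only models the comparison,
not pg_rewind's or the server's actual code. The helper names parse_lsn and
reached_consistency are hypothetical, and the LSN values are made up for
illustration. The only thing assumed about PostgreSQL is the textual LSN
format (two hex numbers separated by a slash, high and low 32 bits).

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in textual 'hi/lo' hex form to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def reached_consistency(replay_lsn: str, min_recovery_lsn: str) -> bool:
    """Return True once replay has reached the minimum consistency LSN.

    Hypothetical helper: both arguments are LSN strings supplied by the
    caller, e.g. the current replay position and the value recorded in
    the target's control file.
    """
    return parse_lsn(replay_lsn) >= parse_lsn(min_recovery_lsn)


# Illustrative values only (not taken from a real cluster).
print(reached_consistency("0/3000060", "0/3000148"))  # False
print(reached_consistency("0/3000148", "0/3000148"))  # True
print(reached_consistency("1/0", "0/FFFFFFFF"))       # True
```

The last call shows why the two halves have to be combined before comparing:
comparing the strings directly would order these two values incorrectly.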

> + the point at which the WAL timelines of the source and target diverged plus
> + the current state on the source of any blocks changed on the target after
> + that divergence. While only changed blocks from existing relation files are
>
> And here we could mention that all the blocks copied from the source
> are the ones which are found in the WAL records of the target until
> the end of WAL of its timeline. Still, that's basically what is
> mentioned in the first part of "How It Works", which explains things
> better. I honestly don't really see that this whole paragraph is an
> improvement over the simplicity of the original when it comes to
> understanding the global idea of what pg_rewind does.

The problem with the original is that, while simple, it's actually
incorrect in that simplicity. pg_rewind does *not* result in the data
directory on the target matching the data directory on the source.

> + <para>
> + Because <application>pg_rewind</application> copies configuration files
> + entirely from the source, correcting recovery configuration options before
> + restarting the server is necessary if you intend to re-introduce the target
> + as a replica of the source. If you restart the server after the rewind
> + operation has finished but without configuring recovery, the target will
> + again diverge from the primary.
> + </para>
>
> No objections regarding that part. Now it seems to me that we had
> better apply that to the last part of "How it works" instead? I kind
> of agree that the last paragraph could provide more details regarding
> the risks of overwriting the wanted configuration. The existing docs
> also mention that pg_rewind only creates a backup_label file to start
> recovery, perhaps we could mention up to which point recovery happens
> in this section? There is a bit more here than just "apply the WAL".

I'll look to see if there's a better place to put this.

James Coleman
