Re: pg_rewind docs correction

From: James Coleman <jtc331(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_rewind docs correction
Date: 2020-03-09 13:26:17
Message-ID: CAAaqYe_DfddPrKs++Kz9A13eXtNXjCtbr_amr=AEd2UZsXFikw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 9, 2020 at 2:59 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Sun, Mar 08, 2020 at 05:13:21PM -0400, James Coleman wrote:
> > I've added the information about how the backup label control file is
> > written, and updated the How It Works steps to refer to that separately
> > from restart.
> >
> > Additionally the How It Works is updated to include WAL segments and new
> > relation files in the list of files copied wholesale, since that was
> > previously stated but somewhat contradicted there.
>
> - The result is equivalent to replacing the target data directory with
> the
> - source one. Only changed blocks from relation files are copied;
> - all other files are copied in full, including configuration files. The
> - advantage of <application>pg_rewind</application> over taking a new
> base backup, or
> - tools like <application>rsync</application>, is that
> <application>pg_rewind</application> does
> - not require reading through unchanged blocks in the cluster. This makes
> - it a lot faster when the database is large and only a small
> - fraction of blocks differ between the clusters.
> + After a successful rewind and subsequent WAL replay, the target data
> + directory is equivalent to a base backup of the source data directory.
> While
> + only changed blocks from existing relation files are copied; all other
> files
> + are copied in full, including new relation files, configuration files,
> and WAL
> + segments. The advantage of <application>pg_rewind</application> over
> taking a
>
> The first sentence you are adding refers to "subsequent WAL replay".
> However, this paragraph emphasizes with the state of the target
> cluster after running pg_rewind but *before* make the target cluster
> start recovery. So shouldn't you just remove the part "and subsequent
> WAL replay" from your first new sentence?
>

I'd originally typed this:
I'm not sure I follow. After pg_rewind but before replay the directory is
*not* equivalent to a base backup. I don't see how paragraph is clearly
limited to describing what pg_rewind does. While the 2nd sentence is about
pg_rewind steps specifically, the paragraph (even in the original) goes on
to compare it to a base backup so we're talking about the operation in
totality not just the one tool.

But I realized while typing it that I was probably missing something of
what you were getting at: is the hangup on calling out the WAL replay that
a base backup (or rsync even) *also* requires WAL reply to reach a
consistent state? I hadn't thought of that while writing this initially, so
I've updated the patch to eliminate that part but also to make the analogy
to base backups more direct, since it's helpful in understanding what
result the tool is trying to accomplish and how it differs.

In the same paragraph, I think that you should remove the "While" from
> "While only changed blocks", as the second part of the sentence refers
> to the other files, WAl segments, etc.
>

Fixed as part of the above.

> The second paragraph of the docs regarding timeline lookup is
> unchanged, which is fine.
>
> - When the target server is started for the first time after running
> - <application>pg_rewind</application>, it will go into recovery mode
> and replay all
> - WAL generated in the source server after the point of divergence.
> + After running <application>pg_rewind</application> the data directory
> is
> + not immediately in a consistent state. However
> + <application>pg_rewind</application> configures the control file so
> that when
> + the target server is started again it will enter recovery mode and
> replay all
> + WAL generated in the source server after the point of divergence.
>
> The second part of the third paragraph is not changed, and the
> modification you are doing here is about the control file. I am
> still unconvinced that this is a good change, because mentioning the
> control file would be actually more adapted to the part "How it
> works", where you are adding details about the backup_label file, and
> already include details about the minimum consistency LSN itself
> stored in the control file.
>

I've removed the control file reference and instead continued the analogy
to base backups.

> + <para>
> + Because <application>pg_rewind</application> copies configuration
> files
> + entirely from the source, correcting recovery configuration options
> before
> + restarting the server is necessary if you intend to re-introduce the
> target
> + as a replica of the source. If you restart the server after the rewind
> + operation has finished but without configuring recovery, the target
> will
> + again diverge from the primary.
> + </para>
>
> True that this is not outlined enough.
>

Thanks.

> + The relation files are now to their state at the last checkpoint
> completed
> + prior to the point at which the WAL timelines of the source and
> target
> + diverged plus the current state on the source of any blocks changed
> on the
> + target after that divergence.
>
> "Relation files are now in a state equivalent to the moment of the
> last completed checkpoint prior to the point.."?
>

Updated.

> - <filename>pg_stat_tmp/</filename>, and
> - <filename>pg_subtrans/</filename> are omitted from the data copied
> - from the source cluster. Any file or directory beginning with
> - <filename>pgsql_tmp</filename> is omitted, as well as are
> + <filename>pg_stat_tmp/</filename>, and
> <filename>pg_subtrans/</filename>
> + are omitted from the data copied from the source cluster. The files
>
> This is just reorganizing an existing list, why?
>

The grammar seemed a bit awkward to me, so while I was already reworking
this paragraph I tried to clean that up a bit.

> + Create a backup label file to begin WAL replay at the checkpoint
> created
> + at failover and a minimum consistency LSN using
> + <literal>pg_current_wal_insert_lsn()</literal>, when using a live
> source
> + and the last checkpoint LSN, when using a stopped source.
>
> Now would be the moment to mention the control file.
>

I made that more explicit here, and also referenced the filenames directly
(and with tags).

> > I realized I didn't previously add this to the CF; since it's not a new
> > patch I've added it to the current CF, but if this is incorrect please
> let
> > me know.
>
> The last CF of Postgres 13 began at the beginning of February :(
>

Still ongoing, correct? I guess I mentally think of them as being only one
month, but I guess that's not actually true. Regardless I'm not sure what
policy is for patches that have been in flight in hackers for a while but
just missed being added to the CF app.

Thanks,
James

Attachment Content-Type Size
v4-0001-Improve-pg_rewind-explanation-and-warnings.patch text/x-patch 8.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tushar 2020-03-09 13:34:39 Re: [Proposal] Global temporary tables
Previous Message David Steele 2020-03-09 13:00:34 Re: POC: rational number type (fractions)