Re: Improve documentation for pg_upgrade, standbys and rsync

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-docs(at)lists(dot)postgresql(dot)org
Subject: Re: Improve documentation for pg_upgrade, standbys and rsync
Date: 2021-07-16 13:17:44
Message-ID: 20210716131744.GA20766@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

Greetings,

* Laurenz Albe (laurenz(dot)albe(at)cybertec(dot)at) wrote:
> On Wed, 2021-05-19 at 10:31 -0400, Stephen Frost wrote:
> > * Laurenz Albe (laurenz(dot)albe(at)cybertec(dot)at) wrote:
> > > I revently tried to upgrade a standby following the documentation,
> > > but I found it hard to understand, and it took me several tries to
> > > get it right. This is of course owing to my lack of expertise with
> > > rsync, but I think the documentation and examples could be clearer.
> > >
> > > I think it would be a good idea to recommend the --relative option
> > > of rsync.
> >
> > An additional thing that we should really be mentioning is to tell
> > people to go in and TRUNCATE all of their UNLOGGED tables before going
> > through this process, otherwise the rsync will end up spending a bunch
> > of time copying the files for UNLOGGED relations which you really don't
> > want.
>
> I have thought about that some more, and I am not certain that we should
> unconditionally recommend that. Perhaps the pain of rebuilding the
> unlogged table on the primary would be worse than rsyncing it to the
> standby.

I disagree entirely. The reason to even consider using this approach is
to minimize the time required to get things back online and there's no
question that having the unlogged tables get rsync'd across would
increase the time required.

> The documentation already mentions
>
> "Unfortunately, rsync needlessly copies files associated with temporary
> and unlogged tables because these files don't normally exist on standby
> servers."
>
> I'd say that is good enough, and people can draw their conclusions from
> that.

I disagree. Instead, we should have explicit steps included which
detail how to find and truncate unlogged tables and what to do to remove
or exclude temporary files once the server is shut down.

> Attached is a new patch with an added reminder to create "standby.signal",
> as mentioned in [1].
>
> Yours,
> Laurenz Albe
>
> [1]: https://www.postgr.es/m/1A5A1B6E-7BB6-47EB-8443-40222B769404@iris.washington.edu

> From 47b685b700548af06ab08673187bdd1df7236464 Mon Sep 17 00:00:00 2001
> From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
> Date: Fri, 16 Jul 2021 07:45:22 +0200
> Subject: [PATCH] Improve doc for pg_upgrade and standby servers
>
> Recommend using the --relative option of rsync for clarity
> and adapt the code samples accordingly.
> Using relative paths makes clearer what is meant by "current
> directory" and "remote_dir".

I'm not really convinced that this is actually a positive change, though
I don't know that it's really a negative one either. In general, I
prefer fully qualified paths to try and make things very clear about
what's happening, but this is also a bit of an odd case due to hard
links, etc.

> Add a reminder that "standby.signal" needs to be created.

This makes sense to include, certainly, but it should be an explicit
step, not just a "don't forget" note at the end. I'm not really sure
why we talk about "log shipping" either..? Wouldn't it make more sense
to have something like:

g. Configure standby servers

Review the prior configuration of the standby servers and set up the
same configuration in the newly rsync'd directory.

1. touch /path/to/replica/standby.signal
2. Configure restore_command to pull from WAL archive
3. For streaming replicas, configure primary_conninfo

Probably back-patched all the way, with adjustments made for the pre-12
releases accordingly.

Thanks,

Stephen

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Tom Lane 2021-07-16 13:35:31 Re: user-defined function in Perl or Tcl ???
Previous Message PG Doc comments form 2021-07-16 09:44:22 user-defined function in Perl or Tcl ???

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-07-16 13:40:11 Re: Add option --drop-cascade for pg_dump/restore
Previous Message James Coleman 2021-07-16 13:14:33 Re: [PATCH] Use optimized single-datum tuplesort in ExecSort