Re: Speeding up pg_upgrade

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speeding up pg_upgrade
Date: 2017-12-09 13:45:14
Message-ID: 20171209134514.GA4628@tamriel.snowman.net
Lists: pgsql-hackers

Bruce,

* Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> On Fri, Dec 8, 2017 at 12:26:55PM -0500, Stephen Frost wrote:
> > * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > > I think the big problem with two-stage pg_upgrade is that the user steps
> > > are more complex, so what percentage of users are going use the
> > > two-stage method. The bad news is that only a small percentage of users
> > > who will benefit from it will use it, and some who will not benefit it
> > > will use it. Also, this is going to require significant server changes,
> > > which have to be maintained.
> >
> > This is where I think we need to be considering a higher-level tool that
> > makes it easier to run pg_upgrade and which could handle these multiple
> > stages.
> >
> > > I think we need some statistics on how many users are going to benefit
> > > from this, and how are users suppose to figure out if they will benefit
> > > from it?
> >
> > If the complexity issue is addressed, then wouldn't all users who use
> > pg_upgrade in link mode benefit from this..? Or are you thinking we
>
> The instructions in the docs are going to be more complex. We don't
> have any planned way to make the two-stage approach the same complexity
> as the single-stage approach.

A lot of the complexity in the current approach comes from all the
different steps that we take in the pg_upgrade process, which is largely
driven by the lack of a higher-level tool to handle all those steps.

The distributions have tried to address that by having their own tools
and scripts to simplify the process. Debian-based systems have
pg_upgradecluster, which is a single command that handles everything.
The RPM-based systems have the 'setup' script, which is pretty similar.

If pg_upgrade had a two-stage process, those scripts would be updated to
take advantage of that, I expect, and users would be largely shielded
from the increase in complexity.

One of the things that those scripts are able to take advantage of is
the configuration files they have (or their hard-coded knowledge) about
where everything is on the system, what clusters exist, etc. That's
information that pg_upgrade itself doesn't have, and therefore we can't
really automate those steps in pg_upgrade. One option would be to build
support for pg_upgrade to have a config file that has all of that
information and then make pg_upgrade (or another tool) able to manage
the process.
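
To make the config file idea a bit more concrete, here is a rough sketch
of what a wrapper might do. The file format, the [cluster] section, its
key names, and the build_pg_upgrade_argv() helper are all made up for
illustration (nothing like this exists in pg_upgrade today), and the
paths are just Debian-style examples. The only real pieces are the
existing pg_upgrade command-line options it maps onto.

```python
import configparser

# Hypothetical config format: pg_upgrade has no config file support today.
# Section and key names are assumptions; paths are illustrative only.
SAMPLE_CONFIG = """
[cluster]
old_bindir = /usr/lib/postgresql/9.6/bin
new_bindir = /usr/lib/postgresql/10/bin
old_datadir = /var/lib/postgresql/9.6/main
new_datadir = /var/lib/postgresql/10/main
link = true
"""


def build_pg_upgrade_argv(config_text, check_only=False):
    """Translate the hypothetical config into a pg_upgrade command line."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    cluster = parser["cluster"]

    # These long options are existing pg_upgrade options.
    argv = [
        "pg_upgrade",
        "--old-bindir", cluster["old_bindir"],
        "--new-bindir", cluster["new_bindir"],
        "--old-datadir", cluster["old_datadir"],
        "--new-datadir", cluster["new_datadir"],
    ]
    if cluster.getboolean("link", fallback=False):
        argv.append("--link")   # hard-link files instead of copying them
    if check_only:
        argv.append("--check")  # only check the clusters, change nothing
    return argv


if __name__ == "__main__":
    # Print the command rather than running it; a real tool would first
    # run with --check and only then do the actual upgrade.
    print(" ".join(build_pg_upgrade_argv(SAMPLE_CONFIG, check_only=True)))
```

The point is only that once the locations live in one file, a wrapper
(ours or a distribution's) can drive each step without the user having
to type the paths again.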

Given that the major distributions already have that, I'm not entirely
sure that it's something we should be trying to duplicate, since I don't
think we'd be able to do so in a way that integrates as tightly as the
distribution-specific tools do. But perhaps we should try, or maybe work
with the distributions to have a way for them to generate a config file
with the info we need..? Just brainstorming here.

Thanks!

Stephen
