Re: Speeding up pg_upgrade

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speeding up pg_upgrade
Date: 2018-01-05 19:00:14
Message-ID: CAMkU=1x-e+maqefhM1yMeSiJ8J9Z+SJHgW7c9bqo3E3JMG4iJA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 7, 2017 at 11:28 AM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > zero-downtime upgrades. ... we discussed speeding up pg_upgrade.
> >
> > There are clusters that take a long time to dump the schema from the old
> > cluster
>
> Maybe it isn't representative of a typical case, but I can offer a data
> point:
>
> For us, we have ~40 customers with DBs ranging in size from <100GB to
> ~25TB (for which ~90% is on a ZFS tablespace with compression). We have
> what's traditionally considered to be an excessive number of child
> tables, which works okay since planning time is unimportant to us for
> the report queries which hit them. Some of the tables are wide
> (historically up to 1600 columns). Some of those have default values on
> nearly every column, and pg_attrdef was large (>500MB), causing pg_dump
> --section pre-data to be slow (10+ minutes). Since something similar is
> run by pg_upgrade, I worked around the issue for now by dropping
> defaults on the historic children in advance of upgrades (at some point
> I'll figure out what I have to do to allow DROPing DEFAULTs). It's not
> the first time we've seen an issue with a large number of
> children*columns.
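
For anyone wanting the same workaround, something like this untested
sketch would generate the DROP DEFAULT statements; the 'history_%'
prefix and the database name "mydb" are placeholders:

#!/bin/sh
# Untested sketch of the workaround above: emit ALTER TABLE ... DROP
# DEFAULT for every defaulted column of the historic child tables.
# The 'history_%' prefix and the database name "mydb" are placeholders.
psql -AtX mydb <<'SQL' | psql -aX mydb
SELECT format('ALTER TABLE %s ALTER COLUMN %I DROP DEFAULT;',
              attrelid::regclass, attname)
FROM   pg_attribute
WHERE  atthasdef
AND    attnum > 0
AND    NOT attisdropped
AND    attrelid::regclass::text LIKE 'history_%';
SQL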

This is probably worth fixing independently of other ways of speeding up
pg_upgrade.

pg_dump spends most of its time making column names unique while
deparsing the DEFAULT clause for each column. But I don't think it ever
outputs the column names that result from that deparsing, and since only
one table is involved, the names should already be unique anyway, unless
I am missing something.
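
That deparsing happens inside pg_get_expr(), which pg_dump calls for
each stored default, so the slowdown can be seen from psql directly (the
table name below is a placeholder):

# pg_dump fetches defaults roughly like this; each pg_get_expr() call
# re-deparses the stored expression.  "wide_defaults" is a placeholder.
psql -X postgres -c "
SELECT a.attname, pg_get_expr(d.adbin, d.adrelid) AS default_expr
FROM   pg_attrdef d
JOIN   pg_attribute a ON a.attrelid = d.adrelid AND a.attnum = d.adnum
WHERE  d.adrelid = 'wide_defaults'::regclass;"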

The time seems to be quadratic in the number of columns if all columns
have defaults, or, more generally, proportional to the product of the
number of columns in the table and the number of columns with defaults.
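
A minimal way to reproduce this, in the spirit of the attached script
(the table name, column count, and database are illustrative):

#!/bin/sh
# Create a table in which every column has a DEFAULT, then time the
# pre-data dump.  Names and sizes are illustrative.
NCOLS=1600
psql -qX postgres -c "CREATE TABLE wide_defaults (
$(seq -f 'c%g integer DEFAULT 0' 1 $NCOLS | paste -sd, -)
);"
time pg_dump --schema-only --section=pre-data \
     --table=wide_defaults postgres > /dev/null

If the quadratic theory is right, halving NCOLS should roughly quarter
the dump time.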

The CREATE TABLE statement has a similar problem when the dump is
restored.

Cheers,

Jeff

Attachment: pg_dump_default.sh (application/x-sh, 230 bytes)
