Re: pg_upgrade: Pass -j down to vacuumdb

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "fabriziomello(at)gmail(dot)com" <fabriziomello(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: pg_upgrade: Pass -j down to vacuumdb
Date: 2019-04-03 21:24:34
Message-ID: 20190403212434.GY17544@telsasoft.com
Lists: pgsql-hackers

On Wed, Apr 03, 2019 at 04:42:14PM -0400, Jeff Janes wrote:
> So maybe the first stage could
> be run by pg_upgrade itself, while the new server is still running on a
> linux socket in a private directory.

I think that would take too long. It would be less of an issue if pg_upgrade
reported progress during the analyze.

For our upgrades (which typically take ~15min, though several customers take up
to ~60min), I analyze only base tables (essentially, those which are neither
parents nor children), then start services, then ANALYZE with the default stats
target. I would want to avoid delaying the services restart by more than another
(say) 5 minutes, and I would want to avoid even that unless a progress report
indicated that it was projected to take only a few more minutes.
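
A minimal sketch of the "base tables only" selection described above, not the
script actually used for these upgrades. It assumes the table names and the
(child, parent) pairs have already been fetched, for example from the
pg_inherits catalog (columns inhrelid and inhparent); the function names
base_tables and analyze_statements are hypothetical.

```python
def base_tables(tables, inheritance):
    """Return the tables that are neither parents nor children.

    tables:      iterable of table names
    inheritance: iterable of (child, parent) name pairs, e.g. derived from
                 pg_inherits (inhrelid = child, inhparent = parent)
    """
    involved = set()
    for child, parent in inheritance:
        involved.add(child)
        involved.add(parent)
    # Preserve the caller's ordering of the remaining tables.
    return [t for t in tables if t not in involved]


def analyze_statements(tables):
    """Build one ANALYZE statement per table, quoting each identifier."""
    # Double any embedded double quotes so the identifier stays valid SQL.
    return ['ANALYZE "{}";'.format(t.replace('"', '""')) for t in tables]


if __name__ == "__main__":
    # Hypothetical example data: one partitioned parent with two children.
    tables = ["users", "events", "events_2019_03", "events_2019_04"]
    inheritance = [("events_2019_03", "events"), ("events_2019_04", "events")]
    for stmt in analyze_statements(base_tables(tables, inheritance)):
        print(stmt)
```

The statements would be run before starting services; the parents and children
left out here are then covered by the later ANALYZE with the default target.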

I just ran a test on one of our large-but-not-huge customers. With
stats_target=1, analyzing a 145GB partitioned table looks like it'll take
perhaps an hour; they have ~1TB of data, so delaying services during ANALYZE
would nullify the utility of pg_upgrade. For comparison, I can restore the
essential tables from backup in 15-30 minutes.
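
As a rough back-of-the-envelope check on those numbers, the sketch below
extrapolates the one-hour estimate for the 145GB table to the whole ~1TB. It
assumes ANALYZE time scales linearly with data size and that 1TB is 1000GB;
neither was measured in this thread, and projected_hours is a hypothetical
helper.

```python
def projected_hours(sample_gb, sample_hours, total_gb):
    """Scale a measured ANALYZE duration to a larger data size.

    Linear scaling is an assumption for illustration only; real ANALYZE
    cost depends on sampling, table count, and I/O behaviour.
    """
    return sample_hours * total_gb / sample_gb


# Figures from the message: ~1 hour for a 145GB table, ~1TB in total.
estimate = projected_hours(sample_gb=145, sample_hours=1.0, total_gb=1000)
print(f"{estimate:.1f} hours")  # prints "6.9 hours"
```

Under those assumptions the first-stage analyze alone would run for several
hours, against a 15-30 minute restore of the essential tables.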

It might be fine if pg_upgrade took an option which enabled analyze, perhaps
instead of outputting analyze_new_cluster.sh. But actually, a problem with
*that* is that currently pg_upgrade avoids starting the new cluster. That
seems to be deliberate, since, with --link, that's an irreversible operation:
it's unsafe to start the old cluster afterwards.
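
For context, a small sketch of the vacuumdb invocation this thread is about.
To my understanding, the generated analyze_new_cluster.sh of that era is
essentially a wrapper around "vacuumdb --all --analyze-in-stages"; the helper
below only builds such a command line with an optional --jobs value and does
not run anything. The function name vacuumdb_command is hypothetical, and how
pg_upgrade would pass -j down is not settled in this message.

```python
def vacuumdb_command(jobs=1, analyze_in_stages=True):
    """Build an argv list for vacuumdb; nothing is executed here.

    jobs:              number of parallel connections (vacuumdb --jobs)
    analyze_in_stages: use --analyze-in-stages, else --analyze-only
    """
    cmd = ["vacuumdb", "--all"]
    cmd.append("--analyze-in-stages" if analyze_in_stages else "--analyze-only")
    if jobs > 1:
        # Only pass --jobs when parallelism was actually requested.
        cmd.append(f"--jobs={jobs}")
    return cmd


if __name__ == "__main__":
    print(" ".join(vacuumdb_command(jobs=4)))
```

The resulting list could be handed to subprocess.run once the new cluster is
known to be running, which is exactly the step pg_upgrade currently leaves to
the administrator.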

Tangent: I have a queued mail from ~15 months ago in which I proposed adding to
pg_upgrade an option to remove the old data dir (or probably only the files
associated with known relations). I realized at the time that this would be
pretty scary without first verifying that the new cluster at least starts. I'm
not sure how good an idea that is, but something like a --startnewcluster
option would be needed there, too.

Justin
