Re: pg_upgrade: Pass -j down to vacuumdb

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "fabriziomello(at)gmail(dot)com" <fabriziomello(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: pg_upgrade: Pass -j down to vacuumdb
Date: 2019-03-28 01:43:44
Message-ID: CAMkU=1zcOfThON4p00VTWEO1xdY6R8Xbh9KSt9pZkLHq0gdE=A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 26, 2019 at 7:28 AM Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:

> On 2019-03-25 22:57, Tom Lane wrote:
> > + fprintf(script, "echo %sYou may wish to add --jobs=N for parallel analyzing.%s\n",
> > +         ECHO_QUOTE, ECHO_QUOTE);
>
> But then you get that information after you have already started the
> script.
>

True, but the same goes for all the other information there, and the script
sleeps first to give you a chance to break out of it. I also make it a habit
to glance through any script someone suggests I run, so I would notice the
embedded advice without running the script at all.
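
(For reference, since ECHO_QUOTE is a single quote on non-Windows, the
proposed fprintf would put a line something like this into
analyze_new_cluster.sh:

    echo 'You may wish to add --jobs=N for parallel analyzing.'

so anyone who reads the script before running it sees the hint as well.)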

> I don't find any information about this analyze business on the
> pg_upgrade reference page. Maybe a discussion there could explain the
> different paths better than making the output script extra complicated.
>
> Essentially: If you want a slow and gentle analyze, use the supplied
> script. If you want a fast analyze, use vacuumdb, perhaps with an
> appropriate --jobs option. Note that pg_upgrade --jobs and vacuumdb
> --jobs are resource-bound in different ways, so the same value might not
> be appropriate for both.
>
>
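
Concretely, the two paths you describe would be something like this (pick N
to taste):

    # slow and gentle: the script pg_upgrade writes out
    ./analyze_new_cluster.sh

    # fast: one big parallel analyze in place of the script
    vacuumdb --all --analyze-only --jobs=8

with the caveat from above that a good --jobs value for pg_upgrade is not
necessarily a good one for vacuumdb.
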
To me, analyze-in-stages is not about gentleness at all. For example, it
does nothing to move vacuum_cost_delay away from its default of 0. Rather,
it is about slamming the bare minimum of statistics in there as fast as
possible, so that your database doesn't keel over from horrible query plans
on even simple queries as soon as you reopen it. I want the database to
survive long enough for the more complete statistics to be gathered. If you
have quickly accumulated max_connections worth of processes all running
horrible query plans that never finish, your database might as well still be
closed for all the good it does the users. And all the load generated by
those processes is going to make the critical ANALYZE runs that much slower.
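
(To spell that out: if I am remembering vacuumdb's stage_commands correctly,
the three stages amount to roughly

    stage 1: SET default_statistics_target=1;  SET vacuum_cost_delay=0;  then ANALYZE
    stage 2: SET default_statistics_target=10; RESET vacuum_cost_delay;  then ANALYZE
    stage 3: RESET default_statistics_target;                            then ANALYZE

so the whole point of stage 1 is to get some statistics in place as quickly
as possible.)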

At first blush I thought it was obvious that you would not want to run
analyze-in-stages in parallel. But after thinking about it some more, and
reflecting on experience with some troublesome upgrades, I would reverse
that and say it is now obvious that you do want at least the first stage of
analyze-in-stages, and probably the first two, to run in parallel. That is
not currently an option vacuumdb supports, so we can't really recommend it
in the script or the docs.
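
(In the meantime you can fake a parallel first stage by hand, something like

    PGOPTIONS='-c default_statistics_target=1' vacuumdb --all --analyze-only --jobs=8

since vacuumdb just opens ordinary libpq connections and so picks up
PGOPTIONS. An untested sketch, but that is the idea.)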

But we could at least adopt the more straightforward patch, and suggest that
if they don't want analyze-in-stages, they should consider doing the
big-bang analyze in parallel.

Cheers,

Jeff
