Re: pg_upgrade: Pass -j down to vacuumdb

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "fabriziomello(at)gmail(dot)com" <fabriziomello(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: pg_upgrade: Pass -j down to vacuumdb
Date: 2019-04-03 20:42:14
Message-ID: CAMkU=1w2Yuvz+xbCU+42pT2U3GCE0xmkaSDXCgM9CR3bt=5Crw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 29, 2019 at 5:58 AM Peter Eisentraut <
peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:

> On 2019-03-28 02:43, Jeff Janes wrote:
> > At first blush I thought it was obvious that you would not want to run
> > analyze-in-stages in parallel. But after thinking about it some more
> > and reflecting on experience doing some troublesome upgrades, I would
> > reverse that and say it is now obvious you do want at least the first
> > stage of analyze-in-stages, and probably the first two, to run in
> > parallel. That is not currently an option it supports, so we can't
> > really recommend it in the script or the docs.
>
> So do you think we should copy down the -j option from pg_upgrade, or
> make some other arrangement?
>

For 12, I think we should not pass it down so that it runs automatically,
and just go with a doc-only patch instead. Where the text written into
the analyze_new_cluster.sh recommending to hit Ctrl-C and do something else
is counted as documentation.

I agree with you that the value of -j that was used for pg_ugrade might not
be suitable for vacuumdb, but anyone who tests things thoroughly enough to
know exactly what value to use for each is not going to be the target
audience of analyze_new_cluster.sh anyway. So I think the -j used in
pg_upgrade can just be written directly into the suggestion, and that would
be good enough for the intended use.

Ideally I think the analyze-in-stages should have the ability to
parallelize the first stage and not the last one, but that is not v12
material.

Even more ideally it should only have two stages, not three. One stage
runs to generate minimally-tolerable stats before the database is opened to
other users, and one runs after it is open (but hopefully before the big
end-of-month reports get run, or whatever the big statistics-sensitive
queries are on your system). But we don't really have a concept of "open
to other users" in the first place, and doing it yourself by juggling
pg_hba files is annoying and error prone. So maybe the first stage could
be run by pg_upgrade itself, while the new server is still running on a
linux socket in a private directory.

The defense of the current three-stage method for analyze-in-stages is that
very few people are likely to know just what the level of
"minimally-tolerable stats" for their system actually are, because upgrades
are rare enough not to learn by experience, and realistic load-generators
are rare enough not to learn from test/QA environments. If the
default_statistics_target=1 stats are enough, then you got them quickly.
If they aren't, at least you didn't waste too much time collecting them
before moving on to the second-stage default_statistics_target=10 and then
the final stage. So the three stages are amortizing over our ignorance.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-04-03 20:47:46 Re: Strange coding in _mdfd_openseg()
Previous Message Thomas Munro 2019-04-03 20:24:49 Re: Strange coding in _mdfd_openseg()