Quick Links

Re: analyze-in-stages post upgrade questions

From:	Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To:	"Zechman, Derek S" <Derek(dot)S(dot)Zechman(at)snapon(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject:	Re: analyze-in-stages post upgrade questions
Date:	2025-06-27 15:31:08
Message-ID:	6add2a9a-7cf2-4d1b-8f3e-2e26a7ebe883@aklaver.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-hackers

On 6/27/25 06:35, Zechman, Derek S wrote:
> We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and
> performed the analyze-in-stages post upgrade. It has been noticed that
> some plans changed to use hash joins instead of nested loops. Further
> investigation found it was because the parent table of partitioned
> tables did not have stats. After running an ANALYZE on the parent
> tables we got similar plan an execution times as before.
>
> I have two questions
>
> 1 - Why does analyze-in-stages not analyze the parent tables?
>
> 2 – What happens if we do not run analyze-in-stages post upgrade and
> just run an analyze?

It is spelled out in the docs:

https://www.postgresql.org/docs/current/pgupgrade.html

Emphasis added

"Using vacuumdb --all --analyze-only can efficiently generate such
statistics, and the use of --jobs can speed it up. Option
--analyze-in-stages can be used to generate **minimal statistics**
quickly. If vacuum_cost_delay is set to a non-zero value, this can be
overridden to speed up statistics generation using PGOPTIONS, e.g.,
PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."

and from here:

https://www.postgresql.org/docs/current/app-vacuumdb.html

"--analyze-in-stages

Only calculate statistics for use by the optimizer (no vacuum),
like --analyze-only. Run three stages of analyze; the first stage uses
the lowest possible statistics target (see default_statistics_target) to
produce usable statistics faster, and subsequent stages build the full
statistics.

This option is only useful to analyze a database that currently has
no statistics or has wholly incorrect ones, such as if it is newly
populated from a restored dump or by pg_upgrade. Be aware that running
with this option in a database with existing statistics may cause the
query optimizer choices to become transiently worse due to the low
statistics targets of the early stages.
"

>
> Thanks,
>
> Sean
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

analyze-in-stages post upgrade questions at 2025-06-27 13:35:27 from Zechman, Derek S

Responses

Re: analyze-in-stages post upgrade questions at 2025-06-27 21:39:17 from Laurenz Albe

Browse pgsql-general by date

	From	Date	Subject
Next Message	raphi	2025-06-27 17:00:36	Re: password rules
Previous Message	Greg Sabino Mullane	2025-06-27 15:15:26	Re: Figure out nullability of query parameters

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fujii Masao	2025-06-27 15:33:45	Re: Extend COPY FROM with HEADER <integer> to skip multiple lines
Previous Message	shihao zhong	2025-06-27 15:26:54	Re: Fixes inconsistent behavior in vacuum when it processes multiple relations