Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Statistics Import and Export
Date: 2024-05-07 03:43:50
Message-ID: CADkLM=cQgghMJOS8EcAVBwRO4s1dUVtxGZv5gLPfZkQ1nL1gzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>
> Next up for question is how to handle --statistics-only or an equivalent.
> The option would be mutually exclusive with --schema-only and --data-only,
> and it would be mildly incongruous if it didn't have a short option like
> the others, so I'm suggested -P for Probablity / Percentile / ρ:
> correlation / etc.
>
> One wrinkle with having three mutually exclusive options instead of two is
> that the existing code was able to assume that one of the options being
> true meant that we could bail out of certain dumpXYZ() functions, and now
> those tests have to compare against two, which makes me think we should add
> three new DumpOptions that are the non-exclusive positives (yesSchema,
> yesData, yesStats) and set those in addition to the schemaOnly, dataOnly,
> and statsOnly flags. Thoughts?
>

v21 attached.

0001 is the same.

0002 is a preparatory change to pg_dump introducing
DumpOption/RestoreOption variables dumpSchema and dumpData. The current
code makes heavy use of the fact that schemaOnly and dataOnly are mutually
exclusive and logically opposite. That will not be the case when
statisticsOnly is introduced, so I decided to add the new variables whose
value is entirely derivative of the existing command flags, but resolves
the complexities of those interactions in one spot, as those complexities
are about to jump with the new options.

0003 is the statistics changes to pg_dump, adding the options -X /
--statistics-only, and the derivative boolean statisticsOnly. The -P option
is already used by pg_restore, so instead I chose -X because of the passing
resemblance to Chi as in the chi-square statistics test makes it vaguely
statistics-ish. If someone has a better letter, I'm listening.

With that change, people should be able to use pg_dump -X --table=foo to
dump existing stats for a table and its dependent indexes, and then tweak
those calls to do tuning work. Have fun with it. If this becomes a common
use-case then it may make sense to get functions to fetch
relation/attribute stats for a given relation, either as a formed SQL
statement or as the parameter values.

Attachment Content-Type Size
v21-0001-Create-pg_set_relation_stats-pg_set_attribute_st.patch text/x-patch 109.4 KB
v21-0002-Add-derivative-flags-dumpSchema-dumpData.patch text/x-patch 18.5 KB
v21-0003-Enable-dumping-of-table-index-stats-in-pg_dump.patch text/x-patch 33.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-05-07 03:48:52 Re: 2024-05-09 release announcement draft
Previous Message Michael Paquier 2024-05-07 03:42:55 Re: [PATCH] json_lex_string: don't overread on bad UTF8