Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Statistics Import and Export
Date: 2023-12-13 10:26:04
Message-ID: CADkLM=frTx5DocOpyhRHL4r_LKvP315cNA-5+0UVqGM=7GKd9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> Yeah, that was the simplest output function possible, it didn't seem
>
worth it to implement something more advanced. pg_mcv_list_items() is
> more convenient for most needs, but it's quite far from the on-disk
> representation.
>

I was able to make it work.

>
> That's actually a good question - how closely should the exported data
> be to the on-disk format? I'd say we should keep it abstract, not tied
> to the details of the on-disk format (which might easily change between
> versions).
>

For the most part, I chose the exported data json types and formats in a
way that was the most accommodating to cstring input functions. So, while
so many of the statistic values are obviously only ever integers/floats,
those get stored as a numeric data type which lacks direct
numeric->int/float4/float8 functions (though we could certainly create
them, and I'm not against that), casting them to text lets us leverage
pg_strtoint16, etc.

>
> I'm a bit confused about the JSON schema used in pg_statistic_export
> view, though. It simply serializes stakinds, stavalues, stanumbers into
> arrays ... which works, but why not to use the JSON nesting? I mean,
> there could be a nested document for histogram, MCV, ... with just the
> correct fields.
>
> {
> ...
> histogram : { stavalues: [...] },
> mcv : { stavalues: [...], stanumbers: [...] },
> ...
> }
>

That's a very good question. I went with this format because it was fairly
straightforward to code in SQL using existing JSON/JSONB functions, and
that's what we will need if we want to export statistics on any server
currently in existence. I'm certainly not locked in with the current
format, and if it can be shown how to transform the data into a superior
format, I'd happily do so.

and so on. Also, what does TRIVIAL stand for?
>

It's currently serving double-duty for "there are no stats in this slot"
and the situations where the stats computation could draw no conclusions
about the data.

Attached is v3 of this patch. Key features are:

* Handles regular pg_statistic stats for any relation type.
* Handles extended statistics.
* Export views pg_statistic_export and pg_statistic_ext_export to allow
inspection of existing stats and saving those values for later use.
* Import functions pg_import_rel_stats() and pg_import_ext_stats() which
take Oids as input. This is intentional to allow stats from one object to
be imported into another object.
* User scripts pg_export_stats and pg_import stats, which offer a primitive
way to serialize all the statistics of one database and import them into
another.
* Has regression test coverage for both with a variety of data types.
* Passes my own manual test of extracting all of the stats from a v15
version of the popular "dvdrental" example database, as well as some
additional extended statistics objects, and importing them into a
development database.
* Import operations never touch the heap of any relation outside of
pg_catalog. As such, this should be significantly faster than even the most
cursory analyze operation, and therefore should be useful in upgrade
situations, allowing the database to work with "good enough" stats more
quickly, while still allowing for regular autovacuum to recalculate the
stats "for real" at some later point.

The relation statistics code was adapted from similar features in
analyze.c, but is now done in a query context. As before, the
rowcount/pagecount values are updated on pg_class in a non-transactional
fashion to avoid table bloat, while the updates to pg_statistic are
pg_statistic_ext_data are done transactionally.

The existing statistics _store() functions were leveraged wherever
practical, so much so that the extended statistics import is mostly just
adapting the existing _build() functions into _import() functions which
pull their values from JSON rather than computing the statistics.

Current concerns are:

1. I had to code a special-case exception for MCELEM stats on array data
types, so that the array_in() call uses the element type rather than the
array type. I had assumed that the existing exmaine_attribute() functions
would have properly derived the typoid for that column, but it appears to
not be the case, and I'm clearly missing how the existing code gets it
right.
2. This hasn't been tested with external custom datatypes, but if they have
a custom typanalyze function things should be ok.
3. While I think I have cataloged all of the schema-structural changes to
pg_statistic[_ext[_data]] since version 10, I may have missed a case where
the schema stayed the same, but the values are interpreted differently.
4. I don't yet have a complete vision for how these tools will be used by
pg_upgrade and pg_dump/restore, the places where these will provide the
biggest win for users.

Attachment Content-Type Size
v3-0002-Add-system-view-pg_statistic_export.patch text/x-patch 8.1 KB
v3-0001-Additional-internal-jsonb-access-functions.patch text/x-patch 2.4 KB
v3-0003-Add-pg_import_rel_stats.patch text/x-patch 41.8 KB
v3-0004-Add-pg_export_stats-pg_import_stats.patch text/x-patch 20.6 KB
v3-0005-Add-system-view-pg_statistic_ext_export.patch text/x-patch 11.7 KB
v3-0006-Create-create_stat_ext_entry-from-fetch_statentri.patch text/x-patch 5.5 KB
v3-0008-Allow-explicit-nulls-in-container-lookups.patch text/x-patch 2.6 KB
v3-0007-Add-pg_import_ext_stats.patch text/x-patch 31.6 KB
v3-0009-Enable-pg_export_stats-pg_import_stats-to-use-ext.patch text/x-patch 31.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jakub Wartak 2023-12-13 10:39:22 Re: trying again to get incremental backup
Previous Message Peter Smith 2023-12-13 10:22:54 Re: Synchronizing slots from primary to standby