Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-02-20 07:24:52
Message-ID: CADkLM=c7sMxeOGMmV6MtsjFkNnji6XLfj2bTgy04f4vWWhv6PA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 15, 2024 at 4:09 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
wrote:

> Posting v5 updates of pg_import_rel_stats() and pg_import_ext_stats(),
> which address many of the concerns listed earlier.
>
> Leaving the export/import scripts off for the time being, as they haven't
> changed and the next likely change is to fold them into pg_dump.
>
>
>
v6 posted below.

Changes:

- Additional documentation about the overall process.
- Rewording of SGML docs.
- removed a fair number of columns from the transformation queries.
- enabled require_match_oids in extended statistics, but I'm having my
doubts about the value of that.
- moved stats extraction functions to an fe_utils file stats_export.c that
will be used by both pg_export_stats and pg_dump.
- pg_export_stats now generates SQL statements rather than a tsv, and has
boolean flags to set the validate and require_match_oids parameters in the
calls to pg_import_(rel|ext)_stats.
- pg_import_stats is gone, as importing can now be done with psql.

I'm hoping to get feedback on a few areas.

1. The checks for matching oids. On the one hand, in a binary upgrade
situation, we would of course want the oid of the relation to match what
was exported, as well as all of the atttypids of the attributes to match
the type ids exported, same for collations, etc. However, the binary
upgrade is the one place where there are absolutely no middle steps that
could have altered either the stats jsons or the source tables. Given that
and that oid simply will never match in any situation other than a binary
upgrade, it may be best to discard those checks.

2. The checks for relnames matching, and typenames of attributes matching
(they are already matched by name, so the column order can change without
the import missing a beat) seem so necessary that there shouldn't be an
option to enable/disable them. But if that's true, then the initial
relation parameter becomes somewhat unnecessary, and anyone using these
functions for tuning or FDW purposes could easily transform the JSON using
SQL to put in the proper relname.

3. The data integrity validation functions may belong in a separate
function rather than being a parameter on the existing import functions.

4. Lastly, pg_dump. Each relation object and extended statistics object
will have a statistics import statement. From my limited experience with
pg_dump, it seems like we would add an additional Stmt variable (statsStmt)
to the TOC entry for each object created, and the restore process would
check the value of --with-statistics and in cases where the statistics flag
was set AND a stats import statement exists, then execute that stats
statement immediately after the creation of the object. This assumes that
there is no case where additional attributes are added to a relation after
it's initial CREATE statement. Indexes are independent relations in this
regard.

Attachment Content-Type Size
v6-0001-Create-pg_import_rel_stats.patch text/x-patch 91.7 KB
v6-0002-Create-pg_import_ext_stats.patch text/x-patch 80.8 KB
v6-0003-Add-common-functions-for-exporting-statistics.patch text/x-patch 37.6 KB
v6-0004-Add-pg_export_stats.patch text/x-patch 10.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-02-20 07:45:00 Re: Switching XLog source from archive to streaming when primary available
Previous Message vignesh C 2024-02-20 07:05:06 Re: confirmed flush lsn seems to be move backward in certain error cases