Re: Statistics Import and Export

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Statistics Import and Export
Date: 2023-12-26 19:19:04
Message-ID: 2090613.1703618344@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> I think we need a robust API to handle two cases:

> * changes in how we store statistics
> * changes in how how data type values are represented in the statistics

> We have had such changes in the past, and I think these two issues are
> what have prevented import/export of statistics up to this point.
> Developing an API that doesn't cleanly handle these will cause long-term
> pain.

Agreed.

> In summary, I think we need an SQL-level command for this.

I think a SQL command is an actively bad idea. It'll just add development
and maintenance overhead that we don't need. When I worked on this topic
years ago at Salesforce, I had things set up with simple functions, which
pg_dump would invoke by writing more or less

SELECT pg_catalog.load_statistics(....);

This has a number of advantages, not least of which is that an extension
could plausibly add compatible functions to older versions. The trick,
as you say, is to figure out what the argument lists ought to be.
Unfortunately I recall few details of what I wrote for Salesforce,
but I think I had it broken down in a way where there was a separate
function call occurring for each pg_statistic "slot", thus roughly

load_statistics(table regclass, attname text, stakind int, stavalue ...);

I might have had a separate load_statistics_xxx function for each
stakind, which would ease the issue of deciding what the datatype
of "stavalue" is. As mentioned already, we'd also need some sort of
version identifier, and we'd expect the load_statistics() functions
to be able to transform the data if the old version used a different
representation. I agree with the idea that an explicit representation
of the source table attribute's type would be wise, too.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2023-12-26 19:35:37 Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
Previous Message Tom Lane 2023-12-26 19:02:33 Two small bugs in guc.c