Re: Statistics Import and Export

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Statistics Import and Export
Date: 2023-12-27 12:08:47
Message-ID: 4e49c72e-f8a5-de1a-4ef4-36529bee1d65@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/26/23 20:19, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> I think we need a robust API to handle two cases:
>
>> * changes in how we store statistics
>> * changes in how how data type values are represented in the statistics
>
>> We have had such changes in the past, and I think these two issues are
>> what have prevented import/export of statistics up to this point.
>> Developing an API that doesn't cleanly handle these will cause long-term
>> pain.
>
> Agreed.
>

I agree the format is important - we don't want to end up with a format
that's cumbersome or inconvenient to use. But I don't think the proposed
format is somewhat bad in those respects - it mostly reflects how we
store statistics and if I was designing a format for humans, it might
look a bit differently. But that's not the goal here, IMHO.

I don't quite understand the two cases above. Why should this affect how
we store statistics? Surely, making the statistics easy to use for the
optimizer is much more important than occasional export/import.

>> In summary, I think we need an SQL-level command for this.
>
> I think a SQL command is an actively bad idea. It'll just add development
> and maintenance overhead that we don't need. When I worked on this topic
> years ago at Salesforce, I had things set up with simple functions, which
> pg_dump would invoke by writing more or less
>
> SELECT pg_catalog.load_statistics(....);
>
> This has a number of advantages, not least of which is that an extension
> could plausibly add compatible functions to older versions. The trick,
> as you say, is to figure out what the argument lists ought to be.
> Unfortunately I recall few details of what I wrote for Salesforce,
> but I think I had it broken down in a way where there was a separate
> function call occurring for each pg_statistic "slot", thus roughly
>
> load_statistics(table regclass, attname text, stakind int, stavalue ...);
>
> I might have had a separate load_statistics_xxx function for each
> stakind, which would ease the issue of deciding what the datatype
> of "stavalue" is. As mentioned already, we'd also need some sort of
> version identifier, and we'd expect the load_statistics() functions
> to be able to transform the data if the old version used a different
> representation. I agree with the idea that an explicit representation
> of the source table attribute's type would be wise, too.
>

Yeah, this is pretty much what I meant by "functional" interface. But if
I said maybe the format implemented by the patch is maybe too close to
how we store the statistics, then this has exactly the same issue. And
it has other issues too, I think - it breaks down the stats into
multiple function calls, so ensuring the sanity/correctness of whole
sets of statistics gets much harder, I think.

I'm not sure about the extension idea. Yes, we could have an extension
providing such functions, but do we have any precedent of making
pg_upgrade dependent on an external extension? I'd much rather have
something built-in that just works, especially if we intend to make it
the default behavior (which I think should be our aim here).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2023-12-27 12:21:08 Re: Failed assertion in joininfo.c, remove_join_clause_from_rels
Previous Message Richard Guo 2023-12-27 12:03:57 Re: Failed assertion in joininfo.c, remove_join_clause_from_rels