Re: Statistics Import and Export

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-03-28 06:32:09
Message-ID: f4023093c6f94f74e4fbc81f16f81f9ae648075e.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tom,

Comparing the current patch set to your advice below:

On Tue, 2023-12-26 at 14:19 -0500, Tom Lane wrote:
> I had things set up with simple functions, which
> pg_dump would invoke by writing more or less
>
>         SELECT pg_catalog.load_statistics(....);
>
> This has a number of advantages, not least of which is that an
> extension
> could plausibly add compatible functions to older versions.

Check.

>   The trick,
> as you say, is to figure out what the argument lists ought to be.
> Unfortunately I recall few details of what I wrote for Salesforce,
> but I think I had it broken down in a way where there was a separate
> function call occurring for each pg_statistic "slot", thus roughly
>
> load_statistics(table regclass, attname text, stakind int, stavalue
> ...);

The problem with basing the function on pg_statistic directly is that
it can only be exported by the superuser.

The current patches instead base it on the pg_stats view, which already
does the privilege checking. Technically, information about which
stakinds go in which slots is lost, but I don't think that's a problem
as long as the stats make it in, right? It's also more user-friendly to
have nice names for the function arguments. The only downside I see is
that it's slightly asymmetric: exporting from pg_stats and importing
into pg_statistic.

I do have some concerns about letting non-superusers import their own
statistics: how robust is the rest of the code to handle malformed
stats once they make it into pg_statistic? Corey has addressed that
with basic input validation, so I think it's fine, but perhaps I'm
missing something.

> As mentioned already, we'd also need some sort of
> version identifier, and we'd expect the load_statistics() functions
> to be able to transform the data if the old version used a different
> representation.

You mean a version argument to the function, which would appear in the
exported stats data? That's not in the current patch set.

It's relying on the new version of pg_dump understanding the old
statistics data, and dumping it out in a form that the new server will
understand.

>   I agree with the idea that an explicit representation
> of the source table attribute's type would be wise, too.

That's not in the current patch set, either.

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-03-28 06:32:58 Re: BUG: deadlock between autovacuum worker and client backend during removal of orphan temp tables with sequences
Previous Message Alexander Korotkov 2024-03-28 06:23:28 Re: [HACKERS] make async slave to wait for lsn to be replayed