| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
| Cc: | jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us |
| Subject: | Re: Extended Statistics set/restore/clear functions. |
| Date: | 2025-10-31 20:22:55 |
| Message-ID: | 439a9947-e6a1-4ea9-84d8-1f843deaf6b1@vondra.me |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 10/23/25 01:46, Michael Paquier wrote:
> On Wed, Oct 22, 2025 at 02:55:31PM +0300, Corey Huinker wrote:
>>> Do you have some numbers regarding the increase in size this generates
>>> for the catalogs?
>>
>> Sorry, I don't understand. There shouldn't be any increase inside the
>> catalogs as the internal storage of the datatypes hasn't changed, so I can
>> only conclude that you're referring to something else.
>
> The new format meant more characters, perhaps I've just missed
> something while quickly testing the patch.. Anyway, that's OK at this
> stage.
>
>> The equivalent structures in attribute_stats.c will need documenting too.
>
> Right. This sounds like a separate patch to me, impacting HEAD.
>
>> Right now we have a situation where the vast majority of databases can
>> carry forward all of their stats via pg_upgrade, except for those databases
>> that have extended stats. The trouble is, most customers don't know if
>> their database uses extended statistics or not, and those that do are in
>> for some bad query plans if they haven't run vacuumdb --missing-stats-only.
>> Explaining that to customers is complicated, especially when most of them
>> do not know what extended stats are, let alone whether they have them. It
>> would be a lot simpler to just say "all stats are carried over on upgrade",
>> and vacuumdb becomes unnecessary, making upgrades one step simpler as well.
>
> Okay.
>
>> Given that, I think that the admittedly ugly transformation is worth it,
>> and sequestering it inside pg_dump is the smallest footprint it can have.
>> Earlier in this thread I posted some functions that did the translation
>> from the existing formats to the proposed new formats. We could include
>> those as new system functions, and that would make the dump code very
>> simple. Having said that, I don't know that there would be use for those
>> functions except inside pg_dump, hence the decision to do the transforms
>> right in the dump query.
>
> I'd prefer the new format. One killer pushing in favor of the new
> format that you are making upthread in favor of is that it makes much
> easier the viewing, editing and injecting of these stats. It's the
> part of the patch where we would need Tomas' input on the matter
> before deciding anything, I guess, as primary author of the original
> facilities. My view of the problem is just one opinion.
>
Sorry for not paying much attention to this thread ...
My opinion is that we should both use the new format and keep the
pg_dump code to allow upgrading from older pre-19 versions.
There really is nothing special about the current format - I should have
used JSON (or any other established format) from the beginning. But I
only saw that as human-readable version of ephemeral data, it didn't
occur to me we'll use this to export/import stats cross versions. So if
we need to adjust that to make new use cases more convenient, let's bite
the bullet now.
If doing both is too complex / ugly, I think the pg_upgrade capability
is more valuable. I'd rather keep the old, less convenient format to
have pg_upgrade support for all versions.
Otherwise users may not benefit from this pg_upgrade feature for a
couple more years. Plenty of users delay upgrading until the EOL gets
close, and so might be unable to dump/restore extended stats for the
next ~5 years.
regards
--
Tomas Vondra
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jacob Champion | 2025-10-31 20:30:52 | Re: Support getrandom() for pg_strong_random() source |
| Previous Message | Nathan Bossart | 2025-10-31 20:12:47 | Re: another autovacuum scheduling thread |