Re: Extended Statistics set/restore/clear functions.

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-hackers(at)lists(dot)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: Extended Statistics set/restore/clear functions.
Date: 2025-10-21 05:48:37
Message-ID: aPcetUfI2NJDqYxZ@paquier.xyz

On Sat, Oct 18, 2025 at 08:27:58PM -0400, Corey Huinker wrote:
> And rebased again to conform to 688dc6299 and 4bd919129.

The format redesign for extended stats is pretty nice, as done in
0001. I think that this patch should be split in two, actually, as it
tackles two issues:
- One patch for the format change.
- Second patch for the introduction of the input function, useful on
its own to allow more regression test paths for the format generated.

Similar split comment for 0002 regarding pg_dependencies. The format
change is one thing. The input function is cool to have for input
validation, still not absolutely mandatory for the core part of the
format change.

The functions exposed in 0003 should be renamed to better match the
style of the rest, as it is a bit hard to figure out what they do at
first sight. Presumably, these should be prefixed with some
"statext_", except text_to_stavalues() which could still be named the
same.

Do you have some numbers regarding the increase in size this generates
for the catalogs?

0004 has been designed following the same model as the relation and
attribute stats. That sounds OK here.

+enum extended_stats_argnum
[...]
+enum extended_stats_exprs_element

It would be nice to document why such things are around. That would
be less guessing for somebody reading the code.

Reusing this small sequence from your pg_dump patch, executed on a v14
backend:
create schema dump_test;
CREATE TABLE dump_test.has_ext_stats
AS SELECT g.g AS x, g.g / 2 AS y FROM generate_series(1,100) AS g(g);
CREATE STATISTICS dump_test.es1 ON x, (y % 2) FROM dump_test.has_ext_stats;
ANALYZE dump_test.has_ext_stats;

Then pg_dump fails:
pg_dump: error: query failed: ERROR: column e.inherited does not exist
LINE 2: ...hemaname = $1 AND e.statistics_name = $2 ORDER BY e.inherite...

+ * TODO: Until v18 is released the master branch has a
+ * server_version_num of 180000. We will update this to 190000 as soon
+ * as the master branch updates.

This part has not been updated.

+ Assert(item.nattributes > 0); /* TODO: elog? */
[...]
+ Assert(dependency->nattributes > 1); /* TODO: elog? */
Yes and yes. It seems like it should be possible to craft some input
that triggers these.
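
To illustrate the point about replacing the assertions, here is a
minimal standalone sketch of a run-time check for the second case
(fewer than two attributes in a dependency). The function name and the
message text are hypothetical and not taken from the patch; in backend
code the failure branch would presumably raise an error through
elog()/ereport() rather than return a string.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not from the patch: returns NULL when the
 * attribute count of a parsed dependency is acceptable, or a static
 * error message when it is not.  An Assert() vanishes in non-assert
 * builds, so user-supplied input needs a check like this one that is
 * always compiled in.
 */
const char *
dependency_item_error(int nattributes)
{
	/* A functional dependency needs at least two attributes. */
	if (nattributes < 2)
		return "invalid dependency: at least two attributes are required";

	return NULL;
}
```

A caller would test the result and report the message instead of
relying on an assertion that only fires in assert-enabled builds.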

+void
+free_pg_dependencies(MVDependencies *dependencies);

Double declaration of this routine in dependencies.c.

Perhaps some of the regression tests could use some jsonb_pretty() in
the outputs generated. Some of the results generated are very hard to
parse, something that would become harder in the buildfarm. This
comment starts with 0001 for stxdndistinct.

I have mixed feelings about 0005, FWIW. I am wondering if we should
not raise the bar a bit here and only support the dump of extended
statistics when dealing with a backend of at least v19. This would
mean that we would only get the full benefit of this feature once
people upgrade to v20 or dump from a pg_dump with --statistics from at
least v19, but with the long-term picture in mind this would also make
the dump/restore picture of the patch dead simple (spoiler: I like
simple).
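
As a rough sketch of what that simplification amounts to, the version
gate reduces to a single comparison on the server version number. The
macro and function names below are hypothetical, and this is standalone
C rather than actual pg_dump code; only the 190000 threshold comes from
the v19 cutoff discussed above.

```c
#include <stdbool.h>

/* Hypothetical name: minimum server version for dumping extended stats. */
#define EXT_STATS_MIN_SERVER_VERSION 190000

/*
 * Hypothetical helper: decide whether extended statistics are dumped,
 * given the server_version_num of the backend pg_dump is connected to.
 * Older servers are simply skipped, so no per-version queries are needed.
 */
bool
can_dump_extended_stats(int server_version_num)
{
	return server_version_num >= EXT_STATS_MIN_SERVER_VERSION;
}
```

With a gate like this, the v14 case shown earlier would skip the
extended statistics query instead of failing on a missing column.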

Tomas, what is your take on the format changes and my argument
about the backward-compatibility requirements of pg_dump (about not
dumping these stats if connecting to a server of v18 or older)?
--
Michael
