| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me> |
| Subject: | Re: Add expressions to pg_restore_extended_stats() |
| Date: | 2026-02-03 01:40:13 |
| Message-ID: | aYFR_UqFhO4ZnoSq@paquier.xyz |
| Lists: | pgsql-hackers |
On Mon, Feb 02, 2026 at 03:21:35AM -0500, Corey Huinker wrote:
> I had actually figured that the pg_dump part would turn you OFF of
> this approach.
Sorry, I guess. :)
>> For this warning in the regression tests, you should only need one
>> element, reducing the number of input lines?
>
> That's true IF we decide that missing expression keys are a thing that we
> allow, and per conversation above it's not clear yet that we do.
If ANALYZE can generate partial data for a set of expressions (aka
generating data for a portion of the expressions defined while
skipping some of them because of reason x_y_z), the restore function
needs to cope with this policy (I'd need to double-check the code to
be sure, it is not clear off the top of my head and I cannot do that
today, unfortunately).
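To illustrate the case I am wondering about, here is a minimal
scenario (table and statistics names made up for the example):

  CREATE TABLE tab (a int, b int);
  INSERT INTO tab SELECT i, i % 10 FROM generate_series(1, 1000) i;
  CREATE STATISTICS tab_stx ON (a + b), (a * b) FROM tab;
  ANALYZE tab;
  SELECT expr, null_frac, avg_width, n_distinct
    FROM pg_stats_ext_exprs WHERE statistics_name = 'tab_stx';

The point to check is whether all the expressions defined in the
statistics object always show up in this view after an ANALYZE, or
whether only a subset of them can.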
>> Something that looks missing here is that we do not attach to which
>> negative attribute an item of "exprs" links to. It seems to me that
>> this needs to include an extra key called "attribute" with a negative
>> number associated with it, so that we can rely on the attnum instead
>> of the order of the items restored.
>
> Oof. So there is no such attribute number exposed in pg_stats_ext_exprs,
> we have to rely on the deterministic order in which they are fetched from
> the view. The "expr" text definition is essentially a node tree, which we
> can't guarantee stays stable across major versions, so that's no help.
> jsonb_agg() doesn't guarantee order on its own, but it can be enforced via
> an ORDINALITY-like trick. I had to do this before, so I can excavate that
> code again. Thanks for the reminder.
Hmm. I would not rely on the ordering of the items as they are
scanned, that seems like a recipe for disaster. We have a text
representation of the expression, as of pg_stats_ext_exprs.expr. This
could be mapped with the elements of the text[] in pg_stats_ext.exprs.
Here is a wilder idea: why not just put the expression text itself
in the data given as input to the restore function rather than a
guessed argument number? For the case of manual stats injections,
that kind of makes things simpler.
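To be clear about what I mean, each element of the "exprs" array could
look roughly like that (key names and values are only illustrative
here, not a final format):

  [
    {"expr": "(a + b)", "null_frac": 0, "avg_width": 4, "n_distinct": -1},
    {"expr": "(a * b)", "null_frac": 0, "avg_width": 4, "n_distinct": -0.5}
  ]

with the restore code matching each "expr" with pg_stats_ext_exprs.expr
for the statistics object, rather than relying on the order of the
array elements.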
> Since you're clearly on board with the jsonb idea, I'll make the next
> version all about cleanups and tightening up this method.
Yeah, this approach makes sense to me in terms of clarity and item
handling for the expression parts. Now, as you have mentioned to me
offline, there was a discussion about applying this format to other
parts of the arguments, which would be around here, I guess:
https://www.postgresql.org/message-id/CADkLM%3DfhVk3BJ8z20iQW2wGfuOk%2BZSgzNHUVGtLpmvzbQ9Ontg%40mail.gmail.com
That part of the discussion also covered the catalog data types we
have for dependencies and ndistinct.
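For reference, these are visible as the output of the pg_ndistinct and
pg_dependencies catalog types in the view, say with the example
statistics object from above (made-up name):

  SELECT n_distinct, dependencies FROM pg_stats_ext
    WHERE statistics_name = 'tab_stx';

so any extension of the JSONB format to these parts would need to deal
with their serialization as well.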
> The outstanding questions from the above are, given a jsonb tree of [ {
> "null_frac": 0.5, ...}, {"avg_width": "1", ...}], which for notation I'll
> call an array of expr1 and expr2
>
> 1. If an expected key is missing from either expr1 or expr2, is that worth
> failing the whole import_expressions()?
> 2. Is an unexpected key in either expr1 or expr2 worth failing the whole
> import_expressions()?
> 3. If the answer to #2 is no, do we need to warn about the existence of the
> weird parameter at all, or just silently skip it?
Yeah, it sounds to me that we should just set ok=false and give up
rather than end up with a semi-filled set of numbers for a single
extended stats object. There is an argument in favor of that: it
simplifies the detection of missing stats for a single extended stats
definition. I understand that you'd want to keep going with loading
the data even if it's partial. My question is: is it possible for
ANALYZE to fill in only a portion of the expressions, skipping some of
them? If the answer is yes, the restore function should be able to do
the same and my idea on the matter is wrong. If the answer is no,
then failing hard as I am suggesting would be the right thing.
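To make the first point more concrete, with an input shaped like that
(still with the illustrative keys from above):

  [
    {"expr": "(a + b)", "null_frac": 0, "avg_width": 4},
    {"avg_width": 4, "n_distinct": -1}
  ]

where the second element has no "expr" at all, I would rather have the
function report the problem and skip the whole extended stats object
than load half of the data for it.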
> 4. Does casting the numeric scalar values (null_frac, correlation,
> avg_width, n_distinct) to text make sense, since we have to put them
> through type-specific input functions anyway?
In terms of JSON, passing them as text makes the representation
simpler. I don't think that we need to apply strict casting rules
when fetching them.
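For example, both of these shapes would be acceptable in input:

  {"null_frac": 0.5, "n_distinct": -0.25}
  {"null_frac": "0.5", "n_distinct": "-0.25"}

as the values need to go through the type input functions anyway
before being stored.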
> 5. Do any of these jsonb-twiddling operations look generically useful
> enough to put in jsonb_utils.c?
This one I cannot say yet. It depends on the patch as written.
@Tomas (now added in CC for confirmation): would you see a problem
with applying a JSONB data type to the argument for the restore of
extended stats? This level of data serialization would be required
when inserting data for what would show up in pg_stats_ext_exprs.
--
Michael